I've always found it a little funny that SQL was originally designed for non-programmers, sort of like AppleScript. I used to think neither of those panned out, but in fact there really are a lot of smart not-programmers who can use it. At a company I work with many of the support staff have been learning SQL to help customers pull reports from our data warehousey reporting database. So maybe the article is onto something about the value of the language.
On the other hand, SQL-the-language isn't essential to relational databases. I have often wondered where C.J. Date has been the last few years. I actually love SQL, but it does have its limitations. I wouldn't mind a solid relational database with an alternative query language. It's such a missed opportunity for a great VC pitch: Tutorial D, NoSQL before it was cool. :-)
It's very tedious to write and change a big query. It's really hard to read a big block of SQL.
And it's bizarrely rigid about certain rules, meaning "English-like" is more of a hindrance because you're tempted to think, "This makes sense in English," even though SQL doesn't allow it.
Good? Bad? It's just... a thing from that era. Another go-round of "I know, if it reads like English, it'll be easy to use", which in the present day manifests itself in stuff like Cucumber.
I really believe that Spark's "more-than-SQL" query interface is how things should be. SQL is of course the gold standard and probably represents north of 90% of analytic workloads, but there a lot of queries that (especially for us programmers) are much easier to express procedurally/functionally, rather than purely declaratively.
Many years ago, I worked on a project that required changes made to a reporting system that the original author had used procedural means to do data selection and updates. It was a highly convoluted "mess". The section I had to update was going to take some time to analyse the code to ensure that the specific updates would work.
I looked at what the code was doing and restructured into a set of queries that would do the same task in a simple sequence. It turned out that this simplification also sped up the entire update process by at least 1 if not 2 orders of magnitude.
There are few cases where processing the data procedurally is better than using SQL queries. Most time that it has been done procedurally, I have found that there are better ways to do what you require by judicious thinking and using SQL (the bronze standard).
I should say that my definition of procedural processing involves opening queries and processing tuples one by one. If you have a different definition then please let me know.
I mostly agree, but I think it fundamentally relies on a good query plan optimizer to figure out the proper order to execute joins, rewriting predicates, etc.
So I think I take the position that the machine is not perfect [1], and doesn't always provide a perfect abstraction of a fast declarative answering interface. Sometimes you really do need to tell it how to access the data. This is why, for example, some SQL query engines let the user provide join hints.
That said, I do agree that procedural queries are mostly a quick fix, and not very future-proof (against future improvements to the query engine).
And FWIW, Spark's DataFrame API [2] is not actually that procedural; it lets you specify something that feels like a direct query execution plan, but actually still gets optimized underneath.
To add onto this, you could theoretically call SQL a logic programming language and we've had good success teaching some of our non-programmers a different logic programming language (sort of like datalog) to do some work in.
Could be something there with logic programming. Or could not be something there.
> So maybe the article is onto something about the value of the language.
The strongest feature of SQL is the lack of specificity about "How it should return results" and primarily deals with the details of what the expected results are.
I work on a SQL engine which honestly breaks so many rules of "How to do things" and basically tries to avoid nearly everything a sane SQL engine engineer would do.
But the advantage of the lack of forcing implementation is that a new idea like this could still implement the "give me expected results" part of the implementation.
Whenever it doesn't - it needs fixing, not documenting in a "vs vs" comparison.
> I actually love SQL, but it does have its limitations. I wouldn't mind a solid relational database with an alternative query language
MDX is a pretty interesting thing to think with.
Mostly because if you're used to spread-sheets, it is a more natural way of expressing what you want generally in a straight forward lookup order - get me some columns from these rows, where some condition is satisfied.
That makes index & cube lookups so much easier to detangle for an engine than a more free-form SQL tree which has so much more variety in it.
Also, SQL was designed back when most programming was done in assembler or C. Compared to assembler, SQL is easy. Programming itself has gotten a lot easier since.
SQL was designed by IBM in the 1970s, to run on mainframes. The community by whom and for whom it was developed was focused on PL/I and COBOL and largely didn't use C.
The first implementation of SQL in C was probably Oracle V3 (released in 1983). IBM's equivalent SQL offerings around the same time (SQL/DS for VM/CMS and DB2 for MVS) were written in PL/I dialects and assembly, and COBOL, Fortran and PL/I were supported as application languages. I believe IBM's first forays into using C with SQL was the development of OS/2 EE Database Manager in C, somewhat later in the 1980s. (The current DB2 Linux/Unix/Windows code base is originally descended from OS/2 EE Database Manager.)
I was going by Oracle's "first commercial implementation," as a starting point. I found it it was originally written in Fortran, but later ported to C. I wouldn't have guessed fortran.
Oracle was originally written in fortran and then redone in C, which it has been written in ever since. In fact, when we ported Oracle to the mainframe for the first time we were faced with an issue -- do we rewrite Oracle in assembler (there was no c compiler on the mainframe back then) or do we write a C compiler? We wrote a C compiler.
On a side note, I remember watching an interview or reading an article (probably from Larry Ellison) who said IBM gave 1/3 of it's business to Intel, another 1/3 to Microsoft and the final 1/3 to Oracle. Pretty crazy.
By most accounts Oracle V2 was primarily written in PDP-11 assembler. Tom Kyte's suggestion it was written in Fortran contradicts most other sources.
IBM's problem was that they were slow to commercialise the technology despite having a significant head start. IBM built multiple prototype RDBMS systems but they weren't willing to ship the result as a commercial product, and Oracle beat them to it. And even then when they did finally ship, their initial offerings were restricted to their mainframe systems (first VM/CMS and then MVS), while Oracle back then was open to porting their database to just about anything. IBM didn't really come to the cross-platform party until the 1990s, by which time Oracle was well established in that market.
(Disclaimer: Former Oracle employee, although I never worked on the database side of the business, and I'm talking about stuff that happened when I was a kid, or even before I was born.)
Tutorial D obviously isn't SQL, but its not noSQL, either; despite the name, noSQL refers to non-relational stores, not relational stores with alternative query languages.
But as much as I think it's better than SQL, I don't think Tutorial D (or D as a concept more broadly) offers enough to really displace SQL.
> noSQL refers to non-relational stores, not
> relational stores with alternative query languages.
Very well said. The concept of relational databases (based on relational/algebraic theory) is thoroughly independent of the dialects to access said databases. Currently it's SQL.
SQl is not relational for all the hype that ORACLE/IBM/Microsoft have put out over the decades.
SQL fails to be relational because it doesn't deal with relations. At its base, it deals with bags of records and not relations of tuples.
Tutorial D is relational but has a lot of warts. D4 was relational but changed to allow access to SQL databases.
If my understanding is correct Rel (by Dave Voorhis) is a Java implemented Relational DBMS (a true RDBMS). There are a variety of projects also are designed to implement a true RDBMS. Another might be Andl (by David Bennett) and there is also SIRA_PRISE (by Erwin Smout). There are more and are documented at www.thethirdmanifesto.com, if you are interested.
SQL was designed for mainframe programmers, made to look similar to COBOL and PL/I presumably with the idea that at some point in the future it could be integrated into one of those.
Its syntax has all the drawbacks of COBOL's syntax: too many useful words end up as reserved; it doesn't compose very well, leading to statements with very complex syntax rules; and it lulls users into a false sense of security by looking like natural language while being something very different (see HAVING vs. WHERE).
>I work with many of the support staff have been learning SQL to help customers pull reports from our data warehousey reporting database.
Why wouldn't you use a visual query tool for this? Tableau and similar apps generate pretty decent SQL queries nowadays. Even old-school BusinessObjects does a decent job (although requires way more initial modelling).
SQL is just much more powerful. If your staff learns SQL they can pull out very complex reports easily, aggregating and combining, joining data. Expressiveness of SQL allows you to get better results you can get from any visual tool.
I deal a lot with SQL for reporting and analysis and wouldn't call a 300-line query easy. I found that visual tools can cover 90% of the cases for querying, merging, aggregating, filtering and calculating. They also help with hand-written SQL for the remaining 10% because you can use them to generate parts of SQL that can be used as subqueries or as a starting point in a hand-written SQL query. Also debugging complex SQL queries can be a pain, because SQL is a declarative language. You can't just put a breakpoint in a SQL query and see capture its intermediate state.
I guess you are right. Also often you will need to aggregate data from various sources (not only relational database but log files, XML, CSV files etc). I have seen Ab Initio used for this before.
I think Tutorial D would be more suitable as an ORM than a layer on top of SQL, I feel there would be less of an impedance mismatch if a Tutorial D implementation is a DSL. There is a Haskell implementation, but I'm not going to be doing work in Haskell any time soon.
"2. SQL is intended to be accessible to users without formal training in mathematics or computer
programming. It is designed to be typed on a keyboard. Therefore it is framed in familiar English
keywords, and avoids specialized mathematical concepts or symbols."
http://researcher.ibm.com/researcher/files/us-dchamber/SQL-e...
"Closeness to Natural Language - Clearly, the majority of users
should not have to learn either the relational calculus or algebra
in order to interact with data bases. However, requesting data by
its properties is far more natural than devising a particular algorithm
or sequence of operations for its retrieval. Thus, a
calculus-oriented language provides a good target language for a
more user-oriented source language."
On the other hand, SQL-the-language isn't essential to relational databases. I have often wondered where C.J. Date has been the last few years. I actually love SQL, but it does have its limitations. I wouldn't mind a solid relational database with an alternative query language. It's such a missed opportunity for a great VC pitch: Tutorial D, NoSQL before it was cool. :-)