Code In PostgreSQL: You can use WITH to name specific parts of SQL
This series of articles
This is the second of the Code in PostgreSQL series of articles.
Articles in this series
- You can do lots with just IN, ORDER BY and LIMIT
- You can use WITH to name specific parts of SQL
- Combining data from multiple tables with INNER JOIN
- A NoSQL developer might not know about: GROUP BY ( Soon to be published )
- Sub select ( Soon to be published )
- Variables ( Soon to be published )
- Prepare JSON ( Soon to be published )
- Create Function (and testing it) ( Soon to be published )
- Custom Aggregates ( Soon to be published )
The reason why SQL is so important
When developing systems we often have a choice of writing code (NodeJS, C#, Python or PHP etc) or SQL. I believe that sometimes the decision to write code is taken without fully evaluating how much of the task could be offloaded to SQL.
In this series of articles I wish to show the huge benefits of using and learning SQL by examining progressively more difficult scenarios with increasing amounts of SQL knowledge. In doing this I hope to illustrate that sometimes large amounts of extra code is written for what SQL can achieve quicker, with less complexity and more readability.
To be more specific, we should try to follow the rule of least power more often.
About the Ergast data set
For this series of articles we will be using the Ergast data set, which is a provided under the Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.
Setting up the Ergast database within PostgreSQL
To set up the Ergast database within PostgreSQL I did the following:
I allowed psql
and friends to work without me having to put in a password all the time by configuring the PostgreSQL environmental variables.
export PGUSER=postgres PGPASSWORD=postgres PGDATABASE=postgres PGHOST=127.0.0.1
Then import the Ergast database. NOTE: At the time of writing I was unable to install the PostgreSQL version.
wget -O /tmp/f1db_ansi.sql.gz http://ergast.com/downloads/f1db_ansi.sql.gz
cat /tmp/f1db_ansi.sql.gz | \
gzip -d | \
sed 's/int(..)/int/' | \
sed 's/ \+AUTO_INCREMENT//' | \
sed "s/\\\'/\'\'/g" | \
sed 's/UNIQUE KEY \"\(\w\+\)\"/UNIQUE /' | \
sed 's/^ *KEY .*(\"\(.*\)\")/CHECK ("\1" > 0)/' | \
sed 's/ date NOT NULL DEFAULT .0000.*,/ date,/'| psql
Assumed level of SQL Knowledge
In this JavaScript example we will assume the writer has sufficient SQL knowledge to use a WHERE
statement along with the ability to only return certain fields using SELECT
. After this we will see how this can be accomplished in one single SQL statement using IN
, ORDER BY
and LIMIT
.
Where we are
We saw in the previous article how we could select rows from one table using a condition on another table. In doing this we noticed that the SQL we created did not give a name to the section of code within the IN (...)
clause, which we did in the JavaScript code.
For the next few articles our SQL will be simple enough that we will not feel a great need to name sections of code. At some point however, as the complexity ramps up, we're going to want to start naming sections of our SQL to enhance readability.
The original SQL
SELECT
"driverStandings".points,
"driverStandings"."driverId",
2017 as year
FROM "driverStandings"
WHERE "raceId" IN (
SELECT "raceId" FROM races
WHERE year = 2017
ORDER BY "round" DESC
LIMIT 1
)
ORDER BY
"driverStandings".points DESC,
"driverStandings"."driverId" ASC
The aim
We would like to give a name to the section of code within the IN (...)
clause.
Common Table Expressions
Functions in code are magical, yes they have a name, take parameters and give you back a result, but the amazing thing is that they have a name. Because they have names you are not forced to think about their internal workings. You can think about a named function as a distinct thing in and of itself, which frees your mind to think about the larger picture and higher level concepts.
Giving names to things is not an alien concept to SQL. As you can see in "The original SQL" we actually gave the third column whose value is 2017
the name "year". You can also name, or alias other things within your SQL but this falls far short of how we think (or not) about functions.
Common Table Expressions are the closest I have found to being able to name and refer to sections of SQL.
The SQL
WITH "racesIn2017" as (
SELECT "raceId" FROM races WHERE year = 2017
ORDER BY "round" DESC
LIMIT 1
)
SELECT
"driverStandings".points,
"driverStandings"."driverId",
2017 as year
FROM "driverStandings"
WHERE "raceId" IN ( SELECT "raceId" FROM "racesIn2017" )
ORDER BY "driverStandings".points DESC
IN the above code we have taken the contents of the IN (...)
statement and moved it into a common table expression called racesIn2017
at the top. We then have to select from that common table expression within the IN (...)
clause again so in characters typed we have something longer, but we have achieved something else...
There is no WHERE
within the IN (...)
clause any more, that complexity has been moved to the common table expression. WHERE is not difficult to understand but obviously the complexity we could have abstracted away and named in the common table expressions could have been far far greater.
Pro's
- The ability to name and refer to a section of code is incredibly important because the developer can think about it as one thing, as opposed to the details that make it up.
Con's
- Common table expressions, from what I have seen, are optimized individually, not in the context of the main query. This looks to have changed in PostgreSQL 12.
Results
The results are identical to the previous version
points | driverId | year |
---|---|---|
363 | 1 | 2017 |
317 | 20 | 2017 |
305 | 822 | 2017 |
205 | 8 | 2017 |
200 | 817 | 2017 |
168 | 830 | 2017 |
100 | 815 | 2017 |
87 | 839 | 2017 |
54 | 832 | 2017 |
43 | 13 | 2017 |
43 | 807 | 2017 |
40 | 840 | 2017 |
28 | 154 | 2017 |
19 | 825 | 2017 |
17 | 4 | 2017 |
13 | 838 | 2017 |
8 | 835 | 2017 |
5 | 826 | 2017 |
5 | 836 | 2017 |
0 | 18 | 2017 |
0 | 814 | 2017 |
0 | 828 | 2017 |
0 | 841 | 2017 |
0 | 842 | 2017 |
0 | 843 | 2017 |