This series of articles
This is the first of the Code In Postgres series of articles.
When developing systems we often have a choice of writing code (NodeJS, C#, Python or PHP etc) or SQL. I believe that sometimes the decision to write code is taken without fully evaluating how much of the task could be offloaded to SQL.
In this series of articles I wish to show the huge benefits of using and learning SQL by examining progressively more difficult scenarios with increasing amounts of SQL knowledge. In doing this I hope to illustrate that sometimes large amounts of extra code is written for what SQL can achieve quicker, with less complexity and more readability.
To be more specific, we should try to follow the rule of least power more often.
About the Ergast data set
For this series of articles we will be using the Ergast data set, which is a provided under the Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.
Lets say you want to find out the final points for drivers in the 2017 Formula 1 World Championship. The schema for the tables we will be using is as follows:
An example of the data you would find in these tables is shown below:
Assumed level of SQL Knowledge
WHERE statement along with the ability to only return certain fields using
SELECT. After this we will see how this can be accomplished in one single SQL statement using
ORDER BY and
|969||2017||1||1||Australian Grand Prix||2017-03-26||05:00:00||https://en.wikipedia.org/wiki/2017_Australian_Grand_Prix|
|970||2017||2||17||Chinese Grand Prix||2017-04-09||06:00:00||https://en.wikipedia.org/wiki/2017_Chinese_Grand_Prix|
|971||2017||3||3||Bahrain Grand Prix||2017-04-16||15:00:00||https://en.wikipedia.org/wiki/2017_Bahrain_Grand_Prix|
|972||2017||4||71||Russian Grand Prix||2017-04-30||12:00:00||https://en.wikipedia.org/wiki/2017_Russian_Grand_Prix|
|973||2017||5||4||Spanish Grand Prix||2017-05-14||12:00:00||https://en.wikipedia.org/wiki/2017_Spanish_Grand_Prix|
|974||2017||6||6||Monaco Grand Prix||2017-05-28||12:00:00||https://en.wikipedia.org/wiki/2017_Monaco_Grand_Prix|
|975||2017||7||7||Canadian Grand Prix||2017-06-11||18:00:00||https://en.wikipedia.org/wiki/2017_Canadian_Grand_Prix|
|976||2017||8||73||Azerbaijan Grand Prix||2017-06-25||13:00:00||https://en.wikipedia.org/wiki/2017_Azerbaijan_Grand_Prix|
|977||2017||9||70||Austrian Grand Prix||2017-07-09||12:00:00||https://en.wikipedia.org/wiki/2017_Austrian_Grand_Prix|
driverStandings table has the points for every driver in every race. The problem here is that there is no record in the
driverStandings table for which season a
raceId belongs to. So we need to get a bit creative… Here is one possible solution:
- Looking at the
yearcolumn, we can find all the
- If we can get all
racesin a given
yearwe should be able to get the last race because the
roundwill be the highest within that year.
- Find the
driverIdfor the drivers who were in that
raceIdby reading the
- Sort them by
- The very first row contains the
driverIdwhich has the most points in that season. This
driverIdis the world champion. The ones later on denote their final position (assuming the points differ).
This code, despite there being a lot of it is relatively straight forward. We get a list of
round from the
qryRaces function. Once we have this we will order by the
round from largest to smallest and take the first one. This is the last race of the season.
After this we feed that
raceId directly into the
qryStandings functions to get the results from the last race. Finally we are forced to use a more complicated sorting function for stability, because some drivers have the same amount of points before presenting our desired columns.
- There’s some nice re-usable functions here.
- The main code is quite concise and easy to understand.
- Longer than SQL
- We downloaded more data than necessary, in this case it is not too bad but it could have been much worse.
ORDER BY) and limiting (
LIMIT) to get the highest
raceId within 2017.
IN clause can be used to match a column against a set of numbers. For example we may choose to write
WHERE "raceId" IN (4, 5, 7) which would be the same thing as writing
WHERE "raceId" = 4 OR "raceId" = 5 OR "raceId" = 7.
The smart thing here is that we are using a query to get the last
raceId within the
IN clause instead of a directly specified list of
ORDER BY statement is used to perform sorting of the final record set, you can sort by multiple fields or use many types of expressions.
- The inside of the
INclause can be ran and understood individually.
- Is the
LIMIT 1a trick?
- It seems in code you can give the contents of IN clause a name (
raceIds) but this is not possible using SQL’s
IN, or is it?.