This series of articles

This is the first of the Code In Postgres series of articles.

When developing systems we often have a choice of writing code (NodeJS, C#, Python or PHP etc) or SQL. I believe that sometimes the decision to write code is taken without fully evaluating how much of the task could be offloaded to SQL.

In this series of articles I wish to show the huge benefits of using and learning SQL by examining progressively more difficult scenarios with increasing amounts of SQL knowledge. In doing this I hope to illustrate that sometimes large amounts of extra code is written for what SQL can achieve quicker, with less complexity and more readability.

To be more specific, we should try to follow the rule of least power more often.

All JavaScript examples will assume a reasonably high level of competence.

About the Ergast data set

For this series of articles we will be using the Ergast data set, which is a provided under the Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.

The aim

Lets say you want to find out who the Formula 1 World Champion was in 2017. The schema for the tables we will be using is as follows:

ERD Diagram

An example of the data you would find in these tables is shown below:

drivers

driverId driverRef number code forename surname dob nationality url
1 hamilton 44 HAM Lewis Hamilton 1985-01-07 British http://en.wikipedia.org/wiki/Lewis_Hamilton
2 heidfeld   HEI Nick Heidfeld 1977-05-10 German http://en.wikipedia.org/wiki/Nick_Heidfeld
3 rosberg 6 ROS Nico Rosberg 1985-06-27 German http://en.wikipedia.org/wiki/Nico_Rosberg
4 alonso 14 ALO Fernando Alonso 1981-07-29 Spanish http://en.wikipedia.org/wiki/Fernando_Alonso

driversStandings

driverStandingsId raceId driverId points position positionText wins
64782 855 3 63 7 7 0
64795 856 1 196 5 5 2
64797 856 4 212 3 3 1
64805 856 2 34 10 10 0
64810 856 3 67 7 7 0

The driverStandings table has the points for every driver in every race and world champions would be the driver with the most points in a single season. The problem here is that there is no record in the driverStandings table for which season a raceId belongs to. So we need to get a bit creative… Here is one possible solution:

  1. Looking at the races table’s year column, we can find all the raceId in 2017.
  2. Find the points and driverId for all of those raceId by reading the driverStandings table.
  3. Sort them by points descending.
  4. The very first row contains the driverId which has the most points in that season. This driverId is the world champion.

Implementing the JavaScript

Assumed level of SQL Knowledge for the JavaScript example

In this JavaScript example we will assume the writer has sufficient SQL knowledge to use a WHERE statement along with the ability to only return certain fields using SELECT. After this we will see how this can be accomplished in one single SQL statement using IN, ORDER BY and LIMIT.

Libraries

sql-spitting-image/limit.js

/**
 * Gets the first n rows from a set of rows.
 *
 * @param n number
 * @return (rows: Rows[]) => Rows[]
 */
function limit(n) {
    return function(rows) {
        return rows.slice(0, n);
    }
}


module.exports = limit;

sql-spitting-image/orderBy.js

/**
 * Creates a `sortFn` for the JavaScript [Array.prototype.sort](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort) function.
 *
 * @param columnName string The name of the column to sort by.
 * @param direction string If this is 'ASC' sort ascending, otherwise descending.
 * @return (a: number, b: number) => number The `sortFn`.
 */
function getSingleCompareFunction(columnName, direction) {

    const flipper = direction.toLowerCase() == 'asc' ? 1 : -1;

    return function singleCompareFunction(rowA, rowB) {
        return (rowA[columnName] - rowB[columnName]) * flipper;
    }
}


/**
 * Orders a set of rows
 *
 * @param columnName string
 * @param direction string ( 'ASC' || 'DESC' )
 * @param rows Row[]
 * @return Row[]
 */
function orderBy(columnName, direction='ASC') {

    const compareFunction = getSingleCompareFunction(columnName, direction);

    return function(rows) {
        return rows.sort(compareFunction);
    };
}


orderBy.getSingleCompareFunction = getSingleCompareFunction;
module.exports = orderBy;

Main Code

const { flatten, runQuery, output } = require('./_utils');
const limit = require('./sql-spitting-image/limit');
const orderBy = require('./sql-spitting-image/orderBy');


/**
 * interface RaceResult { round: number; }
 * interface MainResult { points: number; driverId: number; year: number; }
 */


/**
 * Get data from the `results` table.
 *
 * @param year number
 * @return Promise<RaceResult[]>
 */
function qryRaces(year) {
    return runQuery('select "raceId" from races where year = $1', [year]);
}


/**
 * Gets all driver standings at a given set of raceIds
 *
 * @param raceIds number[]
 * @return Promise<MainResult[]>
 */
function qryStandings(raceIds) {

    const promises = raceIds.map((raceId) => {
        const sql = `
            select
            "driverStandings".points,
            "driverStandings"."driverId",
            2017 as year
            from "driverStandings"
            where "raceId" = $1
            `;
        return runQuery(sql, [raceId]);
    });

    return Promise.all(promises).then(flatten);
}


qryRaces(2017)
    .then(racesResults => {
        return qryStandings(racesResults.map(({raceId}) => raceId));
    })
    .then(orderBy('points', 'desc'))
    .then(limit(1))
    .then(output)
    .catch(err => { console.log("ERROR:", err) });

This code, despite there being a lot of it is relatively straight forward. We get a list of raceId from the qryRaces function and pass this to the qryStandings function. The qryStandings function fans out a series of Promise and collates them back again returning the standings of all drivers after all races in 2017, which is quite a lot of unnecessary data. Once we have all this data we then use a series of (included) library functions to sort the data by points and take the first result.

Pro’s

  • There’s some nice re-usable functions here.
  • The main code is quite concise and easy to understand.

Con’s

  • Longer than SQL
  • We downloaded a reasonably large amount of data from the database, which could be slow.
  • There are some non trivial constructs here such as flattening Arrays and using Promise.all to wait for a series of Promises.

SQL

SELECT
    "driverStandings".points,
    "driverStandings"."driverId",
    2017 as year
FROM "driverStandings"
WHERE "raceId" IN (SELECT "raceId" FROM races WHERE year = 2017)
ORDER BY "driverStandings".points DESC
LIMIT 1

The IN clause can be used to match a column against a set of numbers. For example we may choose to write WHERE "raceId" IN (4, 5, 7) which would be the same thing as writing WHERE "raceId" = 4 OR "raceId" = 5 OR "raceId" = 7.

The smart thing here is that we are using a query to return the numbers for the IN clause instead of a directly specified list.

ORDER BY statements are used to perform sorting of the record set, you can sort by multiple fields or use many types of expressions.

Lastly LIMIT controls how many items to return. It can also be combined with OFFSET to skip rows.

Pro’s

  • Shorter than the JavaScript.
  • If this were called by JavaScript we would need only one Promise, which is much easier to write and reason about.
  • The inside of the IN clause can be ran and understood individually.

Con’s

  • Is the order by / limit 1 a trick?
  • It seems in code you can give the contents of IN clause a name (raceIds) but this is not possible using SQL’s IN, or is it?.

Results

points driverId year
363 1 2017

(1 row)