This series of articles

This is the first of the Code In Postgres series of articles.

When developing systems we often have a choice of writing code (NodeJS, C#, Python or PHP etc) or SQL. I believe that sometimes the decision to write code is taken without fully evaluating how much of the task could be offloaded to SQL.

In this series of articles I wish to show the huge benefits of using and learning SQL by examining progressively more difficult scenarios with increasing amounts of SQL knowledge. In doing this I hope to illustrate that sometimes large amounts of extra code is written for what SQL can achieve quicker, with less complexity and more readability.

To be more specific, we should try to follow the rule of least power more often.

All JavaScript examples will assume a reasonably high level of competence.

About the Ergast data set

For this series of articles we will be using the Ergast data set, which is a provided under the Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.

The aim

Lets say you want to find out the final points for drivers in the 2017 Formula 1 World Championship. The schema for the tables we will be using is as follows:

ERD Diagram

An example of the data you would find in these tables is shown below:

Assumed level of SQL Knowledge

In this JavaScript example we will assume the writer has sufficient SQL knowledge to use a WHERE statement along with the ability to only return certain fields using SELECT. After this we will see how this can be accomplished in one single SQL statement using IN, ORDER BY and LIMIT.

races

raceId year round circuitId name date time url
969 2017 1 1 Australian Grand Prix 2017-03-26 05:00:00 https://en.wikipedia.org/wiki/2017_Australian_Grand_Prix
970 2017 2 17 Chinese Grand Prix 2017-04-09 06:00:00 https://en.wikipedia.org/wiki/2017_Chinese_Grand_Prix
971 2017 3 3 Bahrain Grand Prix 2017-04-16 15:00:00 https://en.wikipedia.org/wiki/2017_Bahrain_Grand_Prix
972 2017 4 71 Russian Grand Prix 2017-04-30 12:00:00 https://en.wikipedia.org/wiki/2017_Russian_Grand_Prix
973 2017 5 4 Spanish Grand Prix 2017-05-14 12:00:00 https://en.wikipedia.org/wiki/2017_Spanish_Grand_Prix
974 2017 6 6 Monaco Grand Prix 2017-05-28 12:00:00 https://en.wikipedia.org/wiki/2017_Monaco_Grand_Prix
975 2017 7 7 Canadian Grand Prix 2017-06-11 18:00:00 https://en.wikipedia.org/wiki/2017_Canadian_Grand_Prix
976 2017 8 73 Azerbaijan Grand Prix 2017-06-25 13:00:00 https://en.wikipedia.org/wiki/2017_Azerbaijan_Grand_Prix
977 2017 9 70 Austrian Grand Prix 2017-07-09 12:00:00 https://en.wikipedia.org/wiki/2017_Austrian_Grand_Prix

driversStandings

driverStandingsId raceId driverId points position positionText wins
64782 855 3 63 7 7 0
64795 856 1 196 5 5 2
64797 856 4 212 3 3 1
64805 856 2 34 10 10 0
64810 856 3 67 7 7 0

The driverStandings table has the points for every driver in every race. The problem here is that there is no record in the driverStandings table for which season a raceId belongs to. So we need to get a bit creative… Here is one possible solution:

  1. Looking at the races table’s year column, we can find all the raceId in 2017.
  2. If we can get all races in a given year we should be able to get the last race because the round will be the highest within that year.
  3. Find the points and driverId for the drivers who were in that raceId by reading the driverStandings table.
  4. Sort them by points descending.
  5. The very first row contains the driverId which has the most points in that season. This driverId is the world champion. The ones later on denote their final position (assuming the points differ).

Implementing the JavaScript

Libraries

sql-spitting-image/limit.js

/**
 * Gets the first n rows from a set of rows.
 *
 * @param n number
 * @return (rows: Rows[]) => Rows[]
 */
function limit(n) {
    return function(rows) {
        return rows.slice(0, n);
    }
}


module.exports = limit;

sql-spitting-image/select.js

function select(spec) {

    function mapper(row) {
        let r = {};
        spec.forEach(([theFrom, theTo]) => {
            r[theTo] = row[theFrom]
        });
        return r;
    }

    return function selectImpl(rows) {
        return rows.map(mapper);
    };
}


module.exports = select;

sql-spitting-image/orderBy.js

/**
 * Creates a `sortFn` for the JavaScript [Array.prototype.sort](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort) function.
 *
 * @param columnName string The name of the column to sort by.
 * @param direction string If this is 'ASC' sort ascending, otherwise descending.
 * @return (a: number, b: number) => number The `sortFn`.
 */
function getSingleCompareFunction(columnName, direction) {

    const flipper = direction.toLowerCase() == 'asc' ? 1 : -1;

    return function singleCompareFunction(rowA, rowB) {
        return (rowA[columnName] - rowB[columnName]) * flipper;
    }
}


/**
 * Orders a set of rows
 *
 * @param columnName string
 * @param direction string ( 'ASC' || 'DESC' )
 * @param rows Row[]
 * @return Row[]
 */
function orderBy(columnName, direction='ASC') {

    const compareFunction = getSingleCompareFunction(columnName, direction);

    return function(rows) {
        return rows.sort(compareFunction);
    };
}


orderBy.getSingleCompareFunction = getSingleCompareFunction;
module.exports = orderBy;

sql-spitting-image/orderByMulti.js

const { getSingleCompareFunction } = require('./orderBy');

/**
 * Gets a `sortFn` for [Array.prototype.sort](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort)
 * that will sort whole rows based on a list of `columnName` and `direction`
 * tuples.
 *
 * @param colDirectionTuples `[columnName: string, direction: string][]`
 * Ordering specification where `columnName` and `direction` are parameters from
 * `getSingleCompareFunction`.
 * @return (a: Row, b: Row) => number The `sortFn`.
 */
function orderByMulti(colDirectionTuples) {

    function compareFunction(rowA, rowB) {
        return colDirectionTuples.reduce((acc, [col, dir]) => {
            if (acc != 0) { return acc; }
            const cf = getSingleCompareFunction(col, dir);
            return cf(rowA, rowB);
        }, 0);
    }

    return function(rows) {
        return rows.sort(compareFunction);
    }

}

module.exports = orderByMulti;

Main Code

const { takeOne, runQuery, output } = require('./_utils');
const limit = require('./sql-spitting-image/limit');
const select = require('./sql-spitting-image/select');
const orderBy = require('./sql-spitting-image/orderBy');
const orderByMulti = require('./sql-spitting-image/orderByMulti');


/**
 * interface RaceResult { round: number; raceId: number; }
 * interface MainResult { points: number; driverId: number; year: number; }
 */


/**
 * Get data from the `results` table.
 *
 * @param year number
 * @return Promise<RaceResult[]>
 */
function qryRaces(year) {
    return runQuery('select "round", "raceId" from races where year = $1', [year]);
}


/**
 * Gets all driver standings at a given set of raceIds
 *
 * @param raceId number
 * @return Promise<MainResult[]>
 */
function qryStandings(raceId) {

    const sql = `
        select
        "driverStandings".points,
        "driverStandings"."driverId",
        2017 as year
        from "driverStandings"
        where "raceId" = $1
        `;
    return runQuery(sql, [raceId]);

}


qryRaces(2017)
    .then(orderBy('round', 'desc'))
    .then(limit(1))
    .then((rounds) => rounds.map(r => r.raceId))
    .then(takeOne)
    .then(qryStandings)
    .then(orderByMulti([['points', 'desc'], ['driverId', 'asc']]))
    .then(select([
        ["points", "points"],
        ["driverId", "driverId"],
        ["year" , "year"]
    ]))
    .then(output)
    .catch(err => {
        console.log("ERROR:", err);
        process.exit(1);
    });

This code, despite there being a lot of it is relatively straight forward. We get a list of raceId and round from the qryRaces function. Once we have this we will order by the round from largest to smallest and take the first one. This is the last race of the season.

After this we feed that raceId directly into the qryStandings functions to get the results from the last race. Finally we are forced to use a more complicated sorting function for stability, because some drivers have the same amount of points before presenting our desired columns.

Pro’s

  • There’s some nice re-usable functions here.
  • The main code is quite concise and easy to understand.

Con’s

  • Longer than SQL
  • We downloaded more data than necessary, in this case it is not too bad but it could have been much worse.

SQL

SELECT
    "driverStandings".points,
    "driverStandings"."driverId",
    2017 as year
FROM "driverStandings"
WHERE "raceId" IN (
    SELECT "raceId" FROM races
    WHERE year = 2017
    ORDER BY "round" DESC
    LIMIT 1
)
ORDER BY
    "driverStandings".points DESC,
    "driverStandings"."driverId" ASC

Like the JavaScript we are using ordering (ORDER BY) and limiting (LIMIT) to get the highest raceId within 2017.

The IN clause can be used to match a column against a set of numbers. For example we may choose to write WHERE "raceId" IN (4, 5, 7) which would be the same thing as writing WHERE "raceId" = 4 OR "raceId" = 5 OR "raceId" = 7.

The smart thing here is that we are using a query to get the last raceId within the IN clause instead of a directly specified list of raceId.

Finally an ORDER BY statement is used to perform sorting of the final record set, you can sort by multiple fields or use many types of expressions.

Pro’s

  • Shorter than the JavaScript.
  • If this were called by JavaScript we would need only one Promise, which is much easier to write and reason about.
  • The inside of the IN clause can be ran and understood individually.

Con’s

  • Is the ORDER BY / LIMIT 1 a trick?
  • It seems in code you can give the contents of IN clause a name (raceIds) but this is not possible using SQL’s IN, or is it?.

Results

points driverId year
363 1 2017
317 20 2017
305 822 2017
205 8 2017
200 817 2017
168 830 2017
100 815 2017
87 839 2017
54 832 2017
43 13 2017
43 807 2017
40 840 2017
28 154 2017
19 825 2017
17 4 2017
13 838 2017
8 835 2017
5 826 2017
5 836 2017
0 18 2017
0 814 2017
0 828 2017
0 841 2017
0 842 2017
0 843 2017

(25 rows)