When developing systems we often have a choice of writing code (NodeJS, C#, Python or PHP etc) or SQL. I believe that sometimes the decision to write code is taken without fully evaluating how much of the task could be offloaded to SQL.
In this series of articles I wish to show the huge benefits of using and learning SQL by examining progressively more difficult scenarios with increasing amounts of SQL knowledge. In doing this I hope to illustrate that sometimes large amounts of extra code is written for what SQL can achieve quicker, with less complexity and more readability.
If we think about what code we had in the previous article there are two peices of functionality we’re missing. These are:
The ability to find a row in the drivers table that matches a row in our current result set.
The ability to mix/join this row from drivers with our current results.
I’m going to aim to write code that is highly reusable and also still performs well on very large data sets.
Finding drivers efficiently
The obvious answer to finding a driver from a list of drivers would be to use Array.find()… something lie the following?
This is certainly a reusable piece of code and was easy to write and hopefully for you to understand.
I can see two problems here though.
The first problem is performance. When we need to look up a driverId we need to scan all the rows in drivers up until the point we find the correct one. We will be doing this for all of the (hypothetically millions of) driverId we want to look up. So I’m pretty sure the performance characteristics of this is not great.
The other short coming I can see is that it will only ever retreive one row. This is often what we want to acheive, but not always. An example of when this is not enough is when you have one customerId and you want to find / match / join it to all orders in another table.
The following would perform much better and allow returning multiple rows:
The indexBySimple function can scan through the whole set of drivers and fill up a Map with the key being driverId and the values are the actual rows with have that driverId. Once we have this Map looking up drivers by driverId will become very cheap.
Mixing a drivers record with our current results
Combining an Object of one type (driverRow) with another (currentResults) is really easy in ES6 because you can simply destruct the objects to create new one like the following
Building the libraries
Because of all our planning the innerJoinSimple library has become really quite simple.
Reading through it you can see that the first thing it does is build an index for the right set of data.
After this it will read through all of the left set of data, checking if it can be joined to the right, if it can it will be.
The last thing to do is glue all the code together. See below:
Again we have a rather large amount of code, however I again think it is quite readable.
Even if you disagree and don’t like this code I hope you will agree that this amount of code could easily be wrote quite badly.
Broken down quite well into bite size peices.
A lot of this code is quite reusable, if you wish.
There’s a lot of it.
We are again requesting more data than is required.
The INNER JOIN relatively effortlessly mixes in data about drivers into what we had before.
There’s not much here that’s re-usable, other than the knowledge you’ve acquired.