SQL Saves Hours, Possibly Lives, Part II

I said the last SQL piece was experimental, but that was before it cruised into my top posts. You could point to the obscene clickbait title. That's both logical and entirely fair, but hear me out. SQL's not the most popular kid in school, and that air of mystique gave her a fighting chance...

Either way, full-stack fun is back on the menu! For an introduction to SQL's time-bending nature, start with Part I.

Today, we dive deeper, covering the complex interactions that make SQL brilliant. We tackle the pros and cons, the power of JOINS, and the modularity of UNION ALL. But first, let's start with some Javascript.

Intended audience

The JS Approach

Let's start with a problem. We need to parse and track the net liquidity of the U.S. Federal Reserve. We have access to all required data but need to find the percent change between rows.

    • FED Net Liquidity = (Federal Reserve Balance) - (Treasury General Account + Reverse Repo Agreements)
    • Don't worry about what these mean, but the process is an interesting (and deep) rabbit hole if you're so inclined.

The initial data would look something like:

javascript
1 const fedBalance = [
2 // Your data goes here:
3 { date: '2023-01-01', fed: 100, tga: 50, rrp: 30, net_liquidity: 70, spx: 200 },
4 { date: '2023-01-02', fed: 110, tga: 55, rrp: 35, net_liquidity: 75, spx: 210 },
5 // More data...
6 ];
7

Remember, we need to find the percent change, or the difference between the current and proceeding rows, divided by the proceeding row, then multiplied by 100. JS offers a range of approaches, but map does the trick.

javascript
1 const calculatePercentageChange = (prev, current) => {
2 return ((current - prev) / prev) * 100;
3 };
4
5 const fedBalanceFull = fedBalance.map((row, index, arr) => {
6 if (index === 0) {
7 return {
8 ...row,
9 net_change_fed: null,
10 net_change_tga: null,
11 net_change_rrp: null,
12 net_change_net_liquidity: null,
13 };
14 }
15
16 const prevRow = arr[index - 1];
17
18 return {
19 ...row,
20 net_change_fed: calculatePercentageChange(prevRow.fed, row.fed),
21 net_change_tga: calculatePercentageChange(prevRow.tga, row.tga),
22 net_change_rrp: calculatePercentageChange(prevRow.rrp, row.rrp),
23 net_change_net_liquidity: calculatePercentageChange(prevRow.net_liquidity, row.net_liquidity),
24 };
25 });
26

This gives us what we need, but it took a hefty dose of repetition to get there. It also adds another layer of complexity to your front and/or backend implementations, which should be limited by design.

All this to say, I prefer formatting as much as possible before it hits my application.

A Glance Through the Window

True to its name, SQL is a querying language specifically designed to handle and manipulate large datasets. The below features one of SQL's heavy lifters:

Window functions perform calculations across a set of rows related to the current row. Three main types exist:

    • Aggregate, or function like SUM(), MAX(), or COUNT()
    • Ranking, or RANK() and DENSE_RANK()
    • Value-based, or functions like LAG() and LEAD()

A full rundown is outside this guide's scope but will be thoroughly covered in a follow-up post. For now, focus on LAG(). Note that window functions include an OVER clause, a PARTITION BY, and/or an ORDER BY. But the OVER clause distinguishes them from the average SQL query.

As window functions are row-based operations, we must specify how the data relates to the rows around it. OVER acts as that guide. In plain English, a SQL function lets us ask for data. A window function extends that, enabling us to ask for calculated data in a specific order.

Where SUM(amount) returns the total sum of an entire column in our dataset, SUM(amount) OVER (ORDER BY date) returns a row with the summed totals of each day. The below showcases LAG(), which finds the difference between a row and the row immediately preceding it.

sql
1 CREATE TABLE IF NOT EXISTS fed_balance_full AS (
2 SELECT date,
3 fed,
4 fed - LAG(fed, 1) over (ORDER BY date) AS net_change_fed,
5 tga,
6 tga - LAG(tga, 1) over (ORDER BY date) AS net_change_tga,
7 rrp,
8 rrp - LAG(rrp, 1) over (ORDER BY date) AS net_change_rrp,
9 net_liquidity,
10 net_liquidity - LAG(net_liquidity, 1) over (ORDER BY date) AS net_change_net_liquidity,
11 spx,
12 FROM fed_balance
13 );
14

Here we ask for the difference between the current and preceeding rows, ordered by date. We get the same result, but less wordy and more human-readable.

I tested both operations. I ran the JS locally in VSCode and ran the query via a local-hosted PostgreSQL database on the same machine. The SQL took under 50 milliseconds to process and deliver, while Javascript took around 0.253 seconds. So, the SQL query processed around 5.06 times faster than the equivalent JS.

The tests were (admittedly) isolated, but it's not all about speed and brevity. SQL handles complex queries, mutations, and interactions with outright class.

Recruit More Data

SQL's a maestro when it comes to manipulating and querying relational data. This becomes invaluable in operations that cross-reference multiple tables, allowing you to pull insights from seemingly disparate data. In a business context, it's often a unique ID that serves as the bridge between tables, linking, say, sales data to specific customers or salespersons.

For example, consider two tables, sales and salesTeam, where we link them via a salesId:

sql
1 -- NOTE: While SQL offers a `money` data type, we use `numeric`
2 -- here for compatibility with JavaScript data types in React.
3 CREATE TABLE IF NOT EXISTS sales (
4 id varchar PRIMARY KEY,
5 salesDate date,
6 salesAmount numeric
7 );
8
9 CREATE TABLE IF NOT EXISTS salesTeam (
10 salesId varchar REFERENCES sales(id),
11 salesDate date,
12 salesTotal numeric,
13 PRIMARY KEY(salesId)
14 );
15
16

Use JOIN to find the matching records between two or more tables. We covered the basics of each type in the first article, but here we'll use LEFT JOIN.

In our two tables, sales.salesId matches the values held by salesTeam.id. We include this information in our ON to join the salesTeam table on the id match. Now, let's get to joining:

sql
1 SELECT sales.id, sales.salesDate, SUM(salesTotal) as netSales
2 FROM sales
3 LEFT JOIN salesTeam ON sales.id = salesTeam.salesId
4 GROUP BY 1, 2
5 ORDER BY 2 DESC
6

With this, we can query the total sales for each salesperson in a given period.

UNIONS

Let's ramp things up. We've been cruising at ground level with flat queries, but it's time to soar. Enter SQL's UNIONS and nested queries.

SQL's UNION declaration joins multiple queries using related tables. SQL also allows for nested queries. Think of nesting like JS for loops. The comparison isn't 1:1, but the mental model is similar.

For this, let's head to Dune Analytics. I said it before, but you don't need to be into crypto to appreciate Dune. It's the ideal spot to hone your SQL, as the entire site is one massive relational database, which you can query to your heart's content.

We'll start with UNIONS, which you'll typically see as UNION ALL.

    • UNION ALL literally stacks two or more result sets on top of each other, keeping duplicate rows.
    • UNION does the same but eliminates duplicate rows, which can be less efficient if you need those duplicates for your analysis.

It has myriad uses, but the below showcases my favorite.

Now, hold on tight; we're diving deep. The query below utilizes each technique we've learned so far. At its core, it calculates the Total Value Locked (TVL) in a lending protocol by summing up deposits and subtracting withdrawals. Take a moment to absorb the data flow from the inner queries to the top-level aggregate.

It has numerous things going on:

    • We assign a name to each query and subquery using WITH.
    • We nest a SELECT query in our FROM clause.
    • We merge the two queries with UNION.
sql
1 -- Nesting in the `FROM` clause.
2 WITH borrows AS (
3 SELECT dt,
4 project,
5 token,
6 asset_symbol,
7 sum(delta) as "TVL"
8 FROM (
9 SELECT DATE_TRUNC('day', evt_block_time) as dt,
10 project,
11 asset_address as token,
12 asset_symbol,
13 sum(usd_value) as delta
14 FROM lending.borrow
15 GROUP BY 1, 2, 3, 4
16
17 UNION ALL
18
19 SELECT DATE_TRUNC('day', evt_block_time) as dt,
20 project,
21 asset_address as token,
22 asset_symbol,
23 -sum(usd_value) as delta -- negative for $ out
24 FROM lending.repay
25 GROUP BY 1, 2, 3, 4
26 -- Nested `FROM` clauses need to be aliased.
27 -- You can use `AS` or type away.
28 ) sub
29
30 GROUP BY 1, 2, 3, 4
31 ORDER BY 1 DESC
32 );
33

Now, we hit the true time savers. The above could likely be done in Javascript, but good luck. That's the beauty of SQL. It offers unparalleled efficiency when handling or performing complex computations on large datasets.

We have a few moving pieces to work with here.

The top-level SELECT aggregates the below queries to find the TVL (Total Value Locked), or the sum of all deposits minus any withdrawals. Our nested FROM names four columns in each query but focus on the last item.

    • sum(usd_value) as delta
    • -sum(usd_value) as delta -- negative for $ out

The result? The Total Value Locked, or the sum of all deposits minus withdrawals. It's a powerful metric in just a few lines of SQL, outperforming similar JS-based solutions in speed and simplicity.

Wrapping Up

If you're still with me, well done! I know it was a ton to absorb, but SQL gets simpler with time. SQL doesn't have many declarations, but the depth of each can be staggering.

Fortunately, the majority of web projects require fewer intricate queries. But if complex data's your swing, nothing beats SQL.

Now, to quickly recap:

    • We compared SQL to a Javascript approach
    • Introduced window functions, which allow for absurdly simple data sorting and manipulation
    • Covered complex querying through the use of JOIN
    • And learned the power of nested queries or SQLs modular for loop

So, what's next? We'll keep peeling back the layers of SQL, diving into stored procedures, triggers, and more advanced window functions. Feel free to message me on Twitter if you have specific questions or scenarios you'd like covered. I'm more than happy to explore them in future posts!

In this ever-changing world of web development, keeping your skills sharp is crucial. If you're looking to gain an edge, particularly with data-heavy applications, the depth and power of SQL should not be underestimated.

Like CSS, love it or hate it, it's always there to help!

Cheers!