SQL Window Functions

databases
sql training
data manipulation
Author

Bongani Ncube

Published

4 February 2024

Window functions

  • A window function performs an aggregate-like operation on a set of query rows.

However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row:

Anatomy of a window function

  • FUNCTION_NAME() OVER()
  • ORDER BY
  • PARTITION BY
  • ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED

Creating a fake dataset in R

library(tidyverse)

sales<-tribble(~ year , ~country , ~product    , ~profit ,~Own ,~Time,
                 2000 , "Finland", "Computer"  ,   1500  ,"Y"  ,"D",
                 2000 , "Finland" ,"Phone"      ,   100   ,"Y"  ,"D",
                 2001 , "Finland" ,"Phone"      ,     10  ,"N"  ,"D",
                 2000 , "India"   ,"Calculator" ,     75  ,"Y"  ,"E",
                 2000 , "India"   ,"Calculator" ,     75  ,"N"  ,"E",
                 2000 , "India"   ,"Computer"   ,   1200  ,"N"  ,"E",
                 2000 , "USA"     , "Calculator",     75  ,"Y"  ,"E",
                 2000 , "USA"     , "Computer"  ,   1500  ,"N"  ,"E",
                 2001 , "USA"     ,"Calculator" ,     50  ,"Y"  ,"E",
                 2001 , "USA"     ,"Computer"   ,   1500  ,"Y"  ,"E",
                 2001 , "USA"     ,"Computer"   ,   1200  ,"Y"  ,"D",
                 2001 , "USA"     , "TV"        ,    150  ,"Y"  ,"D",
                 2001 , "USA"     , "TV"        ,    100  ,"Y"  ,"D")

Load in the necessary packages

Set up a database using the datafile

wind<-dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(wind,sales)

Lets get started

  • The row for which function evaluation occurs is called the current row.
  • The query rows related to the current row over which function evaluation occurs comprise the window for the current row.

ROW_NUMBER() OVER()

  • use it if maybe you have duplicates or intend to create rowids for your data
SELECT ROW_NUMBER() OVER () AS row_id , year, country, product, profit
FROM sales;
Displaying records 1 - 10
row_id year country product profit
1 2000 Finland Computer 1500
2 2000 Finland Phone 100
3 2001 Finland Phone 10
4 2000 India Calculator 75
5 2000 India Calculator 75
6 2000 India Computer 1200
7 2000 USA Calculator 75
8 2000 USA Computer 1500
9 2001 USA Calculator 50
10 2001 USA Computer 1500

PARTITION BY

  • This splits the table into partitions based on a column’s unique values and results aren’t rolled into one column
Note

therefore ROW_NUMBER() With PARTITION BY produces the row number of each row within its partition. In this case, rows are numbered per country. By default, partition rows are unordered and row numbering is nondeterministic.

SELECT ROW_NUMBER() OVER(PARTITION BY country) AS row_num1,
       year, country, product, profit
FROM sales;
Displaying records 1 - 10
row_num1 year country product profit
1 2000 Finland Computer 1500
2 2000 Finland Phone 100
3 2001 Finland Phone 10
1 2000 India Calculator 75
2 2000 India Calculator 75
3 2000 India Computer 1200
1 2000 USA Calculator 75
2 2000 USA Computer 1500
3 2001 USA Calculator 50
4 2001 USA Computer 1500

its also possible to use ROW_NUMBER() With ORDER BY to determine ranks

SELECT ROW_NUMBER() OVER(ORDER BY profit DESC) AS row_num1,
       year, country, product, profit
FROM sales;
Displaying records 1 - 10
row_num1 year country product profit
1 2000 Finland Computer 1500
2 2000 USA Computer 1500
3 2001 USA Computer 1500
4 2000 India Computer 1200
5 2001 USA Computer 1200
6 2001 USA TV 150
7 2000 Finland Phone 100
8 2001 USA TV 100
9 2000 India Calculator 75
10 2000 India Calculator 75
Tip

in the above example , we see that we have ranked our data according to profits implying that each rowid also signifies the rank of profit

RANK()

  • ROW_NUMBER assigns different numbers to profits with the same amount, so it’s not a useful ranking function; if profits are the same , they should have the same rank.

in such a case we use RANK()

SELECT RANK() OVER(ORDER BY profit DESC) AS RANK, 
     profit ,year, country, product
FROM sales;
Displaying records 1 - 10
RANK profit year country product
1 1500 2000 Finland Computer
1 1500 2000 USA Computer
1 1500 2001 USA Computer
4 1200 2000 India Computer
4 1200 2001 USA Computer
6 150 2001 USA TV
7 100 2000 Finland Phone
7 100 2001 USA TV
9 75 2000 India Calculator
9 75 2000 India Calculator

DENSE_RANK()

Tip

we can use DENSE_RANK() if we need ranks to be ordered or require a further partition

SELECT DENSE_RANK() OVER(PARTITION BY country ORDER BY profit DESC) AS RANK, 
     profit ,year, country, product
FROM sales;
Displaying records 1 - 10
RANK profit year country product
1 1500 2000 Finland Computer
2 100 2000 Finland Phone
3 10 2001 Finland Phone
1 1200 2000 India Computer
2 75 2000 India Calculator
2 75 2000 India Calculator
1 1500 2000 USA Computer
1 1500 2001 USA Computer
2 1200 2001 USA Computer
3 150 2001 USA TV
Note

now we have ranked our profits by country starting with the highest profit in each country

Aggregations

  • window functions are similar to GROUP BY aggregate function but all rows stay in the output

using the sales information table, these two queries perform aggregate operations that produce a single global sum for all rows taken as a group, and sums grouped per country:

SELECT SUM(profit) AS total_profit
FROM sales;
1 records
total_profit
7535

lets us see a GROUP BY in action

SELECT country, SUM(profit) AS country_profit
       FROM sales
       GROUP BY country
       ORDER BY country;
3 records
country country_profit
Finland 1610
India 1350
USA 4575
Tip

On the other hand , window operations do not collapse groups of query rows to a single output row. Instead, they produce a result for each row. Like the preceding queries, the following query uses SUM(), but this time as a window function:

SELECT
year, country, product, profit,
         SUM(profit) OVER() AS total_profit,
         SUM(profit) OVER(PARTITION BY country) AS country_profit
FROM sales
ORDER BY country, year, product, profit;         
Displaying records 1 - 10
year country product profit total_profit country_profit
2000 Finland Computer 1500 7535 1610
2000 Finland Phone 100 7535 1610
2001 Finland Phone 10 7535 1610
2000 India Calculator 75 7535 1350
2000 India Calculator 75 7535 1350
2000 India Computer 1200 7535 1350
2000 USA Calculator 75 7535 4575
2000 USA Computer 1500 7535 4575
2001 USA Calculator 50 7535 4575
2001 USA Computer 1200 7535 4575
Note

we have included OVER clause that specifies how to partition query rows into groups for processing by the window function:

  • The first OVER clause is empty, which treats the entire set of query rows as a single partition. The window function thus produces a global sum, but does so for each row.
  • The second OVER clause partitions rows by country, producing a sum per partition (per country). The function produces this sum for each partition row.

Window functions are permitted only in the select list and ORDER BY clause. Query result rows are determined from the FROM clause, after WHERE, GROUP BY, and HAVING processing, and windowing execution occurs before ORDER BY, LIMIT, and SELECT DISTINCT.

WITH CLAUSE

This is used to create a temporary relation such that the output of this temporary relation is available and is used by the query that is associated with the WITH clause. for instance the following table can be created as a temporary table

FRAMES

Note

Frames allow you to “peek” forwards or backward without first using the relative fetching functions, LAG and LEAD, to fetch previous rows’ values into the current row.

ROWS BETWEEN

  • n PRECEDING : n rows before current row
  • CURRENT ROW : the current row
  • n FOLLOWING : n rows after the current row

the following query calculates the number of sales per year and per country(i.e grouped by year and country)

SELECT
    year, country, COUNT(*) AS num_sales
  FROM sales
  GROUP BY year, country;
5 records
year country num_sales
2000 Finland 2
2000 India 3
2000 USA 2
2001 Finland 1
2001 USA 5

like soo…

WITH and a FRAME

  • We create a temporary table called country_sales
WITH Country_sales AS (
  SELECT
    year, country, COUNT(*) AS num_sales
  FROM sales
  GROUP BY year, country)

SELECT
  year, country, num_sales,
  -- Calculate each country's 3s-ales moving total
  SUM(num_sales) OVER
    (PARTITION BY country
     ORDER BY year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS sales_MA
FROM Country_sales
ORDER BY country ASC, year ASC;
5 records
year country num_sales sales_MA
2000 Finland 2 2
2001 Finland 1 3
2000 India 3 3
2000 USA 2 2
2001 USA 5 7