SQL Window Functions

databases

sql training

data manipulation

Author

Bongani Ncube

Published

4 February 2024

Window functions

A window function performs an aggregate-like operation on a set of query rows.

However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row:

Anatomy of a window function

FUNCTION_NAME() OVER()
ORDER BY
PARTITION BY
ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED

Creating a fake dataset in R

library(tidyverse)

sales<-tribble(~ year , ~country , ~product    , ~profit ,~Own ,~Time,
                 2000 , "Finland", "Computer"  ,   1500  ,"Y"  ,"D",
                 2000 , "Finland" ,"Phone"      ,   100   ,"Y"  ,"D",
                 2001 , "Finland" ,"Phone"      ,     10  ,"N"  ,"D",
                 2000 , "India"   ,"Calculator" ,     75  ,"Y"  ,"E",
                 2000 , "India"   ,"Calculator" ,     75  ,"N"  ,"E",
                 2000 , "India"   ,"Computer"   ,   1200  ,"N"  ,"E",
                 2000 , "USA"     , "Calculator",     75  ,"Y"  ,"E",
                 2000 , "USA"     , "Computer"  ,   1500  ,"N"  ,"E",
                 2001 , "USA"     ,"Calculator" ,     50  ,"Y"  ,"E",
                 2001 , "USA"     ,"Computer"   ,   1500  ,"Y"  ,"E",
                 2001 , "USA"     ,"Computer"   ,   1200  ,"Y"  ,"D",
                 2001 , "USA"     , "TV"        ,    150  ,"Y"  ,"D",
                 2001 , "USA"     , "TV"        ,    100  ,"Y"  ,"D")

Load in the necessary packages

library(odbc)
library(DBI)
library(RSQLite)

Set up a database using the datafile

wind<-dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(wind,sales)

Lets get started

The row for which function evaluation occurs is called the current row.
The query rows related to the current row over which function evaluation occurs comprise the window for the current row.

`ROW_NUMBER() OVER()`

use it if maybe you have duplicates or intend to create rowids for your data

SELECT ROW_NUMBER() OVER () AS row_id , year, country, product, profit
FROM sales;

Displaying records 1 - 10
row_id	year	country	product	profit
1	2000	Finland	Computer	1500
2	2000	Finland	Phone	100
3	2001	Finland	Phone	10
4	2000	India	Calculator	75
5	2000	India	Calculator	75
6	2000	India	Computer	1200
7	2000	USA	Calculator	75
8	2000	USA	Computer	1500
9	2001	USA	Calculator	50
10	2001	USA	Computer	1500

PARTITION BY

This splits the table into partitions based on a column’s unique values and results aren’t rolled into one column

Note

therefore ROW_NUMBER() With PARTITION BY produces the row number of each row within its partition. In this case, rows are numbered per country. By default, partition rows are unordered and row numbering is nondeterministic.

SELECT ROW_NUMBER() OVER(PARTITION BY country) AS row_num1,
       year, country, product, profit
FROM sales;

Displaying records 1 - 10
row_num1	year	country	product	profit
1	2000	Finland	Computer	1500
2	2000	Finland	Phone	100
3	2001	Finland	Phone	10
1	2000	India	Calculator	75
2	2000	India	Calculator	75
3	2000	India	Computer	1200
1	2000	USA	Calculator	75
2	2000	USA	Computer	1500
3	2001	USA	Calculator	50
4	2001	USA	Computer	1500

its also possible to use ROW_NUMBER() With ORDER BY to determine ranks

SELECT ROW_NUMBER() OVER(ORDER BY profit DESC) AS row_num1,
       year, country, product, profit
FROM sales;

Displaying records 1 - 10
row_num1	year	country	product	profit
1	2000	Finland	Computer	1500
2	2000	USA	Computer	1500
3	2001	USA	Computer	1500
4	2000	India	Computer	1200
5	2001	USA	Computer	1200
6	2001	USA	TV	150
7	2000	Finland	Phone	100
8	2001	USA	TV	100
9	2000	India	Calculator	75
10	2000	India	Calculator	75

Tip

in the above example , we see that we have ranked our data according to profits implying that each rowid also signifies the rank of profit

RANK()

ROW_NUMBER assigns different numbers to profits with the same amount, so it’s not a useful ranking function; if profits are the same , they should have the same rank.

in such a case we use RANK()

SELECT RANK() OVER(ORDER BY profit DESC) AS RANK, 
     profit ,year, country, product
FROM sales;

Displaying records 1 - 10
RANK	profit	year	country	product
1	1500	2000	Finland	Computer
1	1500	2000	USA	Computer
1	1500	2001	USA	Computer
4	1200	2000	India	Computer
4	1200	2001	USA	Computer
6	150	2001	USA	TV
7	100	2000	Finland	Phone
7	100	2001	USA	TV
9	75	2000	India	Calculator
9	75	2000	India	Calculator

`DENSE_RANK()`

Tip

we can use DENSE_RANK() if we need ranks to be ordered or require a further partition

SELECT DENSE_RANK() OVER(PARTITION BY country ORDER BY profit DESC) AS RANK, 
     profit ,year, country, product
FROM sales;

Displaying records 1 - 10
RANK	profit	year	country	product
1	1500	2000	Finland	Computer
2	100	2000	Finland	Phone
3	10	2001	Finland	Phone
1	1200	2000	India	Computer
2	75	2000	India	Calculator
2	75	2000	India	Calculator
1	1500	2000	USA	Computer
1	1500	2001	USA	Computer
2	1200	2001	USA	Computer
3	150	2001	USA	TV

Note

now we have ranked our profits by country starting with the highest profit in each country

Aggregations

window functions are similar to GROUP BY aggregate function but all rows stay in the output

using the sales information table, these two queries perform aggregate operations that produce a single global sum for all rows taken as a group, and sums grouped per country:

SELECT SUM(profit) AS total_profit
FROM sales;

1 records
total_profit
7535

lets us see a GROUP BY in action

SELECT country, SUM(profit) AS country_profit
       FROM sales
       GROUP BY country
       ORDER BY country;

3 records
country	country_profit
Finland	1610
India	1350
USA	4575

Tip

On the other hand , window operations do not collapse groups of query rows to a single output row. Instead, they produce a result for each row. Like the preceding queries, the following query uses SUM(), but this time as a window function:

SELECT
year, country, product, profit,
         SUM(profit) OVER() AS total_profit,
         SUM(profit) OVER(PARTITION BY country) AS country_profit
FROM sales
ORDER BY country, year, product, profit;

Displaying records 1 - 10
year	country	product	profit	total_profit	country_profit
2000	Finland	Computer	1500	7535	1610
2000	Finland	Phone	100	7535	1610
2001	Finland	Phone	10	7535	1610
2000	India	Calculator	75	7535	1350
2000	India	Calculator	75	7535	1350
2000	India	Computer	1200	7535	1350
2000	USA	Calculator	75	7535	4575
2000	USA	Computer	1500	7535	4575
2001	USA	Calculator	50	7535	4575
2001	USA	Computer	1200	7535	4575

Note

we have included OVER clause that specifies how to partition query rows into groups for processing by the window function:

The first OVER clause is empty, which treats the entire set of query rows as a single partition. The window function thus produces a global sum, but does so for each row.
The second OVER clause partitions rows by country, producing a sum per partition (per country). The function produces this sum for each partition row.

Window functions are permitted only in the select list and ORDER BY clause. Query result rows are determined from the FROM clause, after WHERE, GROUP BY, and HAVING processing, and windowing execution occurs before ORDER BY, LIMIT, and SELECT DISTINCT.

WITH CLAUSE

This is used to create a temporary relation such that the output of this temporary relation is available and is used by the query that is associated with the WITH clause. for instance the following table can be created as a temporary table

FRAMES

Note

Frames allow you to “peek” forwards or backward without first using the relative fetching functions, LAG and LEAD, to fetch previous rows’ values into the current row.

ROWS BETWEEN

ROWS BETWEEN [START] AND [FINISH]
n PRECEDING : n rows before current row
CURRENT ROW : the current row
n FOLLOWING : n rows after the current row

the following query calculates the number of sales per year and per country(i.e grouped by year and country)

SELECT
    year, country, COUNT(*) AS num_sales
  FROM sales
  GROUP BY year, country;

5 records
year	country	num_sales
2000	Finland	2
2000	India	3
2000	USA	2
2001	Finland	1
2001	USA	5

like soo…

`WITH` and a `FRAME`

We create a temporary table called country_sales

WITH Country_sales AS (
  SELECT
    year, country, COUNT(*) AS num_sales
  FROM sales
  GROUP BY year, country)

SELECT
  year, country, num_sales,
  -- Calculate each country's 3s-ales moving total
  SUM(num_sales) OVER
    (PARTITION BY country
     ORDER BY year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS sales_MA
FROM Country_sales
ORDER BY country ASC, year ASC;

5 records
year	country	num_sales	sales_MA
2000	Finland	2	2
2001	Finland	1	3
2000	India	3	3
2000	USA	2	2
2001	USA	5	7

Window functions

Anatomy of a window function

Creating a fake dataset in R

Load in the necessary packages

Set up a database using the datafile

Lets get started

ROW_NUMBER() OVER()

PARTITION BY

RANK()

DENSE_RANK()

Aggregations

WITH CLAUSE

FRAMES

ROWS BETWEEN

WITH and a FRAME

`ROW_NUMBER() OVER()`

`DENSE_RANK()`

`WITH` and a `FRAME`