All pages
Powered by GitBook
2 of 18

Window Functions

Explore window functions in MariaDB Server. This section details SQL functions that perform calculations across a set of table rows related to the current row, enabling advanced analytical queries.

Window Functions Overview

Introduction

Window functions allow calculations to be performed across a set of rows related to the current row.

Syntax

function (expression) OVER (
  [ PARTITION BY expression_list ]
  [ ORDER BY order_list [ frame_clause ] ] ) 

function:
  A valid window function

expression_list:
  expression | column_name [, expr_list ]

order_list:
  expression | column_name [ ASC | DESC ] 
  [, ... ]

frame_clause:
  {ROWS | RANGE} {frame_border | BETWEEN frame_border AND frame_border}

frame_border:
  | UNBOUNDED PRECEDING
  | UNBOUNDED FOLLOWING
  | CURRENT ROW
  | expr PRECEDING
  | expr FOLLOWING

Description

Window functions perform calculations across a set of rows (in a defined window).

Dedicated window functions include

  • CUME_DIST

  • DENSE_RANK

  • FIRST_VALUE

  • LAG

  • LAST_VALUE

  • LEAD

  • MEDIAN

  • NTH_VALUE

  • NTILE

  • PERCENT_RANK

  • PERCENTILE_CONT

  • PERCENTILE_DISC

  • RANK, ROW_NUMBER

Aggregate functions that can also be used as window functions include

  • AVG

  • BIT_AND

  • BIT_OR

  • BIT_XOR

  • COUNT

  • MAX

  • MIN

  • STD

  • STDDEV

  • STDDEV_POP

  • STDDEV_SAMP

  • SUM

  • VAR_POP

  • VAR_SAMP

  • VARIANCE

Window function queries are characterised by the OVER keyword, following which the set of rows used for the calculation is specified. By default, the set of rows used for the calculation (the "window) is the entire dataset, which can be ordered with the ORDER BY clause. The PARTITION BY clause is used to reduce the window to a particular group within the dataset.

Consider the following data:

CREATE TABLE student (name CHAR(10), test CHAR(10), score TINYINT); 

INSERT INTO student VALUES 
  ('Chun', 'SQL', 75), ('Chun', 'Tuning', 73), 
  ('Esben', 'SQL', 43), ('Esben', 'Tuning', 31), 
  ('Kaolin', 'SQL', 56), ('Kaolin', 'Tuning', 88), 
  ('Tatiana', 'SQL', 87), ('Tatiana', 'Tuning', 83);

The following two queries return the average partitioned by test and by name respectively:

SELECT name, test, score, AVG(score) OVER (PARTITION BY test) 
  AS average_by_test FROM student;
+---------+--------+-------+-----------------+
| name    | test   | score | average_by_test |
+---------+--------+-------+-----------------+
| Chun    | SQL    |    75 |         65.2500 |
| Chun    | Tuning |    73 |         68.7500 |
| Esben   | SQL    |    43 |         65.2500 |
| Esben   | Tuning |    31 |         68.7500 |
| Kaolin  | SQL    |    56 |         65.2500 |
| Kaolin  | Tuning |    88 |         68.7500 |
| Tatiana | SQL    |    87 |         65.2500 |
| Tatiana | Tuning |    83 |         68.7500 |
+---------+--------+-------+-----------------+

SELECT name, test, score, AVG(score) OVER (PARTITION BY name) 
  AS average_by_name FROM student;
+---------+--------+-------+-----------------+
| name    | test   | score | average_by_name |
+---------+--------+-------+-----------------+
| Chun    | SQL    |    75 |         74.0000 |
| Chun    | Tuning |    73 |         74.0000 |
| Esben   | SQL    |    43 |         37.0000 |
| Esben   | Tuning |    31 |         37.0000 |
| Kaolin  | SQL    |    56 |         72.0000 |
| Kaolin  | Tuning |    88 |         72.0000 |
| Tatiana | SQL    |    87 |         85.0000 |
| Tatiana | Tuning |    83 |         85.0000 |
+---------+--------+-------+-----------------+

It is also possible to specify which rows to include for the window function (for example, the current row and all preceding rows). See Window Frames for more details.

Scope

Window functions were introduced in SQL:2003, and their definition was expanded in subsequent versions of the standard. The last expansion was in the latest version of the standard, SQL:2011.

Most database products support a subset of the standard, they implement some functions defined as late as in SQL:2011, and at the same time leave some parts of SQL:2008 unimplemented.

MariaDB:

  • Supports ROWS and RANGE-type frames

    • All kinds of frame bounds are supported, including RANGE PRECEDING|FOLLOWING n frame bounds (unlike PostgreSQL or MS SQL Server)

    • Does not yet support DATE[TIME] datatype and arithmetic for RANGE-type frames (MDEV-9727)

  • Does not support GROUPS-type frames (it seems that no popular database supports it, either)

  • Does not support frame exclusion (no other database seems to support it, either) (MDEV-9724)

  • Does not support explicit NULLS FIRST or NULLS LAST.

  • Does not support nested navigation in window functions (this is VALUE_OF(expr AT row_marker [, default_value) syntax)

  • The following window functions are supported:

    • "Streamable" window functions: ROW_NUMBER, RANK, DENSE_RANK,

    • Window functions that can be streamed once the number of rows in partition is known: PERCENT_RANK, CUME_DIST, NTILE

  • Aggregate functions that are currently supported as window functions are: COUNT, SUM, AVG, BIT_OR, BIT_AND, BIT_XOR.

  • Aggregate functions with the DISTINCT specifier (e.g. COUNT( DISTINCT x)) are not supported as window functions.

Links

  • MDEV-6115 is the main jira task for window functions development. Other tasks are attached as sub-tasks.

  • bb-10.2-mdev9543 is the feature tree for window functions. Development is ongoing, and this tree has the newest changes.

  • Test cases are in mysql-test/t/win*.test .

Examples

Given the following sample data:

CREATE TABLE users (
  email VARCHAR(30), 
  first_name VARCHAR(30), 
  last_name VARCHAR(30), 
  account_type VARCHAR(30)
);

INSERT INTO users VALUES 
  ('admin@boss.org', 'Admin', 'Boss', 'admin'), 
  ('bob.carlsen@foo.bar', 'Bob', 'Carlsen', 'regular'),
  ('eddie.stevens@data.org', 'Eddie', 'Stevens', 'regular'),
  ('john.smith@xyz.org', 'John', 'Smith', 'regular'), 
  ('root@boss.org', 'Root', 'Chief', 'admin')

First, let's order the records by email alphabetically, giving each an ascending rnum value starting with 1. This will make use of the ROW_NUMBER window function:

SELECT row_number() OVER (ORDER BY email) AS rnum,
    email, first_name, last_name, account_type
FROM users ORDER BY email;
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    3 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    4 | john.smith@xyz.org     | John       | Smith     | regular      |
|    5 | root@boss.org          | Root       | Chief     | admin        |
+------+------------------------+------------+-----------+--------------

We can generate separate sequences based on account type, using the PARTITION BY clause:

SELECT row_number() OVER (PARTITION BY account_type ORDER BY email) AS rnum, 
  email, first_name, last_name, account_type 
FROM users ORDER BY account_type,email;
+------+------------------------+------------+-----------+--------------+
| rnum | email                  | first_name | last_name | account_type |
+------+------------------------+------------+-----------+--------------+
|    1 | admin@boss.org         | Admin      | Boss      | admin        |
|    2 | root@boss.org          | Root       | Chief     | admin        |
|    1 | bob.carlsen@foo.bar    | Bob        | Carlsen   | regular      |
|    2 | eddie.stevens@data.org | Eddie      | Stevens   | regular      |
|    3 | john.smith@xyz.org     | John       | Smith     | regular      |
+------+------------------------+------------+-----------+--------------+

Given the following structure and data, we want to find the top 5 salaries from each department.

CREATE TABLE employee_salaries (dept VARCHAR(20), name VARCHAR(20), salary INT(11));

INSERT INTO employee_salaries VALUES
('Engineering', 'Dharma', 3500),
('Engineering', 'Binh', 3000),
('Engineering', 'Adalynn', 2800),
('Engineering', 'Samuel', 2500),
('Engineering', 'Cveta', 2200),
('Engineering', 'Ebele', 1800),
('Sales', 'Carbry', 500),
('Sales', 'Clytemnestra', 400),
('Sales', 'Juraj', 300),
('Sales', 'Kalpana', 300),
('Sales', 'Svantepolk', 250),
('Sales', 'Angelo', 200);

We could do this without using window functions, as follows:

select dept, name, salary
from employee_salaries as t1
where (select count(t2.salary)
       from employee_salaries as t2
       where t1.name != t2.name and
             t1.dept = t2.dept and
             t2.salary > t1.salary) < 5
order by dept, salary desc;

+-------------+--------------+--------+
| dept        | name         | salary |
+-------------+--------------+--------+
| Engineering | Dharma       |   3500 |
| Engineering | Binh         |   3000 |
| Engineering | Adalynn      |   2800 |
| Engineering | Samuel       |   2500 |
| Engineering | Cveta        |   2200 |
| Sales       | Carbry       |    500 |
| Sales       | Clytemnestra |    400 |
| Sales       | Juraj        |    300 |
| Sales       | Kalpana      |    300 |
| Sales       | Svantepolk   |    250 |
+-------------+--------------+--------+

This has a number of disadvantages:

  • If there is no index, the query could take a long time if the employee_salary_table is large.

  • Adding and maintaining indexes adds overhead, and even with indexes on dept and salary, each subquery execution adds overhead by performing a lookup through the index.

Let's try achieve the same with window functions. First, generate a rank for all employees, using the RANK function.

SELECT rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS ranking,
    dept, name, salary
    FROM employee_salaries
    ORDER BY dept, ranking;
+---------+-------------+--------------+--------+
| ranking | dept        | name         | salary |
+---------+-------------+--------------+--------+
|       1 | Engineering | Dharma       |   3500 |
|       2 | Engineering | Binh         |   3000 |
|       3 | Engineering | Adalynn      |   2800 |
|       4 | Engineering | Samuel       |   2500 |
|       5 | Engineering | Cveta        |   2200 |
|       6 | Engineering | Ebele        |   1800 |
|       1 | Sales       | Carbry       |    500 |
|       2 | Sales       | Clytemnestra |    400 |
|       3 | Sales       | Juraj        |    300 |
|       3 | Sales       | Kalpana      |    300 |
|       5 | Sales       | Svantepolk   |    250 |
|       6 | Sales       | Angelo       |    200 |
+---------+-------------+--------------+--------+

Each department has a separate sequence of ranks due to the PARTITION BY clause. This particular sequence of values for rank() is given by the ORDER BY clause inside the window function’s OVER clause. Finally, to get our results in a readable format we order the data by dept and the newly generated ranking column.

Now, we need to reduce the results to find only the top 5 per department. Here is a common mistake:

select
rank() over (partition by dept order by salary desc) as ranking,
dept, name, salary
from employee_salaries
where ranking <= 5
order by dept, ranking;

ERROR 1054 (42S22): Unknown column 'ranking' in 'where clause'

Trying to filter only the first 5 values per department by putting a where clause in the statement does not work, due to the way window functions are computed. The computation of window functions happens after all WHERE, GROUP BY and HAVING clauses have been completed, right before ORDER BY, so the WHERE clause has no idea that the ranking column exists. It is only present after we have filtered and grouped all the rows.

To counteract this problem, we need to wrap our query into a derived table. We can then attach a where clause to it:

select *from (select rank() over (partition by dept order by salary desc) as ranking,
  dept, name, salary
from employee_salaries) as salary_ranks
where (salary_ranks.ranking <= 5)
  order by dept, ranking;
+---------+-------------+--------------+--------+
| ranking | dept        | name         | salary |
+---------+-------------+--------------+--------+
|       1 | Engineering | Dharma       |   3500 |
|       2 | Engineering | Binh         |   3000 |
|       3 | Engineering | Adalynn      |   2800 |
|       4 | Engineering | Samuel       |   2500 |
|       5 | Engineering | Cveta        |   2200 |
|       1 | Sales       | Carbry       |    500 |
|       2 | Sales       | Clytemnestra |    400 |
|       3 | Sales       | Juraj        |    300 |
|       3 | Sales       | Kalpana      |    300 |
|       5 | Sales       | Svantepolk   |    250 |
+---------+-------------+--------------+--------+

See Also

  • Window Frames

  • Introduction to Window Functions in MariaDB Server 10.2

This page is licensed: CC BY-SA / Gnu FDL

Aggregate Functions as Window Functions

It is possible to use aggregate functions as window functions. An aggregate function used as a window function must have the OVER clause. For example, here's COUNT() used as a window function:

SELECT COUNT(*) OVER (ORDER BY column) FROM table;

MariaDB currently allows these aggregate functions to be used as window functions:

  • AVG

  • BIT_AND

  • BIT_OR

  • BIT_XOR

  • COUNT

  • MAX

  • MIN

  • STD

  • STDDEV

  • STDDEV_POP

  • STDDEV_SAMP

  • SUM

  • VAR_POP

  • VAR_SAMP

  • VARIANCE

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Window Functions

Introduction

MariaDB ColumnStore provides support for window functions broadly following the SQL 2003 specification. A window function allows for calculations relating to a window of data surrounding the current row in a result set. This capability provides for simplified queries in support of common business questions such as cumulative totals, rolling averages, and top 10 lists.

Aggregate functions are utilized for window functions however differ in behavior from a group by query because the rows remain ungrouped. This provides support for cumulative sums and rolling averages, for example.

Two key concepts for window functions are Partition and Frame:

  • A Partition is a group of rows, or window, that have the same value for a specific column, for example a Partition can be created over a time period such as a quarter or lookup values.

  • The Frame for each row is a subset of the row's Partition. The frame typically is dynamic allowing for a sliding frame of rows within the Partition. The Frame determines the range of rows for the windowing function. A Frame could be defined as the last X rows and next Y rows all the way up to the entire Partition.

Window functions are applied after joins, group by, and having clauses are calculated.

Syntax

A window function is applied in the select clause using the following syntax:

function_name ([expression [, expression ... ]]) OVER ( window_definition )

where window_definition is defined as:

[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

PARTITION BY:

  • Divides the window result set into groups based on one or more expressions.

  • An expression may be a constant, column, and non window function expressions.

  • A query is not limited to a single partition by clause. Different partition clauses can be used across different window function applications.

  • The partition by columns do not need to be in the select list but do need to be available from the query result set.

  • If there is no PARTITION BY clause, all rows of the result set define the group.

ORDER BY:

  • Defines the ordering of values within the partition.

  • Can be ordered by multiple keys which may be a constant, column or non window function expression.

  • The order by columns do not need to be in the select list but need to be available from the query result set.

  • Use of a select column alias from the query is not supported.

  • ASC (default) and DESC options allow for ordering ascending or descending.

  • NULLS FIRST and NULL_LAST options specify whether null values come first or last in the ordering sequence. NULLS_FIRST is the default for ASC order, and NULLS_LAST is the default for DESC order.

and the optional frame_clause is defined as:

{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end

and the optional frame_start and frame_end are defined as (value being a numeric expression):

UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING

RANGE/ROWS:

  • Defines the windowing clause for calculating the set of rows that the function applies to for calculating a given rows window function result.

  • Requires an ORDER BY clause to define the row order for the window.

  • ROWS specify the window in physical units, i.e. result set rows and must be a constant or expression evaluating to a positive numeric value.

  • RANGE specifies the window as a logical offset. If the expression evaluates to a numeric value then the ORDER BY expression must be a numeric or DATE type. If the expression evaluates to an interval value then the ORDER BY expression must be a DATE data type.

  • UNBOUNDED PRECEDING indicates the window starts at the first row of the partition.

  • UNBOUNDED FOLLOWING indicates the window ends at the last row of the partition.

  • CURRENT ROW specifies the window start or ends at the current row or value.

  • If omitted, the default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Supported Functions

Function
Description

AVG()

The average of all input values.

COUNT()

Number of input rows.

CUME_DIST()

Calculates the cumulative distribution, or relative rank, of the current row to other rows in the same partition. Number of peer or preceding rows / number of rows in partition.

DENSE_RANK()

Ranks items in a group leaving no gaps in ranking sequence when there are ties.

FIRST_VALUE()

The value evaluated at the row that is the first row of the window frame (counting from 1); null if no such row.

LAG()

The value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null. LAG provides access to more than one row of a table at the same time without a self-join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position.

LAST_VALUE()

The value evaluated at the row that is the last row of the window frame (counting from 1); null if no such row.

LEAD()

Provides access to a row at a given physical offset beyond that position. Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.

MAX()

Maximum value of expression across all input values.

MEDIAN()

An inverse distribution function that assumes a continuous distribution model. It takes a numeric or datetime value and returns the middle value or an interpolated value that would be the middle value once the values are sorted. Nulls are ignored in the calculation.

MIN()

Minimum value of expression across all input values.

NTH_VALUE()

The value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row.

NTILE()

Divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr. The expr value must resolve to a positive constant for each partition. Integer ranging from 1 to the argument value, dividing the partition as equally as possible.

PERCENT_RANK()

relative rank of the current row: (rank - 1) / (total rows - 1).

PERCENTILE_CONT()

An inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation.

PERCENTILE_DISC()

An inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the set. Nulls are ignored in the calculation.

RANK()

rank of the current row with gaps; same as row_number of its first peer.

ROW_NUMBER()

number of the current row within its partition, counting from 1

STDDEV() STDDEV_POP()

Computes the population standard deviation and returns the square root of the population variance.

STDDEV_SAMP()

Computes the cumulative sample standard deviation and returns the square root of the sample variance.

SUM()

Sum of expression across all input values.

VARIANCE() VAR_POP()

Population variance of the input values (square of the population standard deviation).

VAR_SAMP()

Sample variance of the input values (square of the sample standard deviation).

Examples

Example Schema

The examples are all based on the following simplified sales opportunity table:

create table opportunities (
id int,
accountName varchar(20),
name varchar(128),
owner varchar(7),
amount decimal(10,2),
closeDate date,
stageName varchar(11)
) engine=columnstore;

Some example values are (thanks to www.mockaroo.com for sample data generation):

id
accountName
name
owner
amount
closeDate
stageName

1

Browseblab

Multi-lateral executive function

Bob

26444.86

2016-10-20

Negotiating

2

Mita

Organic demand-driven benchmark

Maria

477878.41

2016-11-28

ClosedWon

3

Miboo

De-engineered hybrid groupware

Olivier

80181.78

2017-01-05

ClosedWon

4

Youbridge

Enterprise-wide bottom-line Graphic Interface

Chris

946245.29

2016-07-02

ClosedWon

5

Skyba

Reverse-engineered fresh-thinking standardization

Maria

696241.82

2017-02-17

Negotiating

6

Eayo

Fundamental well-modulated artificial intelligence

Bob

765605.52

2016-08-27

Prospecting

7

Yotz

Extended secondary infrastructure

Chris

319624.20

2017-01-06

ClosedLost

8

Oloo

Configurable web-enabled data-warehouse

Chris

321016.26

2017-03-08

ClosedLost

9

Kaymbo

Multi-lateral web-enabled definition

Bob

690881.01

2017-01-02

Developing

10

Rhyloo

Public-key coherent infrastructure

Chris

965477.74

2016-11-07

Prospecting

The schema, sample data, and queries are available as an attachment to this article.

Cumulative Sum and Running Max Example

Window functions can be used to achieve cumulative / running calculations on a detail report. In this case a won opportunity report for a 7 day period adds columns to show the accumulated won amount as well as the current highest opportunity amount in preceding rows.

select owner, 
accountName, 
CloseDate, 
amount, 
sum(amount) over (order by CloseDate rows between unbounded preceding and current row) cumeWon, 
max(amount) over (order by CloseDate rows between unbounded preceding and current row) runningMax
from opportunities 
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09' 
order by CloseDate;

with example results:

owner
accountName
CloseDate
amount
cumeWon
runningMax

Bill

Babbleopia

2016-10-02

437636.47

437636.47

437636.47

Bill

Thoughtworks

2016-10-04

146086.51

583722.98

437636.47

Olivier

Devpulse

2016-10-05

834235.93

1417958.91

834235.93

Chris

Linkbridge

2016-10-07

539977.45

2458738.65

834235.93

Olivier

Trupe

2016-10-07

500802.29

1918761.20

834235.93

Bill

Latz

2016-10-08

857254.87

3315993.52

857254.87

Chris

Avamm

2016-10-09

699566.86

4015560.38

857254.87

Partitioned Cumulative Sum and Running Max Example

The above example can be partitioned, so that the window functions are over a particular field grouping such as owner and accumulate within that grouping. This is achieved by adding the syntax "partition by" in the window function clause.

select owner,  
accountName,  
CloseDate,  
amount,  
sum(amount) over (partition by owner order by CloseDate rows between unbounded preceding and current row) cumeWon,  
max(amount) over (partition by owner order by CloseDate rows between unbounded preceding and current row) runningMax 
from opportunities  
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09'  
order by owner, CloseDate;

With example results:

owner
accountName
CloseDate
amount
cumeWon
runningMax

Bill

Babbleopia

2016-10-02

437636.47

437636.47

437636.47

Bill

Thoughtworks

2016-10-04

146086.51

583722.98

437636.47

Bill

Latz

2016-10-08

857254.87

1440977.85

857254.87

Chris

Linkbridge

2016-10-07

539977.45

539977.45

539977.45

Chris

Avamm

2016-10-09

699566.86

1239544.31

699566.86

Olivier

Devpulse

2016-10-05

834235.93

834235.93

834235.93

Olivier

Trupe

2016-10-07

500802.29

1335038.22

834235.93

Ranking / Top Results

The rank window function allows for ranking or assigning a numeric order value based on the window function definition. Using the Rank() function will result in the same value for ties / equal values and the next rank value skipped. The Dense_Rank() function behaves similarly except the next consecutive number is used after a tie rather than skipped. The Row_Number() function will provide a unique ordering value. The example query shows the Rank() function being applied to rank sales reps by the number of opportunities for Q4 2016.

select owner, 
wonCount, 
rank() over (order by wonCount desc) rank 
from (
  select owner, 
  count(*) wonCount 
  from opportunities 
  where stageName='ClosedWon' 
  and closeDate >= '2016-10-01' and closeDate < '2016-12-31'  
  group by owner
) t
order by rank;

with example results (note the query is technically incorrect by using closeDate < '2016-12-31' however this creates a tie scenario for illustrative purposes):

owner
wonCount
rank

Bill

19

1

Chris

15

2

Maria

14

3

Bob

14

3

Olivier

10

5

If the dense_rank function is used the rank values would be 1,2,3,3,4 and for the row_number function the values would be 1,2,3,4,5.

First and Last Values

The first_value and last_value functions allow determining the first and last values of a given range. Combined with a group by this allows summarizing opening and closing values. The example shows a more complex case where detailed information is presented for first and last opportunity by quarter.

select a.year, 
a.quarter, 
f.accountName firstAccountName, 
f.owner firstOwner, 
f.amount firstAmount, 
l.accountName lastAccountName, 
l.owner lastOwner, 
l.amount lastAmount 
from (
  select year, 
  quarter, 
  min(firstId) firstId, 
  min(lastId) lastId 
  from (
    select year(closeDate) year, 
    quarter(closeDate) quarter, 
    first_value(id) over (partition by year(closeDate), quarter(closeDate) order by closeDate rows between unbounded preceding and current row) firstId, 
    last_value(id) over (partition by year(closeDate), quarter(closeDate) order by closeDate rows between current row and unbounded following) lastId 
    from opportunities  where stageName='ClosedWon'
  ) t 
  group by year, quarter order by year,quarter
) a 
join opportunities f on a.firstId = f.id 
join opportunities l on a.lastId = l.id 
order by year, quarter;

with example results:

year
quarter
firstAccountName
firstOwner
firstAmount
lastAccountName
lastOwner
lastAmount

2016

3

Skidoo

Bill

523295.07

Skipstorm

Bill

151420.86

2016

4

Skimia

Chris

961513.59

Avamm

Maria

112493.65

2017

1

Yombu

Bob

536875.51

Skaboo

Chris

270273.08

Prior and Next Example

Sometimes it useful to understand the previous and next values in the context of a given row. The lag and lead window functions provide this capability. By default the offset is one providing the prior or next value but can also be provided to get a larger offset. The example query is a report of opportunities by account name showing the opportunity amount, and the prior and next opportunity amount for that account by close date.

select accountName, 
closeDate,  
amount currentOppAmount, 
lag(amount) over (partition by accountName order by closeDate) priorAmount, lead(amount) over (partition by accountName order by closeDate) nextAmount 
from opportunities 
order by accountName, closeDate 
limit 9;

With example results:

accountName
closeDate
currentOppAmount
priorAmount
nextAmount

Abata

2016-09-10

645098.45

NULL

161086.82

Abata

2016-10-14

161086.82

645098.45

350235.75

Abata

2016-12-18

350235.75

161086.82

878595.89

Abata

2016-12-31

878595.89

350235.75

922322.39

Abata

2017-01-21

922322.39

878595.89

NULL

Abatz

2016-10-19

795424.15

NULL

NULL

Agimba

2016-07-09

288974.84

NULL

914461.49

Agimba

2016-09-07

914461.49

288974.84

176645.52

Agimba

2016-09-20

176645.52

914461.49

NULL

Quartiles Example

The NTile window function allows for breaking up a data set into portions assigned a numeric value to each portion of the range. NTile(4) breaks the data up into quartiles (4 sets). The example query produces a report of all opportunities summarizing the quartile boundaries of amount values.

select t.quartile, 
min(t.amount) min, 
max(t.amount) max 
from (
  select amount, 
  ntile(4) over (order by amount asc) quartile 
  from opportunities 
  where closeDate >= '2016-10-01' and closeDate <= '2016-12-31'
  ) t 
group by quartile 
order by quartile;

With example results:

quartile
min
max

1

6337.15

287634.01

2

288796.14

539977.45

3

540070.04

748727.51

4

753670.77

998864.47

Percentile Example

The percentile functions have a slightly different syntax from other window functions as can be seen in the example below. These functions can be only applied against numeric values. The argument to the function is the percentile to evaluate. Following 'within group' is the sort expression which indicates the sort column and optionally order. Finally after 'over' is an optional partition by clause, for no partition clause use 'over ()'. The example below utilizes the value 0.5 to calculate the median opportunity amount in the rows. The values differ sometimes because percentile_cont will return the average of the 2 middle rows for an even data set while percentile_desc returns the first encountered in the sort.

select owner,  
accountName,  
CloseDate,  
amount,
percentile_cont(0.5) within group (order by amount) over (partition by owner) pct_cont,
percentile_disc(0.5) within group (order by amount) over (partition by owner) pct_disc
from opportunities  
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09'  
order by owner, CloseDate;

With example results:

owner
accountName
CloseDate
amount
pct_cont
pct_disc

Bill

Babbleopia

2016-10-02

437636.47

437636.4700000000

437636.47

Bill

Thoughtworks

2016-10-04

146086.51

437636.4700000000

437636.47

Bill

Latz

2016-10-08

857254.87

437636.4700000000

437636.47

Chris

Linkbridge

2016-10-07

539977.45

619772.1550000000

539977.45

Chris

Avamm

2016-10-09

699566.86

619772.1550000000

539977.45

Olivier

Devpulse

2016-10-05

834235.93

667519.1100000000

500802.29

Olivier

Trupe

2016-10-07

500802.29

667519.1100000000

500802.29

This page is licensed: CC BY-SA / Gnu FDL

Window Frames

Syntax

frame_clause:
  {ROWS | RANGE} {frame_border | BETWEEN frame_border AND frame_border}

frame_border:
  | UNBOUNDED PRECEDING
  | UNBOUNDED FOLLOWING
  | CURRENT ROW
  | expr PRECEDING
  | expr FOLLOWING

Description

A basic overview of window functions is described in Window Functions Overview. Window frames expand this functionality by allowing the function to include a specified a number of rows around the current row.

These include:

  • All rows before the current row (UNBOUNDED PRECEDING), for example RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW .

  • All rows after the current row (UNBOUNDED FOLLOWING), for example RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING .

  • A set number of rows before the current row (expr PRECEDING) for example RANGE BETWEEN 6 PRECEDING AND CURRENT ROW .

  • A set number of rows after the current row (expr PRECEDING AND expr FOLLOWING) for example RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING .

  • A specified number of rows both before and after the current row, for example RANGE BETWEEN 6 PRECEDING AND 3 FOLLOWING .

The following functions operate on window frames:

  • AVG

  • BIT_AND

  • BIT_OR

  • BIT_XOR

  • COUNT

  • LEAD

  • MAX

  • MIN

  • NTILE

  • STD

  • STDDEV

  • STDDEV_POP

  • STDDEV_SAMP

  • SUM

  • VAR_POP

  • VAR_SAMP

  • VARIANCE

Window frames are determined by the frame_clause in the window function request.

Take the following example:

CREATE TABLE `student_test` (
  name char(10),
  test char(10),
  score tinyint(4)
);

INSERT INTO student_test VALUES 
    ('Chun', 'SQL', 75), ('Chun', 'Tuning', 73), 
    ('Esben', 'SQL', 43), ('Esben', 'Tuning', 31), 
    ('Kaolin', 'SQL', 56), ('Kaolin', 'Tuning', 88), 
    ('Tatiana', 'SQL', 87);

SELECT name, test, score, SUM(score) 
  OVER () AS total_score 
  FROM student_test;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Chun    | SQL    |    75 |         453 |
| Chun    | Tuning |    73 |         453 |
| Esben   | SQL    |    43 |         453 |
| Esben   | Tuning |    31 |         453 |
| Kaolin  | SQL    |    56 |         453 |
| Kaolin  | Tuning |    88 |         453 |
| Tatiana | SQL    |    87 |         453 |
+---------+--------+-------+-------------+

By not specifying an OVER clause, the SUM function is run over the entire dataset. However, if we specify an ORDER BY condition based on score (and order the entire result in the same way for clarity), the following result is returned:

SELECT name, test, score, SUM(score) 
  OVER (ORDER BY score) AS total_score 
  FROM student_test ORDER BY score;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Esben   | Tuning |    31 |          31 |
| Esben   | SQL    |    43 |          74 |
| Kaolin  | SQL    |    56 |         130 |
| Chun    | Tuning |    73 |         203 |
| Chun    | SQL    |    75 |         278 |
| Tatiana | SQL    |    87 |         365 |
| Kaolin  | Tuning |    88 |         453 |
+---------+--------+-------+-------------+

The total_score column represents a running total of the current row, and all previous rows. The window frame in this example expands as the function proceeds.

The above query makes use of the default to define the window frame. It could be written explicitly as follows:

SELECT name, test, score, SUM(score) 
  OVER (ORDER BY score RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_score 
  FROM student_test ORDER BY score;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Esben   | Tuning |    31 |          31 |
| Esben   | SQL    |    43 |          74 |
| Kaolin  | SQL    |    56 |         130 |
| Chun    | Tuning |    73 |         203 |
| Chun    | SQL    |    75 |         278 |
| Tatiana | SQL    |    87 |         365 |
| Kaolin  | Tuning |    88 |         453 |
+---------+--------+-------+-------------+

Let's look at some alternatives:

Firstly, applying the window function to the current row and all following rows can be done with the use of UNBOUNDED FOLLOWING:

SELECT name, test, score, SUM(score) 
  OVER (ORDER BY score RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS total_score 
  FROM student_test ORDER BY score;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Esben   | Tuning |    31 |         453 |
| Esben   | SQL    |    43 |         422 |
| Kaolin  | SQL    |    56 |         379 |
| Chun    | Tuning |    73 |         323 |
| Chun    | SQL    |    75 |         250 |
| Tatiana | SQL    |    87 |         175 |
| Kaolin  | Tuning |    88 |          88 |
+---------+--------+-------+-------------+

It's possible to specify a number of rows, rather than the entire unbounded following or preceding set. The following example takes the current row, as well as the previous row:

SELECT name, test, score, SUM(score) 
  OVER (ORDER BY score ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS total_score 
  FROM student_test ORDER BY score;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Esben   | Tuning |    31 |          31 |
| Esben   | SQL    |    43 |          74 |
| Kaolin  | SQL    |    56 |          99 |
| Chun    | Tuning |    73 |         129 |
| Chun    | SQL    |    75 |         148 |
| Tatiana | SQL    |    87 |         162 |
| Kaolin  | Tuning |    88 |         175 |
+---------+--------+-------+-------------+

The current row and the following row:

SELECT name, test, score, SUM(score) 
  OVER (ORDER BY score ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS total_score 
  FROM student_test ORDER BY score;
+---------+--------+-------+-------------+
| name    | test   | score | total_score |
+---------+--------+-------+-------------+
| Esben   | Tuning |    31 |          74 |
| Esben   | SQL    |    43 |         130 |
| Kaolin  | SQL    |    56 |         172 |
| Chun    | Tuning |    73 |         204 |
| Chun    | SQL    |    75 |         235 |
| Tatiana | SQL    |    87 |         250 |
| Kaolin  | Tuning |    88 |         175 |
+---------+--------+-------+-------------+

This page is licensed: CC BY-SA / Gnu FDL

CUME_DIST

Syntax

CUME_DIST() OVER ( 
  [ PARTITION BY partition_expression ] 
  [ ORDER BY order_list ]
)

Description

CUME_DIST() is a window function that returns the cumulative distribution of a given row. The following formula is used to calculate the value:

(number of rows <= current row) / (total rows)

Examples

create table t1 (
  pk int primary key,
  a int,
  b int
);


insert into t1 values
( 1 , 0, 10),
( 2 , 0, 10),
( 3 , 1, 10),
( 4 , 1, 10),
( 8 , 2, 10),
( 5 , 2, 20),
( 6 , 2, 20),
( 7 , 2, 20),
( 9 , 4, 20),
(10 , 4, 20);

select pk, a, b,
    rank() over (order by a) as rank,
    percent_rank() over (order by a) as pct_rank,
    cume_dist() over (order by a) as cume_dist
from t1;
+----+------+------+------+--------------+--------------+
| pk | a    | b    | rank | pct_rank     | cume_dist    |
+----+------+------+------+--------------+--------------+
|  1 |    0 |   10 |    1 | 0.0000000000 | 0.2000000000 |
|  2 |    0 |   10 |    1 | 0.0000000000 | 0.2000000000 |
|  3 |    1 |   10 |    3 | 0.2222222222 | 0.4000000000 |
|  4 |    1 |   10 |    3 | 0.2222222222 | 0.4000000000 |
|  5 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  6 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  7 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  8 |    2 |   10 |    5 | 0.4444444444 | 0.8000000000 |
|  9 |    4 |   20 |    9 | 0.8888888889 | 1.0000000000 |
| 10 |    4 |   20 |    9 | 0.8888888889 | 1.0000000000 |
+----+------+------+------+--------------+--------------+

select pk, a, b,
       percent_rank() over (order by pk) as pct_rank,
       cume_dist() over (order by pk) as cume_dist
from t1 order by pk;
+----+------+------+--------------+--------------+
| pk | a    | b    | pct_rank     | cume_dist    |
+----+------+------+--------------+--------------+
|  1 |    0 |   10 | 0.0000000000 | 0.1000000000 |
|  2 |    0 |   10 | 0.1111111111 | 0.2000000000 |
|  3 |    1 |   10 | 0.2222222222 | 0.3000000000 |
|  4 |    1 |   10 | 0.3333333333 | 0.4000000000 |
|  5 |    2 |   20 | 0.4444444444 | 0.5000000000 |
|  6 |    2 |   20 | 0.5555555556 | 0.6000000000 |
|  7 |    2 |   20 | 0.6666666667 | 0.7000000000 |
|  8 |    2 |   10 | 0.7777777778 | 0.8000000000 |
|  9 |    4 |   20 | 0.8888888889 | 0.9000000000 |
| 10 |    4 |   20 | 1.0000000000 | 1.0000000000 |
+----+------+------+--------------+--------------+

select pk, a, b,
        percent_rank() over (partition by a order by a) as pct_rank,
        cume_dist() over (partition by a order by a) as cume_dist
from t1;
+----+------+------+--------------+--------------+
| pk | a    | b    | pct_rank     | cume_dist    |
+----+------+------+--------------+--------------+
|  1 |    0 |   10 | 0.0000000000 | 1.0000000000 |
|  2 |    0 |   10 | 0.0000000000 | 1.0000000000 |
|  3 |    1 |   10 | 0.0000000000 | 1.0000000000 |
|  4 |    1 |   10 | 0.0000000000 | 1.0000000000 |
|  5 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  6 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  7 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  8 |    2 |   10 | 0.0000000000 | 1.0000000000 |
|  9 |    4 |   20 | 0.0000000000 | 1.0000000000 |
| 10 |    4 |   20 | 0.0000000000 | 1.0000000000 |
+----+------+------+--------------+--------------+

See Also

  • PERCENT_RANK()

This page is licensed: CC BY-SA / Gnu FDL

DENSE_RANK

Syntax

DENSE_RANK() OVER (
  [ PARTITION BY partition_expression ]
  [ ORDER BY order_list ]
)

Description

DENSE_RANK() is a window function that displays the number of a given row, starting at one and following the ORDER BY sequence of the window function, with identical values receiving the same result. Unlike the RANK() function, there are no skipped values if the preceding results are identical. It is also similar to the ROW_NUMBER() function except that in that function, identical values will receive a different row number for each result.

Examples

The distinction between DENSE_RANK(), RANK() and ROW_NUMBER():

CREATE TABLE student(course VARCHAR(10), mark int, name varchar(10));

INSERT INTO student VALUES 
  ('Maths', 60, 'Thulile'),
  ('Maths', 60, 'Pritha'),
  ('Maths', 70, 'Voitto'),
  ('Maths', 55, 'Chun'),
  ('Biology', 60, 'Bilal'),
   ('Biology', 70, 'Roger');

SELECT 
  RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank, 
  DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS dense_rank, 
  ROW_NUMBER() OVER (PARTITION BY course ORDER BY mark DESC) AS row_num, 
  course, mark, name 
FROM student ORDER BY course, mark DESC;
+------+------------+---------+---------+------+---------+
| rank | dense_rank | row_num | course  | mark | name    |
+------+------------+---------+---------+------+---------+
|    1 |          1 |       1 | Biology |   70 | Roger   |
|    2 |          2 |       2 | Biology |   60 | Bilal   |
|    1 |          1 |       1 | Maths   |   70 | Voitto  |
|    2 |          2 |       2 | Maths   |   60 | Thulile |
|    2 |          2 |       3 | Maths   |   60 | Pritha  |
|    4 |          3 |       4 | Maths   |   55 | Chun    |
+------+------------+---------+---------+------+---------+

See Also

  • RANK()

  • ROW_NUMBER()

  • ORDER BY

This page is licensed: CC BY-SA / Gnu FDL

FIRST_VALUE

Syntax

FIRST_VALUE(expr) OVER (
  [ PARTITION BY partition_expression ]
  [ ORDER BY order_list ]
)

Description

FIRST_VALUE returns the first result from an ordered set, or NULL if no such result exists.

Examples

CREATE TABLE t1 (
  pk int primary key,
  a int,
  b int,
  c char(10),
  d decimal(10, 3),
  e real
);

INSERT INTO t1 VALUES
( 1, 0, 1,    'one',    0.1,  0.001),
( 2, 0, 2,    'two',    0.2,  0.002),
( 3, 0, 3,    'three',  0.3,  0.003),
( 4, 1, 2,    'three',  0.4,  0.004),
( 5, 1, 1,    'two',    0.5,  0.005),
( 6, 1, 1,    'one',    0.6,  0.006),
( 7, 2, NULL, 'n_one',  0.5,  0.007),
( 8, 2, 1,    'n_two',  NULL, 0.008),
( 9, 2, 2,    NULL,     0.7,  0.009),
(10, 2, 0,    'n_four', 0.8,  0.010),
(11, 2, 10,   NULL,     0.9,  NULL);

SELECT pk, FIRST_VALUE(pk) OVER (ORDER BY pk) AS first_asc,
           LAST_VALUE(pk) OVER (ORDER BY pk) AS last_asc,
           FIRST_VALUE(pk) OVER (ORDER BY pk DESC) AS first_desc,
           LAST_VALUE(pk) OVER (ORDER BY pk DESC) AS last_desc
FROM t1
ORDER BY pk DESC;

+----+-----------+----------+------------+-----------+
| pk | first_asc | last_asc | first_desc | last_desc |
+----+-----------+----------+------------+-----------+
| 11 |         1 |       11 |         11 |        11 |
| 10 |         1 |       10 |         11 |        10 |
|  9 |         1 |        9 |         11 |         9 |
|  8 |         1 |        8 |         11 |         8 |
|  7 |         1 |        7 |         11 |         7 |
|  6 |         1 |        6 |         11 |         6 |
|  5 |         1 |        5 |         11 |         5 |
|  4 |         1 |        4 |         11 |         4 |
|  3 |         1 |        3 |         11 |         3 |
|  2 |         1 |        2 |         11 |         2 |
|  1 |         1 |        1 |         11 |         1 |
+----+-----------+----------+------------+-----------+
CREATE OR REPLACE TABLE t1 (i int);
INSERT INTO t1 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

SELECT i,
  FIRST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW and 1 FOLLOWING) AS f_1f,
  LAST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW and 1 FOLLOWING) AS l_1f,
  FIRST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS f_1p1f,
  LAST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS l_1p1f,
  FIRST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS f_2p1p,
  LAST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS l_2p1p,
  FIRST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS f_1f2f,
  LAST_VALUE(i) OVER (ORDER BY i ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS l_1f2f
FROM t1;

+------+------+------+--------+--------+--------+--------+--------+--------+
| i    | f_1f | l_1f | f_1p1f | l_1p1f | f_2p1p | l_2p1p | f_1f2f | l_1f2f |
+------+------+------+--------+--------+--------+--------+--------+--------+
|    1 |    1 |    2 |      1 |      2 |   NULL |   NULL |      2 |      3 |
|    2 |    2 |    3 |      1 |      3 |      1 |      1 |      3 |      4 |
|    3 |    3 |    4 |      2 |      4 |      1 |      2 |      4 |      5 |
|    4 |    4 |    5 |      3 |      5 |      2 |      3 |      5 |      6 |
|    5 |    5 |    6 |      4 |      6 |      3 |      4 |      6 |      7 |
|    6 |    6 |    7 |      5 |      7 |      4 |      5 |      7 |      8 |
|    7 |    7 |    8 |      6 |      8 |      5 |      6 |      8 |      9 |
|    8 |    8 |    9 |      7 |      9 |      6 |      7 |      9 |     10 |
|    9 |    9 |   10 |      8 |     10 |      7 |      8 |     10 |     10 |
|   10 |   10 |   10 |      9 |     10 |      8 |      9 |   NULL |   NULL |
+------+------+------+--------+--------+--------+--------+--------+--------+

See Also

  • LAST_VALUE

This page is licensed: CC BY-SA / Gnu FDL

LAG

Syntax

LAG (expr[, offset]) OVER ( 
  [ PARTITION BY partition_expression ] 
  < ORDER BY order_list >
)

Description

The LAG function accesses data from a previous row according to the ORDER BY clause without the need for a self-join. The specific row is determined by the offset (default 1), which specifies the number of rows behind the current row to use. An offset of 0 is the current row.

Examples

CREATE TABLE t1 (pk int primary key, a int, b int, c char(10), d decimal(10, 3), e real);

INSERT INTO t1 VALUES
 ( 1, 0, 1,    'one',    0.1,  0.001),
 ( 2, 0, 2,    'two',    0.2,  0.002),
 ( 3, 0, 3,    'three',  0.3,  0.003),
 ( 4, 1, 2,    'three',  0.4,  0.004),
 ( 5, 1, 1,    'two',    0.5,  0.005),
 ( 6, 1, 1,    'one',    0.6,  0.006),
 ( 7, 2, NULL, 'n_one',  0.5,  0.007),
 ( 8, 2, 1,    'n_two',  NULL, 0.008),
 ( 9, 2, 2,    NULL,     0.7,  0.009),
 (10, 2, 0,    'n_four', 0.8,  0.010),
 (11, 2, 10,   NULL,     0.9,  NULL);

SELECT pk, LAG(pk) OVER (ORDER BY pk) AS l,
  LAG(pk,1) OVER (ORDER BY pk) AS l1,
  LAG(pk,2) OVER (ORDER BY pk) AS l2,
  LAG(pk,0) OVER (ORDER BY pk) AS l0,
  LAG(pk,-1) OVER (ORDER BY pk) AS lm1,
  LAG(pk,-2) OVER (ORDER BY pk) AS lm2 
FROM t1;
+----+------+------+------+------+------+------+
| pk | l    | l1   | l2   | l0   | lm1  | lm2  |
+----+------+------+------+------+------+------+
|  1 | NULL | NULL | NULL |    1 |    2 |    3 |
|  2 |    1 |    1 | NULL |    2 |    3 |    4 |
|  3 |    2 |    2 |    1 |    3 |    4 |    5 |
|  4 |    3 |    3 |    2 |    4 |    5 |    6 |
|  5 |    4 |    4 |    3 |    5 |    6 |    7 |
|  6 |    5 |    5 |    4 |    6 |    7 |    8 |
|  7 |    6 |    6 |    5 |    7 |    8 |    9 |
|  8 |    7 |    7 |    6 |    8 |    9 |   10 |
|  9 |    8 |    8 |    7 |    9 |   10 |   11 |
| 10 |    9 |    9 |    8 |   10 |   11 | NULL |
| 11 |   10 |   10 |    9 |   11 | NULL | NULL |
+----+------+------+------+------+------+------+

See Also

  • LEAD - Window function to access a following row

This page is licensed: CC BY-SA / Gnu FDL

LEAD

Syntax

LEAD (expr[, offset]) OVER ( 
  [ PARTITION BY partition_expression ] 
  [ ORDER BY order_list ]
)

Description

The LEAD function accesses data from a following row in the same result set without the need for a self-join. The specific row is determined by the offset (default 1), which specifies the number of rows ahead the current row to use. An offset of 0 is the current row.

Example

CREATE TABLE t1 (pk int primary key, a int, b int, c char(10), d decimal(10, 3), e real);

INSERT INTO t1 VALUES
 ( 1, 0, 1,    'one',    0.1,  0.001),
 ( 2, 0, 2,    'two',    0.2,  0.002),
 ( 3, 0, 3,    'three',  0.3,  0.003),
 ( 4, 1, 2,    'three',  0.4,  0.004),
 ( 5, 1, 1,    'two',    0.5,  0.005),
 ( 6, 1, 1,    'one',    0.6,  0.006),
 ( 7, 2, NULL, 'n_one',  0.5,  0.007),
 ( 8, 2, 1,    'n_two',  NULL, 0.008),
 ( 9, 2, 2,    NULL,     0.7,  0.009),
 (10, 2, 0,    'n_four', 0.8,  0.010),
 (11, 2, 10,   NULL,     0.9,  NULL);

SELECT pk, LEAD(pk) OVER (ORDER BY pk) AS l,
  LEAD(pk,1) OVER (ORDER BY pk) AS l1,
  LEAD(pk,2) OVER (ORDER BY pk) AS l2,
  LEAD(pk,0) OVER (ORDER BY pk) AS l0,
  LEAD(pk,-1) OVER (ORDER BY pk) AS lm1,
  LEAD(pk,-2) OVER (ORDER BY pk) AS lm2 
FROM t1;
+----+------+------+------+------+------+------+
| pk | l    | l1   | l2   | l0   | lm1  | lm2  |
+----+------+------+------+------+------+------+
|  1 |    2 |    2 |    3 |    1 | NULL | NULL |
|  2 |    3 |    3 |    4 |    2 |    1 | NULL |
|  3 |    4 |    4 |    5 |    3 |    2 |    1 |
|  4 |    5 |    5 |    6 |    4 |    3 |    2 |
|  5 |    6 |    6 |    7 |    5 |    4 |    3 |
|  6 |    7 |    7 |    8 |    6 |    5 |    4 |
|  7 |    8 |    8 |    9 |    7 |    6 |    5 |
|  8 |    9 |    9 |   10 |    8 |    7 |    6 |
|  9 |   10 |   10 |   11 |    9 |    8 |    7 |
| 10 |   11 |   11 | NULL |   10 |    9 |    8 |
| 11 | NULL | NULL | NULL |   11 |   10 |    9 |
+----+------+------+------+------+------+------+

See Also

  • LAG - Window function to access a previous row

This page is licensed: CC BY-SA / Gnu FDL

MEDIAN

Syntax

MEDIAN(median expression) OVER (
  [ PARTITION BY partition_expression ] 
)

Description

MEDIAN() is a window function that returns the median value of a range of values.

It is a specific case of PERCENTILE_CONT, with an argument of 0.5 and the ORDER BY column the one in MEDIAN's argument.

MEDIAN(<median-arg>) OVER ( [ PARTITION BY partition_expression] )

Is equivalent to:

PERCENTILE_CONT(0.5) WITHIN 
  GROUP (ORDER BY <median-arg>) OVER ( [ PARTITION BY partition_expression ])

Examples

CREATE TABLE book_rating (name CHAR(30), star_rating TINYINT);

INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 5);
INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 3);
INSERT INTO book_rating VALUES ('Lady of the Flies', 1);
INSERT INTO book_rating VALUES ('Lady of the Flies', 2);
INSERT INTO book_rating VALUES ('Lady of the Flies', 5);

SELECT name, median(star_rating) OVER (PARTITION BY name) FROM book_rating;
+-----------------------+----------------------------------------------+
| name                  | median(star_rating) OVER (PARTITION BY name) |
+-----------------------+----------------------------------------------+
| Lord of the Ladybirds |                                 4.0000000000 |
| Lord of the Ladybirds |                                 4.0000000000 |
| Lady of the Flies     |                                 2.0000000000 |
| Lady of the Flies     |                                 2.0000000000 |
| Lady of the Flies     |                                 2.0000000000 |
+-----------------------+----------------------------------------------+

See Also

  • PERCENTILE_CONT

This page is licensed: CC BY-SA / Gnu FDL

NTH_VALUE

Syntax

NTH_VALUE (expr[, num_row]) OVER ( 
  [ PARTITION BY partition_expression ] 
  [ ORDER BY order_list ]
)

Description

The NTH_VALUE function returns the value evaluated at row number num_row of the window frame, starting from 1, or NULL if the row does not exist.

This page is licensed: CC BY-SA / Gnu FDL

NTILE

Syntax

NTILE (expr) OVER ( 
  [ PARTITION BY partition_expression ] 
  [ ORDER BY order_list ]
)

Description

NTILE() is a window function that returns an integer indicating which group a given row falls into. The number of groups is specified in the argument (expr), starting at one. Ordered rows in the partition are divided into the specified number of groups with as equal a size as possible.

Examples

create table t1 (
    pk int primary key,
    a int,
    b int
  );

insert into t1 values
    (11 , 0, 10),
    (12 , 0, 10),
    (13 , 1, 10),
    (14 , 1, 10),
    (18 , 2, 10),
    (15 , 2, 20),
    (16 , 2, 20),
    (17 , 2, 20),
    (19 , 4, 20),
    (20 , 4, 20);

select pk, a, b,
    ntile(1) over (order by pk)
  from t1;
+----+------+------+-----------------------------+
| pk | a    | b    | ntile(1) over (order by pk) |
+----+------+------+-----------------------------+
| 11 |    0 |   10 |                           1 |
| 12 |    0 |   10 |                           1 |
| 13 |    1 |   10 |                           1 |
| 14 |    1 |   10 |                           1 |
| 15 |    2 |   20 |                           1 |
| 16 |    2 |   20 |                           1 |
| 17 |    2 |   20 |                           1 |
| 18 |    2 |   10 |                           1 |
| 19 |    4 |   20 |                           1 |
| 20 |    4 |   20 |                           1 |
+----+------+------+-----------------------------+

select pk, a, b,
    ntile(4) over (order by pk)
 from t1;
+----+------+------+-----------------------------+
| pk | a    | b    | ntile(4) over (order by pk) |
+----+------+------+-----------------------------+
| 11 |    0 |   10 |                           1 |
| 12 |    0 |   10 |                           1 |
| 13 |    1 |   10 |                           1 |
| 14 |    1 |   10 |                           2 |
| 15 |    2 |   20 |                           2 |
| 16 |    2 |   20 |                           2 |
| 17 |    2 |   20 |                           3 |
| 18 |    2 |   10 |                           3 |
| 19 |    4 |   20 |                           4 |
| 20 |    4 |   20 |                           4 |
+----+------+------+-----------------------------+

This page is licensed: CC BY-SA / Gnu FDL

PERCENT_RANK

Syntax

PERCENT_RANK() OVER (
  [ PARTITION BY partition_expression ] 
  [ ORDER BY order_list ]
)

Description

PERCENT_RANK() is a window function that returns the relative percent rank of a given row. The following formula is used to calculate the percent rank:

(rank - 1) / (number of rows in the window or partition - 1)

Examples

create table t1 (
  pk int primary key,
  a int,
  b int
);


insert into t1 values
( 1 , 0, 10),
( 2 , 0, 10),
( 3 , 1, 10),
( 4 , 1, 10),
( 8 , 2, 10),
( 5 , 2, 20),
( 6 , 2, 20),
( 7 , 2, 20),
( 9 , 4, 20),
(10 , 4, 20);

select pk, a, b,
    rank() over (order by a) as rank,
    percent_rank() over (order by a) as pct_rank,
    cume_dist() over (order by a) as cume_dist
from t1;
+----+------+------+------+--------------+--------------+
| pk | a    | b    | rank | pct_rank     | cume_dist    |
+----+------+------+------+--------------+--------------+
|  1 |    0 |   10 |    1 | 0.0000000000 | 0.2000000000 |
|  2 |    0 |   10 |    1 | 0.0000000000 | 0.2000000000 |
|  3 |    1 |   10 |    3 | 0.2222222222 | 0.4000000000 |
|  4 |    1 |   10 |    3 | 0.2222222222 | 0.4000000000 |
|  5 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  6 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  7 |    2 |   20 |    5 | 0.4444444444 | 0.8000000000 |
|  8 |    2 |   10 |    5 | 0.4444444444 | 0.8000000000 |
|  9 |    4 |   20 |    9 | 0.8888888889 | 1.0000000000 |
| 10 |    4 |   20 |    9 | 0.8888888889 | 1.0000000000 |
+----+------+------+------+--------------+--------------+

select pk, a, b,
       percent_rank() over (order by pk) as pct_rank,
       cume_dist() over (order by pk) as cume_dist
from t1 order by pk;
+----+------+------+--------------+--------------+
| pk | a    | b    | pct_rank     | cume_dist    |
+----+------+------+--------------+--------------+
|  1 |    0 |   10 | 0.0000000000 | 0.1000000000 |
|  2 |    0 |   10 | 0.1111111111 | 0.2000000000 |
|  3 |    1 |   10 | 0.2222222222 | 0.3000000000 |
|  4 |    1 |   10 | 0.3333333333 | 0.4000000000 |
|  5 |    2 |   20 | 0.4444444444 | 0.5000000000 |
|  6 |    2 |   20 | 0.5555555556 | 0.6000000000 |
|  7 |    2 |   20 | 0.6666666667 | 0.7000000000 |
|  8 |    2 |   10 | 0.7777777778 | 0.8000000000 |
|  9 |    4 |   20 | 0.8888888889 | 0.9000000000 |
| 10 |    4 |   20 | 1.0000000000 | 1.0000000000 |
+----+------+------+--------------+--------------+

select pk, a, b,
        percent_rank() over (partition by a order by a) as pct_rank,
        cume_dist() over (partition by a order by a) as cume_dist
from t1;
+----+------+------+--------------+--------------+
| pk | a    | b    | pct_rank     | cume_dist    |
+----+------+------+--------------+--------------+
|  1 |    0 |   10 | 0.0000000000 | 1.0000000000 |
|  2 |    0 |   10 | 0.0000000000 | 1.0000000000 |
|  3 |    1 |   10 | 0.0000000000 | 1.0000000000 |
|  4 |    1 |   10 | 0.0000000000 | 1.0000000000 |
|  5 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  6 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  7 |    2 |   20 | 0.0000000000 | 1.0000000000 |
|  8 |    2 |   10 | 0.0000000000 | 1.0000000000 |
|  9 |    4 |   20 | 0.0000000000 | 1.0000000000 |
| 10 |    4 |   20 | 0.0000000000 | 1.0000000000 |
+----+------+------+--------------+--------------+

See Also

  • CUME_DIST()

This page is licensed: CC BY-SA / Gnu FDL

PERCENTILE_CONT

Syntax

Description

PERCENTILE_CONT() (standing for continuous percentile) is a window function which returns a value which corresponds to the given fraction in the sort order. If required, it will interpolate between adjacent input items.

Essentially, the following process is followed to find the value to return:

  • Get the number of rows in the partition, denoted by N

  • RN = p*(N-1), where p denotes the argument to the PERCENTILE_CONT function

  • calculate the FRN(floor row number) and CRN(column row number for the group( FRN= floor(RN) and CRN = ceil(RN))

  • look up rows FRN and CRN

  • If (CRN = FRN = RN) then the result is (value of expression from row at RN)

  • Otherwise the result is

  • (CRN - RN) * (value of expression for row at FRN) +

  • (RN - FRN) * (value of expression for row at CRN)

The MEDIAN function is a specific case of PERCENTILE_CONT, equivalent to PERCENTILE_CONT(0.5).

Examples

CREATE TABLE book_rating (name CHAR(30), star_rating TINYINT);

INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 5);
INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 3);
INSERT INTO book_rating VALUES ('Lady of the Flies', 1);
INSERT INTO book_rating VALUES ('Lady of the Flies', 2);
INSERT INTO book_rating VALUES ('Lady of the Flies', 5);

SELECT name, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc 
  FROM book_rating;
+-----------------------+--------------+
| name                  | pc           |
+-----------------------+--------------+
| Lord of the Ladybirds | 4.0000000000 |
| Lord of the Ladybirds | 4.0000000000 |
| Lady of the Flies     | 2.0000000000 |
| Lady of the Flies     | 2.0000000000 |
| Lady of the Flies     | 2.0000000000 |
+-----------------------+--------------+

SELECT name, PERCENTILE_CONT(1) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc 
  FROM book_rating;
+-----------------------+--------------+
| name                  | pc           |
+-----------------------+--------------+
| Lord of the Ladybirds | 5.0000000000 |
| Lord of the Ladybirds | 5.0000000000 |
| Lady of the Flies     | 5.0000000000 |
| Lady of the Flies     | 5.0000000000 |
| Lady of the Flies     | 5.0000000000 |
+-----------------------+--------------+

SELECT name, PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc 
  FROM book_rating;
+-----------------------+--------------+
| name                  | pc           |
+-----------------------+--------------+
| Lord of the Ladybirds | 3.0000000000 |
| Lord of the Ladybirds | 3.0000000000 |
| Lady of the Flies     | 1.0000000000 |
| Lady of the Flies     | 1.0000000000 |
| Lady of the Flies     | 1.0000000000 |
+-----------------------+--------------+

SELECT name, PERCENTILE_CONT(0.6) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc 
  FROM book_rating;
+-----------------------+--------------+
| name                  | pc           |
+-----------------------+--------------+
| Lord of the Ladybirds | 4.2000000000 |
| Lord of the Ladybirds | 4.2000000000 |
| Lady of the Flies     | 2.6000000000 |
| Lady of the Flies     | 2.6000000000 |
| Lady of the Flies     | 2.6000000000 |
+-----------------------+--------------+

See Also

  • MEDIAN() - a special case of PERCENTILE_CONT equivalent to PERCENTILE_CONT(0.5)

This page is licensed: CC BY-SA / Gnu FDL

PERCENTILE_DISC

Syntax

Description

PERCENTILE_DISC() (standing for discrete percentile) is a window function which returns the first value in the set whose ordered position is the same or more than the specified fraction.

Essentially, the following process is followed to find the value to return:

  • Get the number of rows in the partition.

  • Walk through the partition, in order, until finding the first row with CUME_DIST() >= function_argument.

Examples

CREATE TABLE book_rating (name CHAR(30), star_rating TINYINT);

INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 5);
INSERT INTO book_rating VALUES ('Lord of the Ladybirds', 3);
INSERT INTO book_rating VALUES ('Lady of the Flies', 1);
INSERT INTO book_rating VALUES ('Lady of the Flies', 2);
INSERT INTO book_rating VALUES ('Lady of the Flies', 5);

SELECT name, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY star_rating)
  OVER (PARTITION BY name) AS pc FROM book_rating;
+-----------------------+------+
| name                  | pc   |
+-----------------------+------+
| Lord of the Ladybirds |    3 |
| Lord of the Ladybirds |    3 |
| Lady of the Flies     |    2 |
| Lady of the Flies     |    2 |
| Lady of the Flies     |    2 |
+-----------------------+------+
5 rows in set (0.000 sec)

SELECT name, PERCENTILE_DISC(0) WITHIN GROUP (ORDER BY star_rating) 
 OVER (PARTITION BY name) AS pc FROM book_rating;
+-----------------------+------+
| name                  | pc   |
+-----------------------+------+
| Lord of the Ladybirds |    3 |
| Lord of the Ladybirds |    3 |
| Lady of the Flies     |    1 |
| Lady of the Flies     |    1 |
| Lady of the Flies     |    1 |
+-----------------------+------+
5 rows in set (0.000 sec)

SELECT name, PERCENTILE_DISC(1) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc FROM book_rating;
+-----------------------+------+
| name                  | pc   |
+-----------------------+------+
| Lord of the Ladybirds |    5 |
| Lord of the Ladybirds |    5 |
| Lady of the Flies     |    5 |
| Lady of the Flies     |    5 |
| Lady of the Flies     |    5 |
+-----------------------+------+
5 rows in set (0.000 sec)

SELECT name, PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY star_rating) 
  OVER (PARTITION BY name) AS pc FROM book_rating;
+-----------------------+------+
| name                  | pc   |
+-----------------------+------+
| Lord of the Ladybirds |    5 |
| Lord of the Ladybirds |    5 |
| Lady of the Flies     |    2 |
| Lady of the Flies     |    2 |
| Lady of the Flies     |    2 |
+-----------------------+------

See Also

  • CUME_DIST()

This page is licensed: CC BY-SA / Gnu FDL

RANK

Syntax

RANK() OVER (
  [ PARTITION BY partition_expression ]
  [ ORDER BY order_list ]
)

Description

RANK() is a window function that displays the number of a given row, starting at one and following the ORDER BY sequence of the window function, with identical values receiving the same result. It is similar to the ROW_NUMBER() function except that in that function, identical values will receive a different row number for each result.

Examples

The distinction between DENSE_RANK(), RANK() and ROW_NUMBER():

CREATE TABLE student(course VARCHAR(10), mark int, name varchar(10));

INSERT INTO student VALUES 
  ('Maths', 60, 'Thulile'),
  ('Maths', 60, 'Pritha'),
  ('Maths', 70, 'Voitto'),
  ('Maths', 55, 'Chun'),
  ('Biology', 60, 'Bilal'),
   ('Biology', 70, 'Roger');

SELECT 
  RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank, 
  DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS dense_rank, 
  ROW_NUMBER() OVER (PARTITION BY course ORDER BY mark DESC) AS row_num, 
  course, mark, name 
FROM student ORDER BY course, mark DESC;
+------+------------+---------+---------+------+---------+
| rank | dense_rank | row_num | course  | mark | name    |
+------+------------+---------+---------+------+---------+
|    1 |          1 |       1 | Biology |   70 | Roger   |
|    2 |          2 |       2 | Biology |   60 | Bilal   |
|    1 |          1 |       1 | Maths   |   70 | Voitto  |
|    2 |          2 |       2 | Maths   |   60 | Thulile |
|    2 |          2 |       3 | Maths   |   60 | Pritha  |
|    4 |          3 |       4 | Maths   |   55 | Chun    |
+------+------------+---------+---------+------+---------+

See Also

  • DENSE_RANK()

  • ROW_NUMBER()

  • ORDER BY

This page is licensed: CC BY-SA / Gnu FDL

ROW_NUMBER

Syntax

ROW_NUMBER() OVER (
  [ PARTITION BY partition_expression ]
  [ ORDER BY order_list ]
)

Description

ROW_NUMBER() is a window function that displays the number of a given row, starting at one and following the ORDER BY sequence of the window function, with identical values receiving different row numbers. It is similar to the RANK() and DENSE_RANK() functions except that in that function, identical values will receive the same rank for each result.

Examples

The distinction between DENSE_RANK(), RANK() and ROW_NUMBER():

CREATE TABLE student(course VARCHAR(10), mark int, name varchar(10));

INSERT INTO student VALUES 
  ('Maths', 60, 'Thulile'),
  ('Maths', 60, 'Pritha'),
  ('Maths', 70, 'Voitto'),
  ('Maths', 55, 'Chun'),
  ('Biology', 60, 'Bilal'),
   ('Biology', 70, 'Roger');

SELECT 
  RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank, 
  DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS dense_rank, 
  ROW_NUMBER() OVER (PARTITION BY course ORDER BY mark DESC) AS row_num, 
  course, mark, name 
FROM student ORDER BY course, mark DESC;
+------+------------+---------+---------+------+---------+
| rank | dense_rank | row_num | course  | mark | name    |
+------+------------+---------+---------+------+---------+
|    1 |          1 |       1 | Biology |   70 | Roger   |
|    2 |          2 |       2 | Biology |   60 | Bilal   |
|    1 |          1 |       1 | Maths   |   70 | Voitto  |
|    2 |          2 |       2 | Maths   |   60 | Thulile |
|    2 |          2 |       3 | Maths   |   60 | Pritha  |
|    4 |          3 |       4 | Maths   |   55 | Chun    |
+------+------------+---------+---------+------+---------+

See Also

  • RANK()

  • DENSE_RANK()

  • ORDER BY

This page is licensed: CC BY-SA / Gnu FDL