Summary: In this tutorial, you’ll learn how to use the PostgreSQL CUME_DIST()
window function to calculate the cumulative distribution of a value in a set of values.
Getting Started with PostgreSQL CUME_DIST Window Function #
The CUME_DIST
function returns the cumulative distribution of a value within a set of values. It provides the relative position of a value in a set, helping to identify the percentile rank of that value.
A percentile rank is a statistical measure that shows the relative position of a value within a dataset.
For instance, if a price is in the 80th percentile, it indicates that 80% of product prices are equal to or lower than that price.
Syntax #
Here’s the syntax of the CUME_DIST()
window function:
CUME_DIST() OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression]
)
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
PARTITION BY
: Divides the result set into partitions where the function applies. This clause is optional. If you omit it, the function will treat the whole result set as a single partition.ORDER BY
: Specifies the order of rows in each partition.
The CUME_DIST
function returns a value in the range of (0, 1).
PostgreSQL CUME_DIST Window Function Examples #
Let’s take some examples of using the CUME_DIST
function with the products
table:
Basic PostgreSQL CUME_DIST Window Function Example #
The following SELECT
statement uses the CUME_DIST()
function to calculate the price percentile for each product:
SELECT
product_name,
price,
CUME_DIST() OVER (
ORDER BY
price DESC
) AS price_distribution
FROM
products;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
product_name | price | price_distribution
----------------------------+---------+--------------------
Samsung QN900C Neo QLED | 2999.99 | 0.04
LG G3 OLED | 2499.99 | 0.12
Sony Bravia XR A95K | 2499.99 | 0.12
LG OLED TV C3 | 1999.99 | 0.16
Samsung Galaxy Z Fold 5 | 1799.99 | 0.2
Lenovo ThinkPad X1 Carbon | 1599.99 | 0.24
Dell XPS 15 | 1499.99 | 0.28
HP Spectre x360 | 1399.99 | 0.32
Microsoft Surface Laptop 5 | 1299.99 | 0.48
Apple iPhone 15 Pro Max | 1299.99 | 0.48
Sony HT-A7000 Soundbar | 1299.99 | 0.48
Apple iMac 24" | 1299.99 | 0.48
Apple iPhone 15 | 1099.99 | 0.56
Apple iPad Pro 12.9 | 1099.99 | 0.56
Samsung Galaxy S24 | 999.99 | 0.64
Dell Inspiron 27 | 999.99 | 0.64
Sony Xperia 1 VI | 949.99 | 0.68
Huawei Mate 60 | 899.99 | 0.72
Xiaomi Mi 14 | 799.99 | 0.76
Samsung Galaxy Tab S9 | 699.99 | 0.8
Apple Watch Series 9 | 399.99 | 0.88
Bose SoundLink Max | 399.99 | 0.88
Samsung Galaxy Watch 6 | 349.99 | 0.92
Apple AirPods Pro 3 | 249.99 | 0.96
Samsung Galaxy Buds Pro 2 | 199.99 | 1
Code language: plaintext (plaintext)
The output shows that over 80%
of products have prices greater than 699.99
, and only 20%
have prices at 1799.99
.
Calculating the Price Percentile for Each category #
The following statement uses the CUME_DIST()
function to calculate the price percentile for products in each category:
SELECT
category_id,
product_name,
price,
CUME_DIST() OVER (
PARTITION BY
category_id
ORDER BY
price DESC
)::DEC(3, 2) AS price_distribution
FROM
products;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
category_id | product_name | price | price_distribution
-------------+----------------------------+---------+--------------------
3 | Samsung Galaxy Z Fold 5 | 1799.99 | 0.14
3 | Apple iPhone 15 Pro Max | 1299.99 | 0.29
3 | Apple iPhone 15 | 1099.99 | 0.43
3 | Samsung Galaxy S24 | 999.99 | 0.57
3 | Sony Xperia 1 VI | 949.99 | 0.71
3 | Huawei Mate 60 | 899.99 | 0.86
3 | Xiaomi Mi 14 | 799.99 | 1.00
4 | Apple iPad Pro 12.9 | 1099.99 | 0.50
4 | Samsung Galaxy Tab S9 | 699.99 | 1.00
5 | Apple AirPods Pro 3 | 249.99 | 0.50
5 | Samsung Galaxy Buds Pro 2 | 199.99 | 1.00
6 | Apple Watch Series 9 | 399.99 | 0.50
6 | Samsung Galaxy Watch 6 | 349.99 | 1.00
8 | Samsung QN900C Neo QLED | 2999.99 | 0.25
8 | Sony Bravia XR A95K | 2499.99 | 0.75
8 | LG G3 OLED | 2499.99 | 0.75
8 | LG OLED TV C3 | 1999.99 | 1.00
9 | Sony HT-A7000 Soundbar | 1299.99 | 0.50
9 | Bose SoundLink Max | 399.99 | 1.00
11 | Lenovo ThinkPad X1 Carbon | 1599.99 | 0.25
11 | Dell XPS 15 | 1499.99 | 0.50
11 | HP Spectre x360 | 1399.99 | 0.75
11 | Microsoft Surface Laptop 5 | 1299.99 | 1.00
12 | Apple iMac 24" | 1299.99 | 0.50
12 | Dell Inspiron 27 | 999.99 | 1.00
Code language: plaintext (plaintext)
In this example:
- The
PARTITION BY category_id
divides the products bycategory_id
into partitions. - The
ORDER BY price DESC
sorts the products by prices in each partition.
To get the category name, you can join the products
table with the categories
table:
SELECT
category_name,
product_name,
price,
CUME_DIST() OVER (
PARTITION BY
category_name
ORDER BY
price DESC
)::DEC(3, 2) AS price_distribution
FROM
products
JOIN categories USING (category_id);
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
category_name | product_name | price | price_distribution
---------------+----------------------------+---------+--------------------
Accessories | Apple AirPods Pro 3 | 249.99 | 0.50
Accessories | Samsung Galaxy Buds Pro 2 | 199.99 | 1.00
Audio Systems | Sony HT-A7000 Soundbar | 1299.99 | 0.50
Audio Systems | Bose SoundLink Max | 399.99 | 1.00
Desktops | Apple iMac 24" | 1299.99 | 0.50
Desktops | Dell Inspiron 27 | 999.99 | 1.00
Laptops | Lenovo ThinkPad X1 Carbon | 1599.99 | 0.25
Laptops | Dell XPS 15 | 1499.99 | 0.50
Laptops | HP Spectre x360 | 1399.99 | 0.75
Laptops | Microsoft Surface Laptop 5 | 1299.99 | 1.00
Smartphones | Samsung Galaxy Z Fold 5 | 1799.99 | 0.14
Smartphones | Apple iPhone 15 Pro Max | 1299.99 | 0.29
Smartphones | Apple iPhone 15 | 1099.99 | 0.43
Smartphones | Samsung Galaxy S24 | 999.99 | 0.57
Smartphones | Sony Xperia 1 VI | 949.99 | 0.71
Smartphones | Huawei Mate 60 | 899.99 | 0.86
Smartphones | Xiaomi Mi 14 | 799.99 | 1.00
Tablets | Apple iPad Pro 12.9 | 1099.99 | 0.50
Tablets | Samsung Galaxy Tab S9 | 699.99 | 1.00
Televisions | Samsung QN900C Neo QLED | 2999.99 | 0.25
Televisions | Sony Bravia XR A95K | 2499.99 | 0.75
Televisions | LG G3 OLED | 2499.99 | 0.75
Televisions | LG OLED TV C3 | 1999.99 | 1.00
Wearables | Apple Watch Series 9 | 399.99 | 0.50
Wearables | Samsung Galaxy Watch 6 | 349.99 | 1.00
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Summary #
- Use the
CUME_DIST
window function to calculate the cumulative distribution of a value in a set of values.