Summary: in this tutorial, you’ll learn how to use a PostgreSQL correlated subquery to select data that depends on the values of the outer query.
Getting Started with PostgreSQL Correlated Subquery #
A correlated subquery is a subquery that uses values from an outer query.
Unlike a regular subquery that can execute independently, PostgreSQL may have to execute a correlated subquery for every row in the outer query.
For this reason, you should avoid using the correlated subquery as much as possible to improve the query performance.
PostgreSQL Correlated Subquery example #
Suppose we have the following products
table that include id, name, price, and brand:
Script for creating the products table and inserting data into it
CREATE TABLE products (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR(50) NOT NULL,
price DEC(11, 2) NOT NULL CHECK (price > 0),
brand VARCHAR(50) NOT NULL
);
INSERT INTO
products (name, price, brand)
VALUES
('Galaxy S24', 799.99, 'Samsung'),
('iPhone 16', 1099.99, 'Apple'),
('iPhone 16 Pro Max', 1399.99, 'Apple'),
('iPhone 16 Plus', 1199.99, 'Apple'),
('Galaxy S24 Ultra', 1299.99, 'Samsung'),
('Galaxy S24 Plus', 1119.99, 'Samsung');
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
id | name | price | brand |
---|---|---|---|
1 | Galaxy S24 | 799.99 | Samsung |
2 | iPhone 16 | 1099.99 | Apple |
3 | iPhone 16 Pro Max | 1399.99 | Apple |
4 | iPhone 16 Plus | 1199.99 | Apple |
5 | Galaxy S24 Ultra | 1299.99 | Samsung |
6 | Galaxy S24 Plus | 1119.99 | Samsung |
The following example shows how to use a correlated subquery to get the product name, brand, and price of the most expensive product in each brand:
SELECT
name,
brand,
price
FROM
products p1
WHERE
price = (
SELECT
MAX(price)
FROM
products p2
WHERE
p2.brand = p1.brand
);
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
name | brand | price
-------------------+---------+---------
iPhone 16 Pro Max | Apple | 1399.99
Galaxy S24 Ultra | Samsung | 1299.99
Code language: plaintext (plaintext)
How it works.
First, the outer query selects the name
, brand
, and price
from the products
table with the alias p1
:
SELECT
name,
brand,
price
FROM
products p1
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Second, PostgreSQL executes the correlated subquery for each row in the outer query:
SELECT
MAX(price)
FROM
products p2
WHERE
p2.brand = p1.brand
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
The correlated subquery selects the maximum price from the products
table with the alias p2
, where the brand of p2
matches the brand of the current row in the outer query (p1.brand
).
Third, the WHERE
clause in the outer query checks the price of the current row (p1.price
) equals the maximum price of the brand of the current row. If yes, PostgreSQL includes the row in the result set.
Here’s the detail:
Row id 1
id | name | price | brand |
---|---|---|---|
1 | Galaxy S24 | 799.99 | Samsung |
The subquery finds the maximum price of the Samsung brand (1299.99
).
The WHERE
clause compares the price (799.99
) with the maximum price (1299.99
). It returns false. PostgreSQL does not include row 1 in the result set.
Row id 2:
id | name | price | brand |
---|---|---|---|
2 | iPhone 16 | 1099.99 | Apple |
The subquery finds the maximum price of the Apple brand (1399.99
).
The WHERE
clause compares the price (1099.99
) with the maximum price (1399.99
), which returns false
. Hence, PostgreSQL does not include row id 2 in the result set.
Row id 3
id | name | price | brand |
---|---|---|---|
3 | iPhone 16 Pro Max | 1399.99 | Apple |
The subquery finds the maximum price of the Apple brand (1399.99
).
The WHERE
clause compares the price (1399.99
) with the maximum price (1399.99
),which returns true
. PostgreSQL includes row id 3 in the result set.
Row id 4
id | name | price | brand |
---|---|---|---|
4 | iPhone 16 Plus | 1199.99 | Apple |
The subquery finds the maximum price of the Apple brand (1399.99
).
The WHERE
clause compares the price (1199.99
) with the maximum price (1399.99
). It returns false
. PostgreSQL does not include row id 4 in the result set.
Row id 5
id | name | price | brand |
---|---|---|---|
5 | Galaxy S24 Ultra | 1299.99 | Samsung |
The subquery finds the maximum price of the Samsung brand (1299.99
).
The WHERE
clause compares the price (1299.99
) with the maximum price (1299.99
), which returns true. Therefore, PostgreSQL does not include row id 5 in the result set.
Row id 6
id | name | price | brand |
---|---|---|---|
6 | Galaxy S24 Plus | 1119.99 | Samsung |
The subquery finds the maximum price of the Samsung brand (1299.99
).
The WHERE
clause compares the price (1119.99
) with the maximum price (1299.99
), which returns false. PostgreSQL does not include row id 6 in the result set.
The final result set will be as follows:
name | price | brand |
---|---|---|
iPhone 16 Pro Max | 1399.99 | Apple |
Galaxy S24 Ultra | 1299.99 | Samsung |
This example shows that PostgreSQL has to execute the correlated subquery to find the maximum price of Apple brand for every row in the outer query. It is redundant and inefficient.
Rewriting a correlated subquery using join #
The following example uses a
with a subquery to make the query faster:JOIN
SELECT
p1.name,
p1.brand,
p1.price
FROM
products p1
JOIN (
SELECT
brand,
MAX(price) AS max_price
FROM
products
GROUP BY
brand
) p2 ON p1.brand = p2.brand
AND p1.price = p2.max_price;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
How it works.
First, select the brand and maximum product price by brand using GROUP BY
with the MAX
aggregate function:
SELECT
brand,
MAX(price) AS max_price
FROM
products
GROUP BY
brand;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Second, join the products
table with the result set returned by the subquery on the brand
and price
columns. This will retrieve the products with the highest price for each selected brand.
This approach reduces the number of times PostgreSQL executes the subquery, improving the query performance, especially when the products
table is large.
Summary #
- A correlated subquery is a subquery that depends on the outer query.