PostgreSQL Correlated Subquery

Summary: in this tutorial, you’ll learn how to use a PostgreSQL correlated subquery to select data that depends on the values of the outer query.

Getting Started with PostgreSQL Correlated Subquery #

A correlated subquery is a subquery that uses values from an outer query.

Unlike a regular subquery that can execute independently, PostgreSQL may have to execute a correlated subquery for every row in the outer query.

For this reason, you should avoid using the correlated subquery as much as possible to improve the query performance.

PostgreSQL Correlated Subquery example #

Suppose we have the following products table that include id, name, price, and brand:

Script for creating the products table and inserting data into it

CREATE TABLE products (
  id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  price DEC(11, 2) NOT NULL CHECK (price > 0),
  brand VARCHAR(50) NOT NULL
);
INSERT INTO
  products (name, price, brand)
VALUES
  ('Galaxy S24', 799.99, 'Samsung'),
  ('iPhone 16', 1099.99, 'Apple'),
  ('iPhone 16 Pro Max', 1399.99, 'Apple'),
  ('iPhone 16 Plus', 1199.99, 'Apple'),
  ('Galaxy S24 Ultra', 1299.99, 'Samsung'),
  ('Galaxy S24 Plus', 1119.99, 'Samsung');Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

id	name	price	brand
1	Galaxy S24	799.99	Samsung
2	iPhone 16	1099.99	Apple
3	iPhone 16 Pro Max	1399.99	Apple
4	iPhone 16 Plus	1199.99	Apple
5	Galaxy S24 Ultra	1299.99	Samsung
6	Galaxy S24 Plus	1119.99	Samsung

The following example shows how to use a correlated subquery to get the product name, brand, and price of the most expensive product in each brand:

SELECT
  name,
  brand,
  price
FROM
  products p1
WHERE
  price = (
    SELECT
      MAX(price)
    FROM
      products p2
    WHERE
      p2.brand = p1.brand
  );Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Try it

Output:

       name        |  brand  |  price
-------------------+---------+---------
 iPhone 16 Pro Max | Apple   | 1399.99
 Galaxy S24 Ultra  | Samsung | 1299.99Code language: plaintext (plaintext)

How it works.

First, the outer query selects the name, brand, and price from the products table with the alias p1:

SELECT
  name,
  brand,
  price
FROM
  products p1Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Second, PostgreSQL executes the correlated subquery for each row in the outer query:

SELECT
  MAX(price)
FROM
  products p2
WHERE
  p2.brand = p1.brandCode language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

The correlated subquery selects the maximum price from the products table with the alias p2, where the brand of p2 matches the brand of the current row in the outer query (p1.brand).

Third, the WHERE clause in the outer query checks the price of the current row (p1.price) equals the maximum price of the brand of the current row. If yes, PostgreSQL includes the row in the result set.

Here’s the detail:

Row id 1

id	name	price	brand
1	Galaxy S24	799.99	Samsung

The subquery finds the maximum price of the Samsung brand (1299.99).

The WHERE clause compares the price (799.99) with the maximum price (1299.99). It returns false. PostgreSQL does not include row 1 in the result set.

Row id 2:

id	name	price	brand
2	iPhone 16	1099.99	Apple

The subquery finds the maximum price of the Apple brand (1399.99).

The WHERE clause compares the price (1099.99) with the maximum price (1399.99), which returns false. Hence, PostgreSQL does not include row id 2 in the result set.

Row id 3

id	name	price	brand
3	iPhone 16 Pro Max	1399.99	Apple

The subquery finds the maximum price of the Apple brand (1399.99).

The WHERE clause compares the price (1399.99) with the maximum price (1399.99),which returns true. PostgreSQL includes row id 3 in the result set.

Row id 4

id	name	price	brand
4	iPhone 16 Plus	1199.99	Apple

The subquery finds the maximum price of the Apple brand (1399.99).

The WHERE clause compares the price (1199.99) with the maximum price (1399.99). It returns false. PostgreSQL does not include row id 4 in the result set.

Row id 5

id	name	price	brand
5	Galaxy S24 Ultra	1299.99	Samsung

The subquery finds the maximum price of the Samsung brand (1299.99).

The WHERE clause compares the price (1299.99) with the maximum price (1299.99), which returns true. Therefore, PostgreSQL does not include row id 5 in the result set.

Row id 6

id	name	price	brand
6	Galaxy S24 Plus	1119.99	Samsung

The subquery finds the maximum price of the Samsung brand (1299.99).

The WHERE clause compares the price (1119.99) with the maximum price (1299.99), which returns false. PostgreSQL does not include row id 6 in the result set.

The final result set will be as follows:

name	price	brand
iPhone 16 Pro Max	1399.99	Apple
Galaxy S24 Ultra	1299.99	Samsung

This example shows that PostgreSQL has to execute the correlated subquery to find the maximum price of Apple brand for every row in the outer query. It is redundant and inefficient.

Rewriting a correlated subquery using join #

The following example uses a JOIN with a subquery to make the query faster:

SELECT
  p1.name,
  p1.brand,
  p1.price
FROM
  products p1
  JOIN (
    SELECT
      brand,
      MAX(price) AS max_price
    FROM
      products
    GROUP BY
      brand
  ) p2 ON p1.brand = p2.brand
  AND p1.price = p2.max_price;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Try it

How it works.

First, select the brand and maximum product price by brand using GROUP BY with the MAX aggregate function:

SELECT
  brand,
  MAX(price) AS max_price
FROM
  products
GROUP BY
  brand;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Try it

Second, join the products table with the result set returned by the subquery on the brand and price columns. This will retrieve the products with the highest price for each selected brand.

This approach reduces the number of times PostgreSQL executes the subquery, improving the query performance, especially when the products table is large.

Summary #

A correlated subquery is a subquery that depends on the outer query.

Quiz #

Was this tutorial helpful ?