PostgreSQL Index-only Scans

Summary: in this tutorial, you’ll learn how to utilize PostgreSQL index-only scans to boost query performance.

PostgreSQL index-only scan overview #

In PostgreSQL, indexes are essential tools for optimizing query performance.

When a query filters data using indexed columns, PostgreSQL finds matching rows in the index and accesses the table to retrieve the actual row data.

If the required data is included in the index, PostgreSQL can retrieve data entirely from the index without having to access the table. This mechanism is called an index-only scan.

The index-only scan is possible if:

  • The index contains all columns required by the query.
  • The visibility map marks all the rows as all-visible i.e., the rows haven’t been recently updated or deleted.

PostgreSQL uses a visibility map, a bitmap structure, to track whether all rows on a particular page in a table are visible to all transactions.

PostgreSQL index-only scan example #

First, create a products table:

CREATE TABLE products (
  id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price DEC(10, 2) NOT NULL
);Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Second, create a function to insert n rows into the products table:

CREATE OR REPLACE FUNCTION insert_products (n INT) 
RETURNS VOID 
AS 
$$
DECLARE
   i INT;
BEGIN
   FOR i IN 1..n LOOP
      INSERT INTO products (name, price)
      VALUES (
       'Product_' || i, 
       ROUND((RANDOM() * 100 + 1)::numeric, 2)
      );
   END LOOP;
END;
$$ 
LANGUAGE plpgsql;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Third, insert 1000 rows into the products table by calling the insert_products funtion:

SELECT insert_products (1000);Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Fourth, create an index for the name column of the products table:

CREATE INDEX ON products(name);Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Fifth, find the products with the name 'Product_100':

EXPLAIN ANALYZE 
SELECT name
FROM products 
WHERE name = 'Product_100';Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Output:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using products_name_idx on products  (cost=0.28..8.29 rows=1 width=11) (actual time=0.983..0.984 rows=1 loops=1)
   Index Cond: (name = 'Product_100'::text)
   Heap Fetches: 1
 Planning Time: 0.078 ms
 Execution Time: 0.999 msCode language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

The output indicates that PostgreSQL uses an index-only scan because the query retrieves only the name included in the index. However, it still accesses the heap (or table) to fetch one row:

Heap Fetches: 1Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

This occurs because the visibility map did not mark the page as all-visible, so PostgreSQL need to verify rows’ visibility.

Sixth, to verify if a table’s pages are marked as all-visible, you can run this query:

SELECT
  relname,
  relpages,
  relallvisible
FROM
  pg_class
WHERE
  relname = 'products';Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Output:

 relname  | relpages | relallvisible
----------+----------+---------------
 products |        7 |             0Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

If the relallvisible is low, PostgreSQL must check the heap (or table). In this case, it is zero.

Seventh, to improve the visibility of rows, you can run the VACUUM ANALYZE statement:

VACUUM ANALYZE products;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

This statement marks older rows as all-visible, reducing heap fetches and improving index-only scan performance.

Eighth, run the query to check the new value of the relallvisible:

SELECT
  relname,
  relpages,
  relallvisible
FROM
  pg_class
WHERE
  relname = 'products';Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Output:

 relname  | relpages | relallvisible
----------+----------+---------------
 products |        7 |             7Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Finally, run the query that finds the product with the name Product_100 again, you’ll see that PostgreSQL uses the index-only scan but does not have to fetch data from the heap.

EXPLAIN
ANALYZE
SELECT
  name
FROM
  products
WHERE
  name = 'Product_100';Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Output:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using products_name_idx on products  (cost=0.28..4.29 rows=1 width=11) (actual time=0.153..0.154 rows=1 loops=1)
   Index Cond: (name = 'Product_100'::text)
   Heap Fetches: 0
 Planning Time: 0.293 ms
 Execution Time: 0.214 msCode language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Summary #

  • Utilize index-only scans to improve performance by reading all required data from the index.
  • Use the VACUUM ANALYZE statement to improve index-only scan’s efficiency.

Quiz #

Was this tutorial helpful ?