Summary: in this tutorial, you’ll learn how to utilize PostgreSQL index-only scans to boost query performance.
PostgreSQL index-only scan overview #
In PostgreSQL, indexes are essential tools for optimizing query performance.
When a query filters data using indexed columns, PostgreSQL finds matching rows in the index and accesses the table to retrieve the actual row data.
If the required data is included in the index, PostgreSQL can retrieve data entirely from the index without having to access the table. This mechanism is called an index-only scan.
The index-only scan is possible if:
- The index contains all columns required by the query.
- The visibility map marks all the rows as all-visible i.e., the rows haven’t been recently updated or deleted.
PostgreSQL uses a visibility map, a bitmap structure, to track whether all rows on a particular page in a table are visible to all transactions.
PostgreSQL index-only scan example #
First, create a products
table:
CREATE TABLE products (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DEC(10, 2) NOT NULL
);
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Second, create a function to insert n
rows into the products
table:
CREATE OR REPLACE FUNCTION insert_products (n INT)
RETURNS VOID
AS
$$
DECLARE
i INT;
BEGIN
FOR i IN 1..n LOOP
INSERT INTO products (name, price)
VALUES (
'Product_' || i,
ROUND((RANDOM() * 100 + 1)::numeric, 2)
);
END LOOP;
END;
$$
LANGUAGE plpgsql;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Third, insert 1000
rows into the products
table by calling the insert_products
funtion:
SELECT insert_products (1000);
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Fourth, create an index for the name
column of the products
table:
CREATE INDEX ON products(name);
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Fifth, find the products with the name 'Product_100'
:
EXPLAIN ANALYZE
SELECT name
FROM products
WHERE name = 'Product_100';
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using products_name_idx on products (cost=0.28..8.29 rows=1 width=11) (actual time=0.983..0.984 rows=1 loops=1)
Index Cond: (name = 'Product_100'::text)
Heap Fetches: 1
Planning Time: 0.078 ms
Execution Time: 0.999 ms
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
The output indicates that PostgreSQL uses an index-only scan because the query retrieves only the name
included in the index. However, it still accesses the heap (or table) to fetch one row:
Heap Fetches: 1
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
This occurs because the visibility map did not mark the page as all-visible, so PostgreSQL need to verify rows’ visibility.
Sixth, to verify if a table’s pages are marked as all-visible, you can run this query:
SELECT
relname,
relpages,
relallvisible
FROM
pg_class
WHERE
relname = 'products';
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
relname | relpages | relallvisible
----------+----------+---------------
products | 7 | 0
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
If the relallvisible
is low, PostgreSQL must check the heap (or table). In this case, it is zero.
Seventh, to improve the visibility of rows, you can run the VACUUM ANALYZE
statement:
VACUUM ANALYZE products;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
This statement marks older rows as all-visible, reducing heap fetches and improving index-only scan performance.
Eighth, run the query to check the new value of the relallvisible:
SELECT
relname,
relpages,
relallvisible
FROM
pg_class
WHERE
relname = 'products';
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
relname | relpages | relallvisible
----------+----------+---------------
products | 7 | 7
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Finally, run the query that finds the product with the name Product_100
again, you’ll see that PostgreSQL uses the index-only scan but does not have to fetch data from the heap.
EXPLAIN
ANALYZE
SELECT
name
FROM
products
WHERE
name = 'Product_100';
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Output:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using products_name_idx on products (cost=0.28..4.29 rows=1 width=11) (actual time=0.153..0.154 rows=1 loops=1)
Index Cond: (name = 'Product_100'::text)
Heap Fetches: 0
Planning Time: 0.293 ms
Execution Time: 0.214 ms
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Summary #
- Utilize index-only scans to improve performance by reading all required data from the index.
- Use the
VACUUM ANALYZE
statement to improve index-only scan’s efficiency.