PostgreSQL REGEXP_MATCHES Function

Summary: in this tutorial, you’ll learn how to use the PostgreSQL REGEXP_MATCHES function to extract all matches of a POSIX regular expression pattern from a string.

Introduction to the PostgreSQL REGEXP_MATCHES function #

In PostgreSQL, the REGEXP_MATCHES function allows you to find matches based on a POSIX regular expression.

Here’s the syntax of the REGEXP_MATCHES function:

REGEXP_MATCHES(string, pattern [, flags])Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

The function accepts three arguments:

  • string: The input string to search.
  • pattern: The POSIX regular expression pattern to match.
  • flags (optional): A text string that contains zero or more single-letter flags that control the behavior of the regular expression. For example, the 'i' flag is for case-insensitive matching.

The REGEXP_MATCHES function returns a set of text arrays (SET OF TEXT[]) that contains matching substrings.

If the pattern does not match the string, the function returns an empty set (or no rows).

If the pattern matches the string, the function may return one or more rows based on the arguments:

  • If the pattern has no capturing groups, the function returns one row (TEXT[]) that has one element containing the matching substring.
  • If the pattern has capturing groups, the function returns one row (TEXT[]) that contains multiple elements, each corresponding to a capturing group.
  • If the flags does not include the g flag, the function returns only the first match of each capturing group (TEXT[]).
  • If the flags includes the g flag, the function returns the set of text arrays (SET OF TEXT[]).

PostgreSQL REGEXP_MATCHES function examples #

Let’s take some examples of using the REGEXP_MATCHES function.

Returning no rows #

The following query uses the REGEXP_MATCHES function to find a substring with one or more digits:

SELECT
  REGEXP_MATCHES('PostgreSQL', '\d+') result;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Try it

In this example, the input string has no digits. Therefore, the \d+ regular expression has no match. Hence, the REGEXP_MATCHES function returns an empty set:

 result
--------
(0 rows)

Returning first match #

The following statement uses the REGEXP_MATCHES function to find the first substring that contains one or more digits:

SELECT
  REGEXP_MATCHES('PostgreSQL 17, 2024', '\d+') result;Code language: JavaScript (javascript)

Try it

Output:

 result
--------
 {17}

In this example, the REGEXP_MATCHES function returns one row, which is an array of text (TEXT[]) that has one element ({17}).

Notice that the Playground does not include the {} which indicates that the result is an array.

Returning all matches with the g flag #

The following statement uses the REGEXP_MATCHES function to find all substrings that consist of one or more digits:

SELECT
  REGEXP_MATCHES('PostgreSQL 17, 2024', '\d+', 'g') result;Code language: JavaScript (javascript)

Try it

Output:

 result
--------
 {17}
 {2024}

In this example, we use the g flag to instruct the REGEXP_MATCHES function to return a SET OF TEXT[] that contains all matches. Each row is an array (TEXT[]) with one element that contains the match.

Returning all matches with capturing groups #

The following statement uses the REGEXP_MATCHES function to return all matches as a set of array of texts (SET OF TEXT[]):

SELECT
  REGEXP_MATCHES('John Doe, Jane Smith', '(\w+) (\w+)', 'g') result;Code language: JavaScript (javascript)

Try it

Output:

    result
--------------
 {John,Doe}
 {Jane,Smith}

In this example:

  • The regular expression (\w+) (\w+) has two capturing groups that match the first and second words.
  • Each row contains an array of text with two elements: the first and last names.

Using REGEXP_MATCHES function with Table Data #

We’ll use the products table from the inventory database:

PostgreSQL REGEXP_MATCHES function - sample table

The following query uses the REGEXP_MATCH function to find products with names that end with a number:

SELECT
  product_name,
  REGEXP_MATCHES(product_name, '\d+$', 'g') number
FROM
  products;Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Try it

Output:

        product_name        | number
----------------------------+--------
 Samsung Galaxy S24         | {24}
 Apple iPhone 15            | {15}
 Huawei Mate 60             | {60}
 Xiaomi Mi 14               | {14}
 Samsung Galaxy Z Fold 5    | {5}
...Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)

Common Use Cases #

The REGEXP_MATCHES function can be helpful in the following cases:

  • Data Extraction: Extracting data based on patterns from text such as phone numbers, emails, and error codes.
  • Data Validation: Validate data based on a specific pattern.

Performance Considerations #

When using the  REGEXP_MATCHES function, you should consider the following:

  • Regular Expression Complexity: If you use complex regular expressions, the function will be slow, especially with a large data set. Ensure to optimize your regular expressions for performance.
  • Index Utilization: The queries with the REGEXP_MATCHES function generally cannot utilize indexes, which results in full table scans. If the performance is critical, you can consider using functional indexes.

Summary #

  • The REGEXP_MATCHES function returns a SET OF TEXT[],  where each array corresponds to a match.
  • The capturing groups () will determine the number of elements in each array.
  • The g flag controls whether the function returns all matches or only the first match.
  • The REGEXP_MATCHES function returns no rows if there are no matches.
Was this tutorial helpful ?