How to Concatenate Strings in a PostgreSQL Group By Query

When working with databases, particularly with PostgreSQL, you might encounter a common scenario where you need to concatenate strings within a group of records. This typically comes into play when you have a dataset that includes multiple entries for a single category (such as employees in the same company) and you want to collect those entries into a single string for presentation or analysis.

This blog post will explain how to achieve string concatenation with a GROUP BY query in PostgreSQL. We’ll cover modern solutions as well as approaches for older versions of PostgreSQL for comprehensive understanding.

The Problem

For instance, consider the following employees’ table:

ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave

You want to group these entries by COMPANY_ID such that the output presents a list of employees associated with each company. The desired output would look like this:

COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave

The Solution

PostgreSQL 9.0 or Later

If you are using PostgreSQL version 9.0 or later, you can use the powerful built-in function string_agg() to concatenate strings within your GROUP BY clause effectively.

Here’s how you can write the query:

SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;

With Ordering

Additionally, since PostgreSQL version 9.0 supports ORDER BY within aggregate functions, you can specify the order of employees:

SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
FROM mytable
GROUP BY company_id;

This ensures that your employee names are concatenated in a specific sequence.

PostgreSQL 8.4.x

If you’re still working with PostgreSQL 8.4, which is a version that no longer receives support, you can use the array_agg() function in combination with array_to_string() to achieve similar results. Here’s the SQL query you would use:

SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM mytable
GROUP BY company_id;

PostgreSQL 8.3.x and Older

For users of PostgreSQL 8.3 and earlier versions, there is no built-in function to perform string concatenation directly. The following custom implementation can work around this limitation:

  1. Create a new aggregate function using the textcat function:
CREATE AGGREGATE textcat_all(
  basetype    = text,
  sfunc       = textcat,
  stype       = text,
  initcond    = ''
);
  1. To include separators like , between concatenated strings, create a custom function:
CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
BEGIN
  IF acc IS NULL OR acc = '' THEN
    RETURN instr;
  ELSE
    RETURN acc || ', ' || instr;
  END IF;
END;
$$ LANGUAGE plpgsql;

This will appropriately concatenate the strings.

If you want to remove extra commas for NULL or empty values, here’s a more refined version:

CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
BEGIN
  IF acc IS NULL OR acc = '' THEN
    RETURN instr;
  ELSIF instr IS NULL OR instr = '' THEN
    RETURN acc;
  ELSE
    RETURN acc || ', ' || instr;
  END IF;
END;
$$ LANGUAGE plpgsql;

Conclusion

Concatenating strings in PostgreSQL, especially during GROUP BY operations, is crucial for generating meaningful reports and datasets. With the introduction of functions like string_agg() in recent versions, this task has become straightforward.

If you are on an older version, using the custom aggregation methods detailed here can achieve similar outcomes. Always consider upgrading your PostgreSQL version to take advantage of newer features and security improvements.

Happy querying!