Overview

In PostgreSQL, the GROUP BY statement is used to group rows that have the same values in specified columns into aggregated data. This is particularly useful when combined with SQL aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN().

When using a GROUP BY, we must make sure we select a categorical column to create the group (since we are grouping data on various categories or buckets).

To filter a GROUP BY, we must use the HAVING clause if we want to filter based on the result of an aggregate function.

Note

Consider the following query:

SELECT company, SUM(sales)
FROM finance_table
WHERE company != 'Company'
GROUP BY company 

The function call to SUM will not happen until after the GROUP BY clause has executed. As a result, we cannot use WHERE to filter prior to the GROUP BY if we want to filter based on the aggregate function.

The HAVING clause is used very much like a WHERE clause:

SELECT company, SUM(sales) FROM finance_table
WHERE company != 'company'
GROUP BY company
HAVING SUM(sales) > 1000

Key Points

  • Purpose: GROUP BY groups rows with the same values in specified columns.
  • Aggregate Functions: Used to perform calculations on grouped data.
  • Column Requirement: Every column in the SELECT statement must either be included in the GROUP BY clause or be used in an aggregate function.

Syntax

SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY column1, column2;

Note

The GROUP BY clause must appear right after a FROM or a WHERE statement.

Note

In the SELECT statement, columns must either have an aggregate function or be in the GROUP BY call.

Note

Using WHERE statements to filter data must not refer to the aggregation result. For example, if we have SUM(x) as part of the SELECT statement, we cannot use WHERE SUM(x) < y

Example

Suppose we have a table named sales with the following structure:

idregionproductquantityrevenue
1EastWidget10200
2EastGadget5100
3WestWidget8160
4EastWidget15300
5WestGadget7140

Query:

If we want to find the total quantity sold and total revenue for each region and product, we can use the GROUP BY statement:

SELECT region, product, SUM(quantity) AS total_quantity, SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, product;

Result:

regionproducttotal_quantitytotal_revenue
EastWidget25500
EastGadget5100
WestWidget8160
WestGadget7140

Notes

  1. Columns not in the GROUP BY clause or an aggregate function cannot appear in the SELECT list.
  2. Aggregate functions operate on each group of rows defined by the GROUP BY clause.
  3. The order of columns in the GROUP BY clause can affect readability but not the result.

Use Cases

  • Summarizing sales or revenue by categories.
  • Counting occurrences of distinct values in a column.
  • Calculating averages or totals over grouped data.