In PostgreSQL, the GROUP BY statement is used to group rows that have the same values in specified columns into aggregated data. This is particularly useful when combined with SQL aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN().
When using a GROUP BY, we must make sure we select a categorical column to create the group (since we are grouping data on various categories or buckets).
To filter a GROUP BY, we must use the HAVING clause if we want to filter based on the result of an aggregate function.
Note
Consider the following query:
SELECT company, SUM(sales)FROM finance_tableWHERE company != 'Company'GROUP BY company
The function call to SUM will not happen until after the GROUP BY clause has executed. As a result, we cannot use WHERE to filter prior to the GROUP BY if we want to filter based on the aggregate function.
The HAVING clause is used very much like a WHERE clause:
SELECT company, SUM(sales) FROM finance_tableWHERE company != 'company'GROUP BY companyHAVING SUM(sales) > 1000
Key Points
Purpose: GROUP BY groups rows with the same values in specified columns.
Aggregate Functions: Used to perform calculations on grouped data.
Column Requirement: Every column in the SELECT statement must either be included in the GROUP BY clause or be used in an aggregate function.
Syntax
SELECT column1, column2, aggregate_function(column3)FROM table_nameGROUP BY column1, column2;
Note
The GROUP BY clause must appear right after a FROM or a WHERE statement.
Note
In the SELECT statement, columns must either have an aggregate function or be in the GROUP BY call.
Note
Using WHERE statements to filter data must not refer to the aggregation result. For example, if we have SUM(x) as part of the SELECT statement, we cannot use WHERE SUM(x) < y
Example
Suppose we have a table named sales with the following structure:
id
region
product
quantity
revenue
1
East
Widget
10
200
2
East
Gadget
5
100
3
West
Widget
8
160
4
East
Widget
15
300
5
West
Gadget
7
140
Query:
If we want to find the total quantity sold and total revenue for each region and product, we can use the GROUP BY statement:
SELECT region, product, SUM(quantity) AS total_quantity, SUM(revenue) AS total_revenueFROM salesGROUP BY region, product;
Result:
region
product
total_quantity
total_revenue
East
Widget
25
500
East
Gadget
5
100
West
Widget
8
160
West
Gadget
7
140
Notes
Columns not in the GROUP BY clause or an aggregate function cannot appear in the SELECT list.
Aggregate functions operate on each group of rows defined by the GROUP BY clause.
The order of columns in the GROUP BY clause can affect readability but not the result.
Use Cases
Summarizing sales or revenue by categories.
Counting occurrences of distinct values in a column.