MS SQL Server Group By

Hello friends, welcome to shrash studio learning, in this article we will cover three of the most important SQL Server topics together — Aggregate Functions, the GROUP BY clause, and the difference between WHERE and HAVING. These are asked in almost every SQL interview, and they are the foundation of every report, dashboard, and analytics query. We will use simple English, one continuous example, and finish with the interview answer that always works.

Our Sample Table

We'll use a small Sales table throughout this article. Every query below runs against this data.


CREATE TABLE Sales (
    SaleId    INT PRIMARY KEY,
    Region    VARCHAR(20),
    Category  VARCHAR(20),
    Amount    DECIMAL(10, 2)
);

INSERT INTO Sales VALUES
(1,  'North', 'Electronics', 55000),
(2,  'North', 'Clothing',     8000),
(3,  'South', 'Electronics', 42000),
(4,  'South', 'Grocery',     15000),
(5,  'East',  'Electronics', 60000),
(6,  'East',  'Clothing',    12000),
(7,  'North', 'Electronics', 48000),
(8,  'South', 'Clothing',    18000),
(9,  'West',  'Grocery',     22000),
(10, 'West',  'Electronics', 35000);

Aggregate Functions: Turning Many Rows into One Number

An aggregate function takes a whole column of values and reduces them to a single answer — a sum, an average, a count, a highest, or a lowest. These are the most-used functions in reporting.

THE FIVE MAIN AGGREGATE FUNCTIONS

SUM()

Adds up all values
Total revenue, total hours worked

AVG()

Calculates the mean
Average sale, average rating

COUNT()

Counts how many rows
Number of orders, number of users

MIN() / MAX()

Smallest / largest value
Lowest price, highest salary

SELECT SUM(Amount)   AS TotalSales       FROM Sales;   -- 315000
SELECT AVG(Amount)   AS AverageSale      FROM Sales;   -- 31500
SELECT COUNT(*)      AS TotalTransactions FROM Sales;  -- 10
SELECT MIN(Amount)   AS SmallestSale     FROM Sales;   --  8000
SELECT MAX(Amount)   AS LargestSale      FROM Sales;   -- 60000

GROUP BY: Aggregating Per Category

A single total is useful, but real reports usually ask "how much per region?" or "how many per category?". That is exactly what GROUP BY does — it splits the data into buckets and runs the aggregate inside each bucket.


SELECT Region, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region;

Result:

RegionTotalSales
North111000
South75000
East72000
West57000

SQL Server picks out the distinct regions, then adds up Amount for each one. One total per group.

The Golden Rule of GROUP BY

Every column in your SELECT must either (a) appear in the GROUP BY clause, or (b) be wrapped in an aggregate function. No exceptions.

-- Wrong: Category is not grouped and not aggregated
SELECT Region, Category, SUM(Amount)
FROM Sales
GROUP BY Region;
-- Error: Column 'Sales.Category' is invalid in the select list
-- because it is not contained in either an aggregate function
-- or the GROUP BY clause.

-- Right: both Region and Category are in GROUP BY
SELECT Region, Category, SUM(Amount)
FROM Sales
GROUP BY Region, Category;

Think of it this way — if there are 100 rows per region, SQL Server has no idea which single Category to display for that region. So it refuses to guess. You either include the column in the grouping, or tell it how to aggregate.

Grouping by Multiple Columns

When you group by more than one column, SQL Server creates one bucket per combination of values.


SELECT Region, Category,
       SUM(Amount)  AS TotalSales,
       COUNT(*)     AS Transactions
FROM Sales
GROUP BY Region, Category
ORDER BY Region, Category;

Partial result:

RegionCategoryTotalSalesTransactions
EastClothing120001
EastElectronics600001
NorthClothing80001
NorthElectronics1030002
SouthClothing180001
SouthElectronics420001

Each (Region, Category) pair becomes its own row with its own totals. You can pile on as many aggregates as you need — here we used both SUM and COUNT together.

Tip on aliases: Use AS to give your aggregated columns meaningful names. If the alias has spaces, wrap it in square brackets — SUM(Amount) AS [Total Sales].

COUNT(*) vs COUNT(column)

A subtle but important difference:

COUNT(*) counts every row, including rows where some columns are NULL.
COUNT(columnName) counts only rows where that column is not NULL.
COUNT(DISTINCT column) counts unique non-null values.


SELECT COUNT(*)                AS AllRows,
       COUNT(Amount)            AS NonNullAmounts,
       COUNT(DISTINCT Category) AS UniqueCategories
FROM Sales;

WHERE vs HAVING: The Classic Interview Question

Both clauses filter data, but they run at different stages of a query. Understanding this is the single most common SQL Server interview question on this topic.

WHERE vs HAVING — WHEN EACH RUNS

WHERE

Filters rows BEFORE grouping
Acts on individual rows
Cannot use aggregate functions
Works with SELECT, UPDATE, DELETE

HAVING

Filters groups AFTER grouping
Acts on aggregated results
Can use aggregate functions
Only used with SELECT

WHERE Example — Filter Rows First


-- Only Electronics sales, grouped by region
SELECT Region, SUM(Amount) AS TotalElectronics
FROM Sales
WHERE Category = 'Electronics'
GROUP BY Region;

Here, every non-Electronics row is removed before grouping. The totals only include Electronics.

HAVING Example — Filter After Grouping


-- Only show regions whose total sales exceed 70000
SELECT Region, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Region
HAVING SUM(Amount) > 70000;

Result:

RegionTotalSales
North111000
South75000
East72000

West is excluded because its total is only 57000. The condition SUM(Amount) > 70000 could not go in WHERE — aggregate functions are not allowed there.

Why Aggregates Don't Work in WHERE

Remember the order SQL Server executes a query:

SQL Query Execution Order
1. FROM — pick the table
2. WHERE — filter individual rows
3. GROUP BY — bucket the remaining rows
4. HAVING — filter the buckets
5. SELECT — project columns & aggregates
6. ORDER BY — sort the final result

At the time WHERE runs, the aggregates haven't been calculated yet. There is no SUM(Amount) to compare against. That's why aggregate conditions must wait for HAVING — by then the groups and their totals actually exist.

Combining WHERE and HAVING

It is very common to use both in the same query. WHERE throws away irrelevant rows first (cheap and fast), and HAVING filters the groups afterwards.


-- Among Electronics sales only, show regions with total > 50000
SELECT Region,
       SUM(Amount)  AS TotalElectronics,
       COUNT(*)     AS Transactions
FROM Sales
WHERE  Category = 'Electronics'
GROUP BY Region
HAVING SUM(Amount) > 50000
ORDER BY TotalElectronics DESC;

Read it like English: "From sales, keep only Electronics rows, group by region, show only regions whose Electronics total is above 50k, and sort highest first."

Performance tip: Filter as early as possible with WHERE. The more rows you eliminate before grouping, the less work SQL Server has to do in the aggregation step.

Side-by-Side Comparison

Feature WHERE HAVING
Runs before GROUP BY? Yes No — runs after
Filters what? Individual rows Groups
Aggregate functions allowed? No Yes
Works in UPDATE / DELETE? Yes No (SELECT only)
Typical example WHERE Category = 'Electronics' HAVING SUM(Amount) > 50000

Common Mistakes

Mistake 1: Putting an aggregate in WHERE. WHERE SUM(Amount) > 1000 is a syntax error. Move the condition to HAVING.

Mistake 2: Forgetting to add a column to GROUP BY. If a column is in the SELECT list and it's not aggregated, it must also appear in GROUP BY. SQL Server will throw a clear error.

Mistake 3: Using HAVING when WHERE would do. If your filter doesn't involve an aggregate, it belongs in WHERE. Using HAVING works but wastes performance because all rows are grouped first.

Mistake 4: Ignoring NULLs in COUNT. COUNT(*) and COUNT(column) can give different numbers. Pick the one that fits your question.

Mistake 5: Assuming GROUP BY sorts the output. It doesn't — it just groups. If you need a specific order, add an explicit ORDER BY.

The Interview Answer

If an interviewer asks "What is the difference between WHERE and HAVING?":

"WHERE filters individual rows before the grouping step, while HAVING filters the groups after aggregation. Because of this, aggregate functions like SUM or COUNT are not allowed in WHERE — they only work in HAVING. For performance, use WHERE whenever possible to eliminate rows early, and use HAVING only when the condition involves an aggregated value."

Summary

Aggregate functions — SUM, AVG, COUNT, MIN, and MAX — collapse many rows into a single number. Pair them with GROUP BY to calculate one aggregated value per bucket of data. The golden rule — every non-aggregated column in your SELECT list must appear in GROUP BY.

Use WHERE to filter rows before grouping, and HAVING to filter groups after. Aggregates only work in HAVING, because at the time WHERE runs, there are no aggregated values yet. For clean, fast queries — filter aggressively with WHERE first, then apply group-level conditions in HAVING.

Master these three ideas — aggregates, GROUP BY, and the WHERE/HAVING split — and you'll be able to write almost every reporting query a real business ever needs.

Concept Key Takeaway
Aggregate functions Collapse many rows into one value
GROUP BY Runs the aggregate once per group
Golden rule Every SELECT column must be grouped or aggregated
Multiple columns Creates one row per combination of values
COUNT(*) vs COUNT(col) * counts rows, col ignores NULLs
WHERE Filters rows before grouping
HAVING Filters groups after grouping
Aggregates in WHERE Not allowed — use HAVING
Execution order FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
Interview line "WHERE before groups, HAVING after groups"

Chakrapani U

Hi, I’m Chakrapani Upadhyaya, an IT professional with 15+ years of industry experience. Over the years, I have worked on web development, enterprise applications, database systems, and cloud-based solutions. Through this blog, I aim to simplify complex technical concepts and help learners grow from beginners to confident, industry-ready developers.

Previous Post Next Post

نموذج الاتصال