MS SQL Server - Delete duplicate records in MS SQL Server

Hello friends, welcome to shrash studio learning, in this article we will cover one of the most asked SQL Server interview questions — "How do you delete duplicate records from a table?". Interviewers love this question because it has many flavors — with an ID column, without an ID column, using a CTE, using window functions, and more. In this guide we will use simple English, fresh examples, and the exact three-step approach that works every time. By the end, you will be able to answer any variation of this question with full confidence and present your solution in a clean, professional way.

Why Interviewers Love This Question

This one problem tests almost every important SQL skill in a single query — GROUP BY, aggregate functions, subqueries, window functions, CTEs, and DELETE. A good interviewer does not just want the final SQL — they want to see how you think. That is why you must focus on the approach, not just the final line of code.

If you walk in with a clear three-step approach, you can handle any twist the interviewer throws at you — "what if there is no ID?", "what if there are 5 duplicate columns?", "what if you want to keep the newest row?". The approach stays the same; only the tools change.

The Sample Table

Let's use a simple Customers table throughout the article so every example is easy to follow.


CREATE TABLE Customers (
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    FullName   VARCHAR(100),
    Email      VARCHAR(100)
);

INSERT INTO Customers (FullName, Email) VALUES
('Anita Sharma',  'anita@example.com'),
('Rohit Verma',   'rohit@example.com'),
('Anita Sharma',  'anita@example.com'),
('Priya Das',     'priya@example.com'),
('Rohit Verma',   'rohit@example.com'),
('Karan Malhotra','karan@example.com'),
('Anita Sharma',  'anita@example.com');

After the insert, the table looks like this:

CustomerId	FullName	Email
1	Anita Sharma	anita@example.com
2	Rohit Verma	rohit@example.com
3	Anita Sharma	anita@example.com
4	Priya Das	priya@example.com
5	Rohit Verma	rohit@example.com
6	Karan Malhotra	karan@example.com
7	Anita Sharma	anita@example.com

Anita appears 3 times (rows 1, 3, 7). Rohit appears 2 times (rows 2, 5). Our goal — keep exactly one row for each unique person and delete the rest.

The Three-Step Approach

This is the approach that works for every flavor of this question. Memorize it and say it out loud in the interview before you write any SQL.

THE 3 STEPS TO DELETE DUPLICATES

Step 1

Make sure rows have a unique id

Use existing primary key OR generate one with ROW_NUMBER()

Step 2

Pick the ids you want to keep

Use GROUP BY + MIN / MAX for each duplicate group

Step 3

Delete everything else

DELETE WHERE id NOT IN (the ids from step 2)

That is the whole trick. The tools you use change depending on whether the table has an ID or not, but these three steps never change.

Case 1: Table With a Unique ID Column

This is the easy case. Our Customers table already has CustomerId. Let's apply the three steps.

Step 1: We Already Have an ID

CustomerId is unique for every row. Step one is done for free.

Step 2: Pick the IDs to Keep

For each group of duplicates, we keep one row. Let's keep the row with the smallest CustomerId (the oldest record). Use GROUP BY on the duplicate columns and MIN on the ID.


SELECT MIN(CustomerId) AS KeepId
FROM Customers
GROUP BY FullName, Email;

Result:

KeepId
1 (Anita — keep the earliest)
2 (Rohit — keep the earliest)
4 (Priya — only one)
6 (Karan — only one)

Step 3: Delete Every Other Row


DELETE FROM Customers
WHERE CustomerId NOT IN (
    SELECT MIN(CustomerId)
    FROM Customers
    GROUP BY FullName, Email
);

After running this, rows 3, 5, and 7 disappear. Only one Anita, one Rohit, one Priya, and one Karan remain. Done.

  Tip: To keep the newest row instead of the oldest, just change MIN to MAX. The approach is identical — only one word changes.

Case 2: Table Without a Unique ID Column

Now the tricky version. Imagine the same Customers table but without CustomerId — just FullName and Email. How do we identify which row to delete when every duplicate row looks identical?

Step 1: Generate a Unique ID on the Fly

SQL Server gives us a beautiful tool called ROW_NUMBER() — a window function that can number rows any way we want. Combined with PARTITION BY, it numbers rows inside each duplicate group, restarting from 1 for each new group.


SELECT FullName,
       Email,
       ROW_NUMBER() OVER (
           PARTITION BY FullName, Email
           ORDER BY FullName
       ) AS RowNum
FROM Customers;

Result:

FullName	Email	RowNum
Anita Sharma	anita@example.com	1
Anita Sharma	anita@example.com	2
Anita Sharma	anita@example.com	3
Karan Malhotra	karan@example.com	1
Priya Das	priya@example.com	1
Rohit Verma	rohit@example.com	1
Rohit Verma	rohit@example.com	2

See the magic? Every duplicate group starts at 1. The first row in each group gets 1, the next gets 2, and so on. That tiny number is now our unique ID inside each group.

Step 2 & 3: Delete Everything with RowNum > 1

The rule is simple — keep RowNum = 1, delete everything else. But we cannot use this directly in a DELETE because window functions cannot sit inside a regular WHERE. This is where a CTE (Common Table Expression) becomes our hero.

The Clean CTE Solution (Best for Interviews)

A CTE is basically a named temporary result that sits right before your main query. It makes the solution read like plain English.


WITH DuplicateRows AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY FullName, Email
               ORDER BY FullName
           ) AS RowNum
    FROM Customers
)
DELETE FROM DuplicateRows
WHERE RowNum > 1;

Read it like a sentence: "First, number every row inside its duplicate group. Then delete every row where the number is greater than 1."

  Presentation matters: When you give this answer in an interview, explain each part in plain English as you write it — "I will use a CTE to give every row a position inside its duplicate group, and then delete everything after the first one." Clean answer + clean story = top marks.

Visual: How ROW_NUMBER with PARTITION BY Works

Before vs After the CTE Delete

before: Anita x3, Rohit x2, Priya x1, Karan x1  (7 rows)

group: Anita → 1,2,3  |  Rohit → 1,2  |  Priya → 1  |  Karan → 1

keep: All rows with RowNum = 1

drop: All rows with RowNum > 1  (Anita 2nd & 3rd, Rohit 2nd)

after: Anita x1, Rohit x1, Priya x1, Karan x1  (4 rows)

Alternative: Using a Self-Join

If the interviewer blocks CTEs and window functions just to push you, you can still solve it with a self-join. It is older-school and works on any SQL database.


DELETE a
FROM Customers a
INNER JOIN Customers b
    ON a.FullName = b.FullName
   AND a.Email    = b.Email
   AND a.CustomerId > b.CustomerId;

Read it like this: "For every row a, find another row b that is identical but has a smaller ID. If such a b exists, a is a duplicate — delete it." Only the row with the smallest ID in each group survives.

Alternative: Using DELETE with EXISTS


DELETE a
FROM Customers a
WHERE EXISTS (
    SELECT 1
    FROM Customers b
    WHERE a.FullName = b.FullName
      AND a.Email    = b.Email
      AND b.CustomerId < a.CustomerId
);

This does the exact same thing as the self-join but sometimes reads cleaner to interviewers who prefer subqueries.

Comparison: Which Method to Use When

Method	Best For	Pros	Cons
`GROUP BY` + `MIN`	Table already has a unique ID	Simple, easy to explain	Works only if ID exists
`ROW_NUMBER` + CTE	No ID, or big tables	Cleanest answer, very readable	Needs window function support (SQL Server, PostgreSQL, MySQL 8+)
Self-join	Older databases, no window functions	Works almost everywhere	Harder to read, slower on huge tables
`EXISTS` subquery	Mid-size tables, readable style	Very expressive	Slower than a CTE on very large tables

Common Mistakes

Mistake 1: Running the DELETE Without Testing First

Important: Always run a SELECT version of your query first to see which rows would be deleted. Once you run DELETE on production data, you cannot undo it unless you have a backup or a transaction.


BEGIN TRANSACTION;

WITH DuplicateRows AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY FullName, Email
               ORDER BY CustomerId
           ) AS RowNum
    FROM Customers
)
DELETE FROM DuplicateRows WHERE RowNum > 1;

SELECT * FROM Customers;
-- If everything looks right:
COMMIT;
-- If something is wrong:
-- ROLLBACK;

Mistake 2: Forgetting `PARTITION BY`

Writing ROW_NUMBER() OVER (ORDER BY FullName) without PARTITION BY gives every row a unique number across the whole table — not inside each duplicate group. Every row becomes RowNum > 1 and you accidentally delete almost everything.

Mistake 3: Using `SELECT DISTINCT` and Thinking It Deletes Duplicates

SELECT DISTINCT only hides duplicates in the result set. It does not touch the actual table. The duplicates are still there. Always use DELETE for real cleanup.

Mistake 4: Not Including All Duplicate Columns

If "duplicate" means same FullName and same Email, both must appear in PARTITION BY or GROUP BY. Missing one column changes the definition of duplicate and gives wrong results.

Mistake 5: Choosing `MIN`/`MAX` Without a Reason

Always ask yourself: "Which copy should I keep?" The oldest (smallest ID), the newest (largest ID), or one with some other rule? Document your choice — interviewers often ask why.

Bonus: Preventing Duplicates in the First Place

Deleting duplicates is a fix. The real solution is to stop them from entering in the first place. Add a UNIQUE constraint or a unique index on the columns that should not repeat.


ALTER TABLE Customers
ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);

After this, SQL Server will block any insert that tries to add a second row with the same email. Mentioning this in the interview shows you think like a real developer, not just a coder.

Real-World Scenarios Where This Comes Up

Scenario 1: Data Import from CSV

A business user uploads a customer CSV twice by mistake. You now have every record duplicated. The CTE approach cleans it in one query.

Scenario 2: Merging Two Databases

After an acquisition, you merge two customer tables. Many customers exist in both. Use PARTITION BY email and keep the row with the most recent UpdatedAt.

Scenario 3: Logging Tables

An event-logging table accidentally inserts the same event multiple times. Delete duplicates by partitioning on (UserId, EventType, EventDate) and keeping the earliest row.

The Interview Answer Template

When the interviewer asks "How do you delete duplicate records?":

Walk them through the 3-step approach:

"My approach has three steps. First, make sure every row has a unique identifier — if the table doesn't have one, I will generate one using ROW_NUMBER(). Second, for each duplicate group, I pick exactly one row I want to keep — usually the one with the smallest or largest ID. Third, I delete every row that is not in that keep-list. The cleanest way to write this in SQL Server is a CTE with ROW_NUMBER() OVER (PARTITION BY ...) followed by a DELETE on RowNum > 1."

Then write the CTE solution and explain each part.

  Pro tip: Mention defensive practices at the end — always run a SELECT first, wrap the DELETE in a transaction, and suggest adding a UNIQUE constraint to prevent duplicates in the future. This is what separates a junior candidate from a senior one.

Summary

Deleting duplicate records in SQL Server sounds scary the first time you hear it, but the approach is always the same three steps — get a unique ID, pick the IDs you want to keep, delete everything else. Change the tools you use, never change the approach.

When the table has an ID, use GROUP BY with MIN or MAX. When it does not, use ROW_NUMBER() OVER (PARTITION BY ...) inside a CTE and delete every row where RowNum > 1. If you can't use window functions, fall back to a self-join or an EXISTS subquery — the logic stays the same.

Always test with a SELECT first. Wrap your DELETE in a transaction so you can roll back. And once the data is clean, protect the table with a UNIQUE constraint so the problem cannot happen again.

Most importantly, remember that an interview is 70% technical and 30% presentation. The CTE version is the cleanest to explain, the easiest to read, and the most impressive one to write on a whiteboard. Master it, rehearse the three-step story, and you will answer this question with confidence every single time.

Concept	Key Takeaway
The 3-step approach	Unique ID → pick keepers → delete others
With ID column	`GROUP BY` + `MIN(id)` + `DELETE WHERE id NOT IN`
Without ID column	`ROW_NUMBER() OVER (PARTITION BY ...)` + CTE + `DELETE WHERE RowNum > 1`
CTE	Best for presentation — reads like English
Self-join	Fallback when window functions are blocked
EXISTS subquery	Clean alternative without window functions
Keep oldest vs newest	Switch between `MIN` and `MAX`
Safety	Run `SELECT` first, wrap `DELETE` in a transaction
Prevention	Add a `UNIQUE` constraint on duplicate columns
Interview line	Explain the 3-step approach, then write the CTE

License Key