Why Interviewers Love This Question
This one problem tests almost every important SQL skill in a single query — GROUP BY, aggregate functions, subqueries, window functions, CTEs, and DELETE. A good interviewer does not just want the final SQL — they want to see how you think. That is why you must focus on the approach, not just the final line of code.
If you walk in with a clear three-step approach, you can handle any twist the interviewer throws at you — "what if there is no ID?", "what if there are 5 duplicate columns?", "what if you want to keep the newest row?". The approach stays the same; only the tools change.
The Sample Table
Let's use a simple Customers table throughout the article so every example is easy to follow.
CREATE TABLE Customers (
CustomerId INT IDENTITY(1,1) PRIMARY KEY,
FullName VARCHAR(100),
Email VARCHAR(100)
);
INSERT INTO Customers (FullName, Email) VALUES
('Anita Sharma', 'anita@example.com'),
('Rohit Verma', 'rohit@example.com'),
('Anita Sharma', 'anita@example.com'),
('Priya Das', 'priya@example.com'),
('Rohit Verma', 'rohit@example.com'),
('Karan Malhotra','karan@example.com'),
('Anita Sharma', 'anita@example.com');
After the insert, the table looks like this:
| CustomerId | FullName | |
|---|---|---|
| 1 | Anita Sharma | anita@example.com |
| 2 | Rohit Verma | rohit@example.com |
| 3 | Anita Sharma | anita@example.com |
| 4 | Priya Das | priya@example.com |
| 5 | Rohit Verma | rohit@example.com |
| 6 | Karan Malhotra | karan@example.com |
| 7 | Anita Sharma | anita@example.com |
Anita appears 3 times (rows 1, 3, 7). Rohit appears 2 times (rows 2, 5). Our goal — keep exactly one row for each unique person and delete the rest.
The Three-Step Approach
This is the approach that works for every flavor of this question. Memorize it and say it out loud in the interview before you write any SQL.
Step 1
Step 2
Step 3
That is the whole trick. The tools you use change depending on whether the table has an ID or not, but these three steps never change.
Case 1: Table With a Unique ID Column
This is the easy case. Our Customers table already has CustomerId. Let's apply the three steps.
Step 1: We Already Have an ID
CustomerId is unique for every row. Step one is done for free.
Step 2: Pick the IDs to Keep
For each group of duplicates, we keep one row. Let's keep the row with the smallest CustomerId (the oldest record). Use GROUP BY on the duplicate columns and MIN on the ID.
SELECT MIN(CustomerId) AS KeepId
FROM Customers
GROUP BY FullName, Email;
Result:
| KeepId |
|---|
| 1 (Anita — keep the earliest) |
| 2 (Rohit — keep the earliest) |
| 4 (Priya — only one) |
| 6 (Karan — only one) |
Step 3: Delete Every Other Row
DELETE FROM Customers
WHERE CustomerId NOT IN (
SELECT MIN(CustomerId)
FROM Customers
GROUP BY FullName, Email
);
After running this, rows 3, 5, and 7 disappear. Only one Anita, one Rohit, one Priya, and one Karan remain. Done.
MIN to MAX. The approach is identical — only one word changes.
Case 2: Table Without a Unique ID Column
Now the tricky version. Imagine the same Customers table but without CustomerId — just FullName and Email. How do we identify which row to delete when every duplicate row looks identical?
Step 1: Generate a Unique ID on the Fly
SQL Server gives us a beautiful tool called ROW_NUMBER() — a window function that can number rows any way we want. Combined with PARTITION BY, it numbers rows inside each duplicate group, restarting from 1 for each new group.
SELECT FullName,
Email,
ROW_NUMBER() OVER (
PARTITION BY FullName, Email
ORDER BY FullName
) AS RowNum
FROM Customers;
Result:
| FullName | RowNum | |
|---|---|---|
| Anita Sharma | anita@example.com | 1 |
| Anita Sharma | anita@example.com | 2 |
| Anita Sharma | anita@example.com | 3 |
| Karan Malhotra | karan@example.com | 1 |
| Priya Das | priya@example.com | 1 |
| Rohit Verma | rohit@example.com | 1 |
| Rohit Verma | rohit@example.com | 2 |
See the magic? Every duplicate group starts at 1. The first row in each group gets 1, the next gets 2, and so on. That tiny number is now our unique ID inside each group.
Step 2 & 3: Delete Everything with RowNum > 1
The rule is simple — keep RowNum = 1, delete everything else. But we cannot use this directly in a DELETE because window functions cannot sit inside a regular WHERE. This is where a CTE (Common Table Expression) becomes our hero.
The Clean CTE Solution (Best for Interviews)
A CTE is basically a named temporary result that sits right before your main query. It makes the solution read like plain English.
WITH DuplicateRows AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY FullName, Email
ORDER BY FullName
) AS RowNum
FROM Customers
)
DELETE FROM DuplicateRows
WHERE RowNum > 1;
Read it like a sentence: "First, number every row inside its duplicate group. Then delete every row where the number is greater than 1."
Visual: How ROW_NUMBER with PARTITION BY Works
Alternative: Using a Self-Join
If the interviewer blocks CTEs and window functions just to push you, you can still solve it with a self-join. It is older-school and works on any SQL database.
DELETE a
FROM Customers a
INNER JOIN Customers b
ON a.FullName = b.FullName
AND a.Email = b.Email
AND a.CustomerId > b.CustomerId;
Read it like this: "For every row a, find another row b that is identical but has a smaller ID. If such a b exists, a is a duplicate — delete it." Only the row with the smallest ID in each group survives.
Alternative: Using DELETE with EXISTS
DELETE a
FROM Customers a
WHERE EXISTS (
SELECT 1
FROM Customers b
WHERE a.FullName = b.FullName
AND a.Email = b.Email
AND b.CustomerId < a.CustomerId
);
This does the exact same thing as the self-join but sometimes reads cleaner to interviewers who prefer subqueries.
Comparison: Which Method to Use When
| Method | Best For | Pros | Cons |
|---|---|---|---|
GROUP BY + MIN |
Table already has a unique ID | Simple, easy to explain | Works only if ID exists |
ROW_NUMBER + CTE |
No ID, or big tables | Cleanest answer, very readable | Needs window function support (SQL Server, PostgreSQL, MySQL 8+) |
| Self-join | Older databases, no window functions | Works almost everywhere | Harder to read, slower on huge tables |
EXISTS subquery |
Mid-size tables, readable style | Very expressive | Slower than a CTE on very large tables |
Common Mistakes
Mistake 1: Running the DELETE Without Testing First
SELECT version of your query first to see which rows would be deleted. Once you run DELETE on production data, you cannot undo it unless you have a backup or a transaction.
BEGIN TRANSACTION;
WITH DuplicateRows AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY FullName, Email
ORDER BY CustomerId
) AS RowNum
FROM Customers
)
DELETE FROM DuplicateRows WHERE RowNum > 1;
SELECT * FROM Customers;
-- If everything looks right:
COMMIT;
-- If something is wrong:
-- ROLLBACK;
Mistake 2: Forgetting PARTITION BY
Writing ROW_NUMBER() OVER (ORDER BY FullName) without PARTITION BY gives every row a unique number across the whole table — not inside each duplicate group. Every row becomes RowNum > 1 and you accidentally delete almost everything.
Mistake 3: Using SELECT DISTINCT and Thinking It Deletes Duplicates
SELECT DISTINCT only hides duplicates in the result set. It does not touch the actual table. The duplicates are still there. Always use DELETE for real cleanup.
Mistake 4: Not Including All Duplicate Columns
If "duplicate" means same FullName and same Email, both must appear in PARTITION BY or GROUP BY. Missing one column changes the definition of duplicate and gives wrong results.
Mistake 5: Choosing MIN/MAX Without a Reason
Always ask yourself: "Which copy should I keep?" The oldest (smallest ID), the newest (largest ID), or one with some other rule? Document your choice — interviewers often ask why.
Bonus: Preventing Duplicates in the First Place
Deleting duplicates is a fix. The real solution is to stop them from entering in the first place. Add a UNIQUE constraint or a unique index on the columns that should not repeat.
ALTER TABLE Customers
ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);
After this, SQL Server will block any insert that tries to add a second row with the same email. Mentioning this in the interview shows you think like a real developer, not just a coder.
Real-World Scenarios Where This Comes Up
Scenario 1: Data Import from CSV
A business user uploads a customer CSV twice by mistake. You now have every record duplicated. The CTE approach cleans it in one query.
Scenario 2: Merging Two Databases
After an acquisition, you merge two customer tables. Many customers exist in both. Use PARTITION BY email and keep the row with the most recent UpdatedAt.
Scenario 3: Logging Tables
An event-logging table accidentally inserts the same event multiple times. Delete duplicates by partitioning on (UserId, EventType, EventDate) and keeping the earliest row.
The Interview Answer Template
Walk them through the 3-step approach:
"My approach has three steps. First, make sure every row has a unique identifier — if the table doesn't have one, I will generate one using
ROW_NUMBER(). Second, for each duplicate group, I pick exactly one row I want to keep — usually the one with the smallest or largest ID. Third, I delete every row that is not in that keep-list. The cleanest way to write this in SQL Server is a CTE with ROW_NUMBER() OVER (PARTITION BY ...) followed by a DELETE on RowNum > 1."Then write the CTE solution and explain each part.
SELECT first, wrap the DELETE in a transaction, and suggest adding a UNIQUE constraint to prevent duplicates in the future. This is what separates a junior candidate from a senior one.
Summary
Deleting duplicate records in SQL Server sounds scary the first time you hear it, but the approach is always the same three steps — get a unique ID, pick the IDs you want to keep, delete everything else. Change the tools you use, never change the approach.
When the table has an ID, use GROUP BY with MIN or MAX. When it does not, use ROW_NUMBER() OVER (PARTITION BY ...) inside a CTE and delete every row where RowNum > 1. If you can't use window functions, fall back to a self-join or an EXISTS subquery — the logic stays the same.
Always test with a SELECT first. Wrap your DELETE in a transaction so you can roll back. And once the data is clean, protect the table with a UNIQUE constraint so the problem cannot happen again.
Most importantly, remember that an interview is 70% technical and 30% presentation. The CTE version is the cleanest to explain, the easiest to read, and the most impressive one to write on a whiteboard. Master it, rehearse the three-step story, and you will answer this question with confidence every single time.
| Concept | Key Takeaway |
|---|---|
| The 3-step approach | Unique ID → pick keepers → delete others |
| With ID column | GROUP BY + MIN(id) + DELETE WHERE id NOT IN |
| Without ID column | ROW_NUMBER() OVER (PARTITION BY ...) + CTE + DELETE WHERE RowNum > 1 |
| CTE | Best for presentation — reads like English |
| Self-join | Fallback when window functions are blocked |
| EXISTS subquery | Clean alternative without window functions |
| Keep oldest vs newest | Switch between MIN and MAX |
| Safety | Run SELECT first, wrap DELETE in a transaction |
| Prevention | Add a UNIQUE constraint on duplicate columns |
| Interview line | Explain the 3-step approach, then write the CTE |