What Is Normalization?
Normalization is the process of organizing the columns and tables of a database to remove duplicate data, avoid update problems, and make your queries simpler and faster. It is done in stages called normal forms. In this article we cover the first three — 1NF, 2NF, and 3NF — which are enough for almost every real project and almost every interview question.
Our Starting Messy Table
Let's imagine a small library that tracks book loans in one single table called BookLoans. The designer was in a hurry and dumped everything into one place.
| LoanId | BorrowerName | BorrowerPhones | LibraryBranch | BookTitle | AuthorName | DaysBorrowed | FinePerDay | TotalFine |
|---|---|---|---|---|---|---|---|---|
| 1 | Anita Sharma | 9876, 9888 | Mumbai Central | Atomic Habits | James Clear | 5 | 10 | 50 |
| 2 | Rohit Verma | 9123 | Mumbai Central | Deep Work | Cal Newport | 3 | 10 | 30 |
| 3 | Anita Sharma | 9876, 9888 | Pune North | Atomic Habits | James Clear | 2 | 15 | 30 |
This table works, but it hides many problems. "Mumbai Central" is repeated. Phone numbers are stuffed into one column. Author information is duplicated whenever the same book is loaned. And TotalFine could be calculated, not stored. Let's fix all of this, one normal form at a time.
First Normal Form (1NF)
1NF has two simple rules.
Rule 1
Rule 2
Applying Rule 1: Make Columns Atomic
Our BorrowerName mixes first and last name, and BorrowerPhones stores multiple phones in one cell. Searching by last name or by one phone becomes painful and slow. Split them up.
-- Before (bad)
BorrowerName = "Anita Sharma"
BorrowerPhones = "9876, 9888"
-- After (1NF Rule 1)
FirstName = "Anita"
LastName = "Sharma"
-- And phones move to their own table:
Phones(BorrowerId, Phone)
Applying Rule 2: Remove Repeating Data
"Mumbai Central" and "Pune North" keep repeating for every loan from those branches. If "Mumbai Central" is ever renamed, we'd have to update dozens of rows. Move branches into their own master table and refer to them by an ID.
-- Master table
Branches(BranchId PK, BranchName)
1, "Mumbai Central"
2, "Pune North"
-- Loan table now just stores BranchId
BookLoans(LoanId, ..., BranchId FK)
SELECT queries stop needing string-splitting tricks.
Second Normal Form (2NF)
2NF builds on 1NF. It says: every non-key column must depend on the WHOLE primary key, not just a part of it.
This rule only matters when your primary key is composite — made of two or more columns. In our example let's imagine the primary key of BookLoans is (LoanId, BookTitle). Now look at AuthorName. It depends only on BookTitle — not on LoanId. That's a partial dependency, and it breaks 2NF.
The Problem with Partial Dependencies
Because author data is glued to every loan row, if James Clear updates his pen name, we must update dozens of rows. One typo and suddenly the same author has two names in our data.
The Fix: Move Partial Columns Into Their Own Table
Put each book's own details in a Books master table and reference it from BookLoans via a BookId.
-- Book master table
Books(BookId PK, Title, Author, Genre)
1, "Atomic Habits", "James Clear", "Self-help"
2, "Deep Work", "Cal Newport", "Productivity"
-- Loan table now just stores BookId
BookLoans(LoanId PK, BorrowerId, BranchId, BookId, DaysBorrowed, FinePerDay)
Now the author and genre live in one row inside Books. Updating them is a one-liner, not a bulk operation. That is 2NF in action.
Third Normal Form (3NF)
3NF adds one more rule: no non-key column should depend on another non-key column. In simple words — no calculated fields and no indirect dependencies.
Look back at our loan row:
DaysBorrowed = 5
FinePerDay = 10
TotalFine = 50 ← depends on the other two columns
TotalFine can always be calculated as DaysBorrowed * FinePerDay. Storing it is risky — if one of the two inputs changes, the stored total becomes wrong unless we remember to update it. That is a classic 3NF violation.
The Fix: Drop Calculated Columns
Delete TotalFine from the table. Whenever you need it, compute it on the fly in your query.
SELECT LoanId,
DaysBorrowed,
FinePerDay,
DaysBorrowed * FinePerDay AS TotalFine
FROM BookLoans;
Another common 3NF violation is an indirect dependency. For example, if FinePerDay really depends on the branch (Mumbai charges 10, Pune charges 15), then FinePerDay should sit in the Branches table — not on every loan row.
Our Final, Fully-Normalized Design
After applying 1NF, 2NF, and 3NF, the messy single table has become a clean small schema:
Borrowers(BorrowerId PK, FirstName, LastName)
Phones(PhoneId PK, BorrowerId FK, Phone)
Branches(BranchId PK, BranchName, FinePerDay)
Books(BookId PK, Title, Author, Genre)
BookLoans(LoanId PK,
BorrowerId FK,
BranchId FK,
BookId FK,
DaysBorrowed)
Every piece of information now lives in exactly one place. Updates touch one row. Queries stay simple. Duplicates are impossible.
Quick Comparison
| Form | Main Rule | Fix |
|---|---|---|
| 1NF | Atomic values + no repeating groups | Split columns; move repeated values to master tables |
| 2NF | No partial dependency on a composite PK | Move partial-dependent columns to their own table |
| 3NF | No column depends on another non-key column | Remove calculated fields; move indirect columns out |
Common Mistakes
Mistake 1: Storing comma-separated values in a single column. It looks compact but breaks 1NF and kills query performance.
Mistake 2: Keeping a column that depends only on part of the key. Always ask — "does this column depend on the whole primary key?" If no, move it out.
Mistake 3: Storing calculated totals. They drift out of sync the moment inputs change. Calculate at query time.
Mistake 4: Over-normalizing. Breaking a phone number into three fields (country code, area code, number) is usually pointless. Normalize only where it actually helps.
The Interview Answer
"1NF means every column should hold a single atomic value with no repeating groups — split combined fields and move repeating data into master tables. 2NF builds on 1NF and says no non-key column should depend on just a part of a composite primary key. 3NF builds on 2NF and says no non-key column should depend on another non-key column — that means no calculated fields and no indirect dependencies."
Then walk them through one example like our
BookLoans table.
Summary
Database normalization is the art of cleaning up your tables so data lives in only one place. The first three normal forms — 1NF, 2NF, and 3NF — solve most real-world design problems.
1NF forces atomic values and removes repeating groups. 2NF removes partial dependencies on composite keys. 3NF removes calculated fields and indirect dependencies. Apply them in order and you will almost always end up with a clean, efficient schema.
In interviews, remember the pattern — explain each form in one line, then show it with a single small example. That single example shows the interviewer that you truly understand why each rule exists, not just what it is called.
| Concept | Key Takeaway |
|---|---|
| Normalization | Organize data so each fact lives in one place |
| 1NF Rule 1 | Break columns into atomic values |
| 1NF Rule 2 | Move repeating values into master tables |
| 2NF | No partial dependency on a composite key |
| 3NF | No dependency between non-key columns |
| Composite key | Primary key made of two or more columns |
| Calculated field | Should be computed, not stored |
| Master table | Stores a repeated value only once |
| Interview line | Atomic → no partial → no transitive |