Normalization in DBMS

What is Normalization ?

We define Normalization as a  step-by-step process via which we reduce the complexity of a database. The database may have a lot redundancies (unnecessary information that has become outdated or no longer useful) like attendance record for former students in a university.

In this article, we will learn about Normalization in DBMS.

normalization in DBMS

Normalization in DBMS

A lot of anomalies may be caused in a database because of various cause insertion, deletion and updation operation or concurrency errors.

There are 6 different normalisation benchmarks used which are –

  • 1 NF (normal form)
  • 2 NF
  • 3 NF
  • BCNF (Boyce-Codd Normal Form
  • 4NF
  • 5NF (not used anymore)
  • 6NF (not used anymore)

1NF

Name – 1 Normal Form

Prerequisite –

The relation should not have any multivalued attribute at all in it. For example, for Student table in the database (relation). In the phone number column there must only be one phone number (value).

How to solve –

Decompose the table into single values attributes. Either the second phone number must be deleted or it can be decomposed into primary and secondary number.

Not 1 NF

ID FName Phone_Number
1 Bran 915988589, 989458602
2 Sansa 790614709
3 Jon 700293958

Is 1 NF

ID FName Primary_No Secondary_No
1 Bran 915988589 989458602
2 Sansa 790614709
3 Jon 700293958

Is 1 NF

ID FName Phone_Number
1 Bran 915988589
2 Sansa 790614709
3 Jon 700293958

2NF

To understand the conditions for 2NF, one must know the following –

  • What is non prime attribute
  • What is an candidate Key

Candidate Key – A candidate key is a column, or set of columns, in a table that can uniquely identify any database record without referring to any other data. The best set of columns which identity it uniquely is chosen as candidate key.

Non Prime Attribute – Is the row that doesn’t belong to the candidate key for the table.

Example –

In the table below, if you want to identify any row of the table uniquely, we need both {StudentID, CourseID} thus, it becomes candidate key.

And since CourseName is not part of candidate key. It becomes non-prime attribute.

Enrolled Courses Table

Student_ID Course_ID Course_Name
1 CSE101 C
1 CSE102 C++
2 CSE101 C
2 CSE102 C++
3 CSE103 DBMS

What is 2NF Finally?

To be 2NF in DBMS. The relation should be –

  • Already 1NF
  • No Non Prime attribute should be dependent on any candidate key element, which is called not having partial dependency. This maybe is difficult to understand from definitions but easier from example.

In table, the candidate key is {StudentID, CourseID}, the name of the course is obviously dependent on the CourseID right? CourseID, is non prime also. Thus there is partial dependency in it. Thus, the relation is not 2NF.

To make it 2NF we can break the table as follows –

Enrolled Courses Table

Student_ID Course_ID
1 CSE101
1 CSE102
2 CSE101
2 CSE102
3 CSE103

Courses Table

Course_ID Name
CSE101 C
CSE102 C++
CSE103 DBMS

3NF

For a relation to be in 3NF form the following must hold true –
  • Should be 2NF already
  • There should not be any transitive dependency for non prime attributes
Don’t worry if you don’t know what transitive dependency is, we will understand this with an example.

Student Table

Student_ID Name Age State Country
1 Bran 16 UP India
2 Jon 21 UP India
3 Arya 17 UP India
4 Jamie 41 Gujarat India
5 Tyrion 41 Gujarat India
Now, for the table above. The primary key obviously is StudentID. However, the following dependencies are there –
  1. StudentID -> Name
  2. StudentID -> Age
  3. StudentID -> State
  4. State -> Country
Now, when we look at point 3 and 4 they form transitive dependency as they are of format A -> B and B -> C. Thus, this forms transitive dependency is not in 3NF.

BCNF

The BCNF is the short for Boyce-Codd Normal Form the following are the conditions for a relation to be in BCNF form –
  • Should be 3NF
  • If every non-trivial functional dependency A –> B, A is a super key, i.e. LHS should always be a super key.
Let’s have a look onto an example to understand this better. Let’s say a student enrolling a course can also register for courses from other branches and the course registration table for student is stored in the following way –

Course Registered Table

Student_ID Name CourseName Branch_ID Course_ID
1 Bran C CSE 101
1 Bran Fluid Mechanics MECH 101
1 Bran C++ CSE 102
2 Jamie C CSE 101
2 Jamie Basic Electronics EEE 101
Candidate Key – {StudentID, CourseName}
  • To identify each row uniquely we need {StudentID, CourseName}
  • Thus, its the candidate key
Functional dependencies are –
  1. StudentID -> Name
  2. CourseName -> {BranchID, CourseID}
Now, neither the 2nd is a dependency of type X -> Y, nor CourseName is a super key. Note –  Set of attributes that can help us uniquely identify a record or tuple in the database is called super key.

Prime Course Trailer

Related Banners

Get PrepInsta Prime & get Access to all 200+ courses offered by PrepInsta in One Subscription

Student Table

Student_ID Name
1 Bran
1 Bran
1 Bran
2 Jamie
2 Jamie

Course Table

Course_Name Branch_ID Course_ID
C CSE 101
Fluid Mechanics MECH 101
C++ CSE 102
C CSE 101
Basic Electronics EEE 101

4NF

To be in 4NF form the follwing must be true

  1. Should be in BCNF form
  2. Should not have any Multi-Valued Dependency.

Example –

A -> BC

Get over 200+ course One Subscription

Courses like AI/ML, Cloud Computing, Ethical Hacking, C, C++, Java, Python, DSA (All Languages), Competitive Coding (All Languages), TCS, Infosys, Wipro, Amazon, DBMS, SQL and others

Checkout list of all the video courses in PrepInsta Prime Subscription

Checkout list of all the video courses in PrepInsta Prime Subscription