Nowadays Data that we have in the database is stored in enormous quantities, which makes retrieving the certain amount of data a tedious task if the data is not organized correctly. With the help of normalization we can organize the data and also reduce the redundant data. Through this article I will give you an insight of the normalization in SQL.
Now, let’s discuss what normalization in database is,
What is Normalization?
Database Normalization is an approach for organizing the data in a database inorder to remove the anomalies and keep consistent data . It is a process of decomposing tables into two or more tables which completely eliminates the data redundancy. It puts data into a tabular form by removing the duplicated data from its relational tables.
Is it really necessary to normalize the table present in the database?
Every table in the database has to be in the normal form, which might help in order to avoid such data anomalies -
Insertion Anomaly - It happens when we cannot allow a data to be inserted into the table without the presence of another attribute.
Update Anomaly - It happens due to the data inconsistency which results from data redundancy and allows a partial update of data.
Deletion Anomaly - This anomaly occurs due to the deletion of the attributes, certain attributes will get lost.
Normalization is capable of enhancing the database. Let’s move forward and understand various Normal forms.
1st Normal Form (1NF)
A relation violates the first normal form if it contains a multi-valued attribute. A relation will be in the 1st normal form if and only if all the attributes have atomic domain, which simply states that a single cell can not hold multiple values.
Let’s understand this with an example -
The above table does not follow the 1st NF as the Content attribute contains multiple values. Let’s rearrange the table so that it follows the 1st NF.
2nd Normal Form (2NF)
A relation is in 2NF when the table follows the 1NF and the relation should not contain any partial dependency. Now what is partial dependency? Let’s understand dependency first -
For example we have a relational table of student, which have attributes like student_id and student_name
If we ask for student_id: 4, we will get the student_name: Deepak, similarly, if we ask for the student_id: 5 , we will get the student_name: Shubham and every other attribute/column related to the student_id. This is what dependency or Functional Dependency means.
Let’s consider another table to understand Partial Dependency -
In this table we have a composite primary key, meaning more than one attribute is used to specify primary key, and here we have student_id + subject_id.
But as you can see the teacher column only depends on the subject_id and it has nothing to do with student_id. This is known as partial dependency, which means the relation is not in 2NF.
To remove the Partial Dependency or to bring the table in 2NF form, we need to break down the table into two parts -
As you can see, we have completely removed the partial dependency, now the teacher is fully dependent on subject_id which is a primary key.
3rd Normal Form (3NF)
For a table to be in 3NF, the first condition is that it must be in 2NF. The other condition is that it is not supposed to follow transitive dependency for non-prime attributes(which do not form a candidate key). That means non-prime attributes can not depend on other non-prime attributes.
This is an indirect relationship formed between the attributes which causes functional dependency. Let’s say if P -> Q and Q -> R is true, then P -> R is a transitive dependency.
Let’s try to understand this with an example and how we can attain 3NF by eliminating transitive dependency -
In the above table, student_id determines subject_id and subject_id determines the subject. This makes student_id determine the subject, which implies that we have a transitive functional dependency and violates the 3NF form.
In order to achieve 3NF, we are required to eliminate transitive dependency as shown below -
Now you can see from the above tables that all of the non-prime attributes are fully dependent on the prime key attributes. As in the first table subject_id, student_name are dependent only on student_id and in the second table subject is only dependent on subject_id.
Boyce Codd Normal Form (BCNF)
For a table to be in BCNF which is also referred to as 3.5NF, first it should be in third normal form (3NF) and for any dependency A -> B, here A should be a Super Key.
Now this table is in 1NF as all the values are atomic and it also satisfies 2NF as there is no Partial Dependency as well as 3NF because there is no Transitive Dependency. But why does this table not follow BCNF?
As you can see here student_id and subject form the primary key, which makes the subject column a prime attribute. Here as subject is dependent on professor, Professor -> subject, but professor is not a super key. So, the table doesn’t satisfy the BCNF.
Let’s break the table to satisfy the BCNF -
In one table it holds the student_id column and newly created professor_id column.
In the second table we will be having professor_id, Professor and subject columns. By doing this it will satisfy the Boyce Codd Normal Form.
And here we have reached the end of this article in which we got the understanding on Normalization and how we can correctly store the data in the database and avoid the anomalies completely.
Hope this article clears all your doubts regarding data normalization concepts. Please comment down below for further query and suggestions.