Home > SQL Mastery for Interviews > SQL Mastery for Interviews > Normalization in SQL

Normal Forms

Introduction

Normalization is a process that is used for organizing the data in the database by removing data redundancy and improving data integrity. The primary goal of normalizing a given relation is to get rid of Insert, Delete, and Update abnormalities.

This multi-step process involves structuring the database's columns and tables to guarantee that the relationships between columns and tables are correctly enforced by database integrity constraints. In order to be more effective, the normalization procedure typically involves dividing huge tables into smaller ones. Edgar F. Codd established the First Normal Form in 1970, and later, other Normal Forms were defined.

Build your intuition. Is this statement true or false?

Normalization is done by dividing large tables into smaller ones.

Press true if you believe the statement is correct, or false otherwise.

First Normal Form (1NF)

The First Normal Form addresses the problem of atomicity. Atomicity here refers to a table's values not being able to be divided any further. To put it plainly, one cell cannot contain more than one value since, in such cases of composite or multi-valued attributes, the 1NF is violated. If each attribute in a relation is a single-valued attribute, the relation is in the first normal form.

In order to better grasp the concept of 1NF, consider the following example. Let’s say we have a table with the name, department, and office location of the employees of a given company, like so:

In this table, the Telephone Number column is a multi-valued attribute meaning that it’s not in First Normal Form. After the removal of the multi-valued attribute, the normalized table would look like this:

Second Normal Form (2NF)

The table must first be in First Normal Form in order for it to be in Second Normal Form. The second condition is for the table to not have any partial dependency. What this means is that the proper subset of the candidate key determines a non-prime attribute.

Let's use an example to grasp the concept of the Second Normal Form. For this and all following examples, imagine a scenario where one employee can work in multiple departments and every department can have more than one manager. Consider the following table:

The primary key of this table is actually a composite key EmployeeID, Department. The non-key attribute is Location, which depends solely on the Department. Therefore, this table is currently not in the Seconds Normal Form, so in order to accomplish the normalization, we have to divide it into two parts. The resulting tables would then look like this:

So, as we can see, the initial partial functional dependency we had is now removed and the Location column is fully dependent on the table’s primary key – Department.

Third Normal Form (3NF)

The same criterion applies as previously, namely that the table must be in 2NF before moving on to the Third Normal Form. Another requirement is that there be no transitive reliance on non-prime properties. That is, non-prime characteristics (those that do not constitute a candidate key) should not be reliant on other non-prime attributes in the same table. A transitive dependence is a functional dependency in which X → Z (X determines Z) indirectly, through X → Y and Y → Z (where Y → X is not the case).

Let us look at the following table and understand why it is not in 3NF:

In this table, EmployeeID determines Department, and Department determines Location, meaning that EmployeeID determines Location via Department. This suggests that we have a transitive functional dependence, which does not satisfy 3NF. We can solve this issue by dividing this table into two separate parts, like so:

This way, all non-key attributes are dependent solely on the primary key.

Boyce Codd Normal Form (BCNF)

The Boyce Codd Normal Form is also known as 3.5NF since it is a higher version of 3NF which was developed to handle specific sorts of anomalies that 3NF did not solve. Once again, the table must fulfill the 3rd Normal Form before continuing to BCNF. Moreover, every Right-Hand Side (RHS) attribute of the functional dependencies should be dependent on the table's super key.

Now, let’s look at an example in order to better understand the principle of BCNF:

In this table, the EmployeeID and Department form the primary key, meaning that the Department attribute is a prime attribute.

Moreover, note that one Department can be run by multiple Managers, but one Manager can only run one Department. This means that there is a dependency between the Department and the Manager attributes, i.e. the department depends on the manager’s ID (ManagerID -> Department).

You might wonder: why would this be a problem?

Well, the problem here is that the Department is a prime attribute, while the ManagerID is not a primary attribute, therefore, the table doesn’t satisfy the BCNF. In order to solve this issue and put the table in Boyce Codd Normal Form, we have to split the table into two tables, as shown below:

Try this exercise. Is this statement true or false?

For a form to be in Boyce Codd Normal Form, it as to be in the 3rd Normal Form, but doesn’t have to be in the 2nd and 1st Normal Forms.