Normalization in SQL

Normalization in SQL, SQL Fundamentals, Normalization is the process to eliminate data redundancy and enhance data integrity in the table.

Normalization in SQL

Normalization: The most important thing in database designing is to make sure that the data get properly distributed among the tables. Simply we can say that the designing of the table in proper manner is called Normalization.

Normalization is a process that is used in relational database design to organize the data for minimizing the duplication. In normalization, we divide the database in two or more tables and create a relationship between them. After isolating the data we perform some addition, deletion or modification on the fields of a table then we propagate and remove the duplicate data from the related tables.

The main goals of normalization process are:

•    To eliminate the redundancy of data.

•    To make sure that the data dependencies (relationship) make sense.

By these two goals we reduce the space occupied by the duplicate data in the database tables and ensure that the data is logically stored there.

Benefits of Normalization:

There are several benefits for using Normalization in Database, they are

•    Eliminate data redundancy

•    Improve performance

•    Query optimization

•    Faster update due to less number of columns in one table

•    Index improvement

Types of Normalization:

Un-Normalized Form (UNF):

If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:

Emp-IdEmp-NameMonthSalesBank-IdBank-Name
E01AAJan1000B01SBI
Feb1200
Mar850
E02BBJan2200B02UTI
Feb2500
E03CCJan1700B01SBI
Feb1800
Mar1850

In the table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot give us the unique identification facility for any single row. Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4 for E03).

First Normal Form (1st NF):

In 1st NF:

•    The table cells must be of single value.

•    Eliminate repeating groups in individual tables.

•    Create a separate table for each set of related data.

•    Identify each set of related data with a primary key.

Definition: An entity is in the first normal form if it contains no repeating groups. In relational terms, a table is in the first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data. In 1NF relation the order of tuples (rows) and attributes (columns) does not matter.

Example:

OrderCustomerContactpersonTotal
1RishiManish134
2PreethiRohan545
3RishiManish1042
4RishiManish928

The above relation satisfies the properties of a relation and is said to be in first normal form (or 1NF). Conceptually it is convenient to have all the information in one relation since it is then likely to be easier to query the database.

Second Normal Form (2nd NF):

In 2nd NF:

•    Remove Partial Dependencies.

•    Functional Dependency: The value of one attribute in a table is determined entirely by the value of another.

•    Partial Dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).

•    Create separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.

Definition: A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Example:

The following relation is not in Second Normal Form:

OrderCustomerContactpersonTotal
1RishiManish134
2PreethiRohan545
3RishiManish1042
4RishiManish928

In the table above, the order number serves as the primary key. Notice that the customer and total amount are dependent upon the order number — this data is specific to each order. However, the contact person is dependent upon the customer. An alternative way to accomplish this would be to create two tables:

CustomerContactperson
RishiManish
PreethiRohan
OrderCustomerTotal
1Rishi134
2Preethi545
3Rishi1042
4Rishi928

The creation of two separate tables eliminates the dependency problem. In the first table, contact person is dependent upon the primary key — customer name. The second table only includes the information unique to each order. Someone interested in the contact person for each order could obtain this information by performing a Join Operation.

Third Normal Form (3rd NF):

In 3rd NF:

•    Remove transitive dependencies.

•    Transitive Dependency A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.

•    Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.

A relation is in third normal form, if it is in 2NF and every non-key attribute of the relation is non-transitively dependent on each candidate key of the relation.

Non-transitive dependency:

Let A, B and C be three attributes of a relation R such that Aïƒ B and Bïƒ C. From these FDs, we may derive AïƒC. This dependence Aïƒ C is transitive.

Example:

CompanyCityStateZIP
ABC Ltd.MumbaiMH10169
XYZ Ltd.NoidaUP33196
ASD Ltd.ChennaiTN21046

The above table is not in the 3NF.

In this example, the city and state are dependent upon the ZIP code. To place this table in 3NF, two separate tables would be created — one containing the company name and ZIP code and the other containing city, state, ZIP code pairings.

CompanyZIP
ABC Ltd.10169
XYZ Ltd.33196
ASD Ltd.21046
CityStateZIP
MumbaiMH10169
NoidaUP33196
ChennaiTN21046

This may seem overly complex for daily applications and indeed it may be. Database designers should always keep in mind the tradeoffs between higher level normal forms and the resource issues that complexity creates.

Boyce-Codd Normal Form (BCNF):

In BCNF:

•    When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

•    3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e. composite candidate keys with at least one attribute in common.

•    BCNF is based on the concept of a determinant.

•    A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

•    A relation is in BCNF is, and only if, every determinant is a candidate key.

Definition: A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. The difference between 3NF and BCNF is that for a functional dependency   A ïƒ B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, Whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Example:

ClientNoInterviewDateInterviewTimeStaffNoRoomNo
CR7613-may-1110:30SG5G101
CR7613-may-1112:00SG5G101
CR7413-may-1112:00SG37G102
CR5602-july-1110:30SG5G102

FD1 ClientNo, InterviewDate -> InterviewTime, StaffNo, RoomNo (Primary Key)

FD2 StaffNo, InterviewDate, InterviewTime -> ClientNo (Candidate key)

FD3 RoomNo, InterviewDate, InterviewTime -> ClientNo, StaffNo (Candidate key)

FD4 StaffNo, InterviewDate -> RoomNo (not a candidate key)

As a consequence the ClientInterview relation may suffer from update anomalies. To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom as shown below,

Interview (ClientNo, InterviewDate, InterviewTime, StaffNo)
StaffRoom (StaffNo, InterviewDate, RoomNo)

Interview:

ClientNoInterviewDateInterviewTimeStaffNo
CR7613-may-1110:30SG5
CR7613-may-1112:00SG5
CR7413-may-1112:00SG37
CR5602-july-1110:30SG5

StaffRoom:

StaffNoInterviewDateRoomNo
SG513-may-11G101
SG3713-may-11G102
SG502-july-11G102

BCNF Interview and StaffRoom relations.

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Many: many relationships are resolved independently.

Fourth Normal Form (4th NF):

In 4th NF:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Fourth Normal Form applies to situations involving many-to-many relationships.

In relational databases, many-to-many relationships are expressed through cross-reference tables.

Definition: A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-valued dependency.

Example:

Take an example of Employee Table
info (Employee, Skills, Hobbies)

EmployeeSkillsHobbies
1programmingGolf
1programmingBowling
1AnalysisGolf
1AnalysisBowling
2AnalysisGolf
2AnalysisGardening
2ManagementGolf
2ManagementGardening

This table is difficult to maintain since adding a new hobby requires multiple new rows corresponding to each skill. This problem is created by the pair of multi-valued dependencies EMPLOYEE -> SKILLS and EMPLOYEE -> HOBBIES. A much better alternative would be to decompose INFO into two relations:

EmployeeSkills
1Programming
1Analysis
2Analysis
2Management
Follow me on social media: