NORMALIZATION IN SQL

NORMALIZATION IN SQL

NORMALIZATION IN SQL

Normalization: The most important thing in database designing is to make sure that the data get properly distributed among the tables. Simply we can say that the designing of the table in proper manner is called Normalization.

Normalization is a process that is used in relational database design to organize the data for minimizing the duplication. In normalization, we divide the database in two or more tables and create a relationship between them. After isolating the data we perform some addition, deletion or modification on the fields of a table then we propagate and remove the duplicate data from the related tables.

The main goals of normalization process are:

•    To eliminate the redundancy of data.

•    To make sure that the data dependencies (relationship) make sense.

By these two goals we reduce the space occupied by the duplicate data in the database tables and ensure that the data is logically stored there.

Benefits of Normalization:

There are several benefits for using Normalization in Database, they are

•    Eliminate data redundancy

•    Improve performance

•    Query optimization

•    Faster update due to less number of columns in one table

•    Index improvement

Types of Normalization:

Un-Normalized Form (UNF):

If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:

Emp-Id Emp-Name Month Sales Bank-Id Bank-Name
E01 AA Jan 1000 B01 SBI
Feb 1200
Mar 850
E02 BB Jan 2200 B02 UTI
Feb 2500
E03 CC Jan 1700 B01 SBI
Feb 1800
Mar 1850

In the table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot give us the unique identification facility for any single row. Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4 for E03).

First Normal Form (1st NF):

In 1st NF:

•    The table cells must be of single value.

•    Eliminate repeating groups in individual tables.

•    Create a separate table for each set of related data.

•    Identify each set of related data with a primary key.

Definition: An entity is in the first normal form if it contains no repeating groups. In relational terms, a table is in the first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data. In 1NF relation the order of tuples (rows) and attributes (columns) does not matter.

Example:

Order Customer Contactperson Total
1 Rishi Manish 134
2 Preethi Rohan 545
3 Rishi Manish 1042
4 Rishi Manish 928

The above relation satisfies the properties of a relation and is said to be in first normal form (or 1NF). Conceptually it is convenient to have all the information in one relation since it is then likely to be easier to query the database.

Second Normal Form (2nd NF):

In 2nd NF:

•    Remove Partial Dependencies.

•    Functional Dependency: The value of one attribute in a table is determined entirely by the value of another.

•    Partial Dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).

•    Create separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.

Definition: A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Example:

The following relation is not in Second Normal Form:

Order Customer Contactperson Total
1 Rishi Manish 134
2 Preethi Rohan 545
3 Rishi Manish 1042
4 Rishi Manish 928

In the table above, the order number serves as the primary key. Notice that the customer and total amount are dependent upon the order number — this data is specific to each order. However, the contact person is dependent upon the customer. An alternative way to accomplish this would be to create two tables:

Customer Contactperson
Rishi Manish
Preethi Rohan
Order Customer Total
1 Rishi 134
2 Preethi 545
3 Rishi 1042
4 Rishi 928

 

The creation of two separate tables eliminates the dependency problem. In the first table, contact person is dependent upon the primary key — customer name. The second table only includes the information unique to each order. Someone interested in the contact person for each order could obtain this information by performing a Join Operation.

Third Normal Form (3rd NF):

In 3rd NF:

•    Remove transitive dependencies.

•    Transitive Dependency A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.

•    Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.

A relation is in third normal form, if it is in 2NF and every non-key attribute of the relation is non-transitively dependent on each candidate key of the relation.

Non-transitive dependency:

Let A, B and C be three attributes of a relation R such that Aïƒ B and Bïƒ C. From these FDs, we may derive AïƒC. This dependence Aïƒ C is transitive.

Example:

Company City State ZIP
ABC Ltd. Mumbai MH 10169
XYZ Ltd. Noida UP 33196
ASD Ltd. Chennai TN 21046

 

The above table is not in the 3NF.

In this example, the city and state are dependent upon the ZIP code. To place this table in 3NF, two separate tables would be created — one containing the company name and ZIP code and the other containing city, state, ZIP code pairings.

Company ZIP
ABC Ltd. 10169
XYZ Ltd. 33196
ASD Ltd. 21046
City State ZIP
Mumbai MH 10169
Noida UP 33196
Chennai TN 21046

 

This may seem overly complex for daily applications and indeed it may be. Database designers should always keep in mind the tradeoffs between higher level normal forms and the resource issues that complexity creates.

Boyce-Codd Normal Form (BCNF):

In BCNF:

•    When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

•    3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e. composite candidate keys with at least one attribute in common.

•    BCNF is based on the concept of a determinant.

•    A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

•    A relation is in BCNF is, and only if, every determinant is a candidate key.

Definition: A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. The difference between 3NF and BCNF is that for a functional dependency   A ïƒ B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, Whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Example:

ClientNo InterviewDate InterviewTime StaffNo RoomNo
CR76 13-may-11 10:30 SG5 G101
CR76 13-may-11 12:00 SG5 G101
CR74 13-may-11 12:00 SG37 G102
CR56 02-july-11 10:30 SG5 G102

FD1 ClientNo, InterviewDate -> InterviewTime, StaffNo, RoomNo (Primary Key)

FD2 StaffNo, InterviewDate, InterviewTime -> ClientNo (Candidate key)

FD3 RoomNo, InterviewDate, InterviewTime -> ClientNo, StaffNo (Candidate key)

FD4 StaffNo, InterviewDate -> RoomNo (not a candidate key)

As a consequence the ClientInterview relation may suffer from update anomalies. To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom as shown below,

Interview (ClientNo, InterviewDate, InterviewTime, StaffNo)
StaffRoom (StaffNo, InterviewDate, RoomNo)

Interview:

ClientNo InterviewDate InterviewTime StaffNo
CR76 13-may-11 10:30 SG5
CR76 13-may-11 12:00 SG5
CR74 13-may-11 12:00 SG37
CR56 02-july-11 10:30 SG5

StaffRoom:

StaffNo InterviewDate RoomNo
SG5 13-may-11 G101
SG37 13-may-11 G102
SG5 02-july-11 G102

BCNF Interview and StaffRoom relations.

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Many: many relationships are resolved independently.

Fourth Normal Form (4th NF):

In 4th NF:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Fourth Normal Form applies to situations involving many-to-many relationships.

In relational databases, many-to-many relationships are expressed through cross-reference tables.

Definition: A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-valued dependency.

Example:

Take an example of Employee Table
info (Employee, Skills, Hobbies)

Employee Skills Hobbies
1 programming Golf
1 programming Bowling
1 Analysis Golf
1 Analysis Bowling
2 Analysis Golf
2 Analysis Gardening
2 Management Golf
2 Management Gardening

 

This table is difficult to maintain since adding a new hobby requires multiple new rows corresponding to each skill. This problem is created by the pair of multi-valued dependencies EMPLOYEE -> SKILLS and EMPLOYEE -> HOBBIES. A much better alternative would be to decompose INFO into two relations:

Employee Skills
1 Programming
1 Analysis
2 Analysis
2 Management

 

(1)

You may also like...

Leave a Reply