Normalization

NORMALIZATION IN SQL

Normalization: The most important thing in database designing is to make sure that the data get properly distributed among the tables. Simply we can say that the designing of the table in proper manner is called Normalization.

Normalization is a process that is used in relational database design to organize the data for minimizing the duplication. In normalization, we divide the database in two or more tables and create a relationship between them. After isolating the data we perform some addition, deletion or modification on the fields of a table then we propagate and remove the duplicate data from the related tables.

The main goals of normalization process are:

•    To eliminate the redundancy of data.

•    To make sure that the data dependencies (relationship) make sense.

By these two goals we reduce the space occupied by the duplicate data in the database tables and ensure that the data is logically stored there.

Benefits of Normalization:

There are several benefits for using Normalization in Database, they are

•    Eliminate data redundancy

•    Improve performance

•    Query optimization

•    Faster update due to less number of columns in one table

•    Index improvement

Types of Normalization:

Un-Normalized Form (UNF):

If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:



Emp-Id
Emp-Name
Month
Sales
Bank-Id
Bank-Name
E01
AA
Jan

1000
B01
SBI


Feb

1200




Mar

850


E02
BB
Jan

2200
B02
UTI


Feb

2500


E03
CC
Jan

1700
B01
SBI


Feb

1800




Mar

1850
















In the table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot give us the unique identification facility for any single row. Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4 for E03).

First Normal Form (1st NF):

In 1st NF:

•    The table cells must be of single value.

•    Eliminate repeating groups in individual tables.

•    Create a separate table for each set of related data.

•    Identify each set of related data with a primary key.

Definition: An entity is in the first normal form if it contains no repeating groups. In relational terms, a table is in the first normal form if it contains no repeating columns. Repeating columns make your data less flexible, waste disk space, and make it more difficult to search for data. In 1NF relation the order of tuples (rows) and attributes (columns) does not matter.

Example:



Order
Customer
Contactperson
Total
1
Rishi
Manish
134
2
Preethi
Rohan
545
3
Rishi
Manish
1042
4
Rishi
Manish
928
  
The above relation satisfies the properties of a relation and is said to be in first normal form (or 1NF). Conceptually it is convenient to have all the information in one relation since it is then likely to be easier to query the database.

Second Normal Form (2nd NF):

In 2nd NF:

•    Remove Partial Dependencies.

•    Functional Dependency: The value of one attribute in a table is determined entirely by the value of another.

•    Partial Dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).

•    Create separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.

Definition: A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Example:

The following relation is not in Second Normal Form:
 




Order
Customer
Contactperson
Total
1
Rishi
Manish
134
2
Preethi
Rohan
545
3
Rishi
Manish
1042
4
Rishi
Manish
928
  
In the table above, the order number serves as the primary key. Notice that the customer and total amount are dependent upon the order number -- this data is specific to each order. However, the contact person is dependent upon the customer. An alternative way to accomplish this would be to create two tables:



Customer
Contactperson
Rishi
Manish
Preethi
Rohan



Order
Customer
Total
1
Rishi
134
2
Preethi
545
3
Rishi
1042
4
Rishi
928
  
The creation of two separate tables eliminates the dependency problem. In the first table, contact person is dependent upon the primary key -- customer name. The second table only includes the information unique to each order. Someone interested in the contact person for each order could obtain this information by performing a Join Operation.

Third Normal Form (3rd NF):

In 3rd NF:

•    Remove transitive dependencies.

•    Transitive Dependency A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.

•    Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.

A relation is in third normal form, if it is in 2NF and every non-key attribute of the relation is non-transitively dependent on each candidate key of the relation.

Non-transitive dependency:

Let A, B and C be three attributes of a relation R such that Aïƒ B and Bïƒ C. From these FDs, we may derive AïƒC. This dependence Aïƒ C is transitive.

Example:
Company
City
State
ZIP
ABC Ltd.
Mumbai
MH
10169
XYZ Ltd.
Noida
UP
33196
ASD Ltd.
Chennai
TN
21046
  
The above table is not in the 3NF.

In this example, the city and state are dependent upon the ZIP code. To place this table in 3NF, two separate tables would be created -- one containing the company name and ZIP code and the other containing city, state, ZIP code pairings.



Company
ZIP
ABC Ltd.
10169
XYZ Ltd.
33196
ASD Ltd.
21046

City
State
ZIP
Mumbai
MH
10169
Noida
UP
33196
Chennai
TN
21046
  
This may seem overly complex for daily applications and indeed it may be. Database designers should always keep in mind the tradeoffs between higher level normal forms and the resource issues that complexity creates.

Boyce-Codd Normal Form (BCNF):

In BCNF:

•    When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

•    3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e. composite candidate keys with at least one attribute in common.

•    BCNF is based on the concept of a determinant.

•    A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

•    A relation is in BCNF is, and only if, every determinant is a candidate key.

Definition: A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. The difference between 3NF and BCNF is that for a functional dependency   A ïƒ B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, Whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Example:

ClientNo
InterviewDate
InterviewTime
StaffNo
RoomNo
CR76
13-may-11
10:30
SG5
G101
CR76
13-may-11
12:00
SG5
G101
CR74
13-may-11
12:00
SG37
G102
CR56
02-july-11
10:30
SG5
G102
FD1 ClientNo, InterviewDate -> InterviewTime, StaffNo, RoomNo (Primary Key)

FD2 StaffNo, InterviewDate, InterviewTime -> ClientNo (Candidate key)

FD3 RoomNo, InterviewDate, InterviewTime -> ClientNo, StaffNo (Candidate key)

FD4 StaffNo, InterviewDate -> RoomNo (not a candidate key)

As a consequence the ClientInterview relation may suffer from update anomalies. To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom as shown below,

Interview (ClientNo, InterviewDate, InterviewTime, StaffNo)
StaffRoom (StaffNo, InterviewDate, RoomNo)

Interview:

ClientNo
InterviewDate
InterviewTime
StaffNo
CR76
13-may-11
10:30
SG5
CR76
13-may-11
12:00
SG5
CR74
13-may-11
12:00
SG37
CR56
02-july-11
10:30
SG5

StaffRoom:

StaffNo
InterviewDate
RoomNo
SG5
13-may-11
G101
SG37
13-may-11
G102
SG5
02-july-11
G102

BCNF Interview and StaffRoom relations.

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Many: many relationships are resolved independently.

Fourth Normal Form (4th NF):

In 4th NF:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

•    Have no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.

•    Fourth Normal Form applies to situations involving many-to-many relationships.

In relational databases, many-to-many relationships are expressed through cross-reference tables.

Definition: A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-valued dependency.

Example:

Take an example of Employee Table
info (Employee, Skills, Hobbies)

Employee
Skills
Hobbies
1
programming
Golf
1
programming
Bowling
1
Analysis
Golf
1
Analysis
Bowling
2
Analysis
Golf
2
Analysis
Gardening
2
Management
Golf
2
Management
Gardening
 
This table is difficult to maintain since adding a new hobby requires multiple new rows corresponding to each skill. This problem is created by the pair of multi-valued dependencies EMPLOYEE -> SKILLS and EMPLOYEE -> HOBBIES. A much better alternative would be to decompose INFO into two relations:



Employee
Skills
1
Programming
1
Analysis
2
Analysis
2
Management

0 comments:

Post a Comment