WebMingle: Relational Database Normalization Process

Illogically or inconsistently stored data can cause a number of problems. In a relational database, a logical and efficient design is just as critical. A poorly designed database may provide erroneous information, may be difficult to use, or may even ill to work properly.

Most of these problems are the result of two bad design features called: redundant data and anomalies. Redundant data is unnecessary reoccurring data (repeating groups of data). Anomalies are any occurrence that weakens the integrity of your data due to irregular or inconsistent storage (delete, insert
and update irregularity, that generates the inconsistent data).

Basically, normalisation is the process of efficiently organising data in a database. There are two main objectives of the normalization process: eliminate redundant data (storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table).
Both of these are valuable goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The process of designing a relational database includes making sure that a table contains only data directly related to the primary key, that each data field contains only one item of data, and that redundant
(duplicated and unnecessary) data is eliminated. The task of a database designer is to structure the data in a way that eliminates unnecessary duplication(s) and provides a rapid search path to all necessary information. This process of specifying and defining tables, keys, columns, and relationships in order
to create an efficient database is called normalization.

Normalisation is part of successful database design.

Without normalisation, database systems can be inaccurate, slow, and inefficient and they might not produce the data you expect.

We use the normalization process to design efficient and functional databases. By normalizing, we store data where it logically and uniquely belongs. The normalization process involves a few steps and each step is called a form. Forms range from the first normal form (1NF) to fifth normal form (5NF). There is also one higher level, called domain key normal form (DK/NF) and BCNF.

When normalising a database you should achieve four goals:

Arranging data into logical groups such that each group describes a small part of the whole
Minimizing the amount of duplicated data stored in a database
Building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data storage
Organising the data such that, when you modify it, you make the changes in only one place

Normalization is a complex process with many specific rules and different intensity levels. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.

A properly normalised design allows you to:

Use storage space efficiently
Eliminate redundant data
Reduce or eliminate inconsistent data
Ease the database maintenance burden

A bad database design usually include:

Repetition of information
Inability to represent certain information
Loss of information
Difficulty to maintain information

When you normalise a database, you start from the general and work towards the specific, applying certain tests (checks) along the way. Some users call this process decomposition. It means decomposing (dividing/breaking down) a ‘big’ un-normalise table (file) into several smaller tables by:

Eliminating insertion, update and delete anomalies
Establishing functional dependencies
Removing transitive dependencies
Reducing non-key data redundancy

Un-normalised data

Un-normalised data have repeating groups, inconsistent data, delete and insert anomalies.
The following table is un-normalized:

Un-normalised data
Project	Color	Size
P1	Red	Large
P1	Green	Large
P1	Red	Medium
P1	Green	Medium
P1	Red	Small
P1	Green	Small
P2	Black	Large
P2	Black	Small

1NF meets the following criteria:

ELIMINATE REPEATING GROUPS (make a separate table for each set of related attributes and give each table a primary key).
A table is in 1NF if there are no duplicated rows in the table.
Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows)

The following tables are in 1NF of above given un-normalized data :

Project Table
ProjectID	Project	Color
1	P1	Red
2	P1	Green
3	P2	Black

Detail Table
DetailId	ProjectId	Size
1	1	Large
2	2	Large
3	1	Medium
4	2	Medium
5	1	Small
6	2	Small
7	3	Large
8	3	Small

Second Normal Form (2NF)

ELIMINATE REDUNDANT DATA (if an attribute depends on only part of multi-valued key, remove it to a separate table) . A table is in 2NF if it met all database requirements for 1NF, and if each non-key attribute is fully functionally dependent on the whole primary key;
2NF meets the following criteria:

Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows)
Each table has all non-primary key attributes fully functionally dependant on the whole primary key

Project Table
ProjectID	Project	Color
1	P1	Red
2	P1	Green
3	P2	Black

Size Table
SizeID	Size
1	Large
2	Medium
3	Small

Detail Table
DetailId	ProjectId	SizeID
1	1	1
2	2	1
3	1	2
4	2	2
5	1	3
6	2	3
7	3	1
8	3	3

Third Normal Form (3NF)

ELIMINATE COLUMNS NOT DEPENDANT ON KEY (if attributes do not contribute to a description of the key remove them to a separate table) . A table is in 3NF if it met all database requirements for both 1NF and 2NF, and if all transitive dependencies are eliminated (each column must depend directly on the primary key; all attributes that are not dependant upon the primary key must be eliminated (e.g. attributes that can be derived from data contained in other fields and tables must be removed)).
3NF meets the following criteria:

Each table contains all-atomic data items, no repeating groups, and a designated primary key
Each table has all non-primary key attributes fully functionally dependant on the whole primary key
All transitive dependencies are removed from each table

Project Table
ProjectID	Project
1	P1
2	P2

Color Table
ColorID	Color
1	Red
2	Green
3	Black

Size Table
SizeID	Size
1	Large
2	Medium
3	Small

Detail Table
DetailId	ProjectId	ColorId	SizeId
1	1	1	1
2	1	2	1
3	1	1	2
4	1	2	2
5	1	1	3
6	1	2	3
7	2	3	1
8	2	3	3

Fourth Normal Form (4NF)

ISOLATE INDEPENDENT MULTIPLE RELATIONSHIPS (no table may contain two or more 1:M or M:M relationships that are not directly related) . This applies to only 1:M and M:M relationships. To be in Fourth Normal Form (4NF).

Project Table
ProjectID	Project
1	P1
2	P2

Color Table
ColorID	Color
1	Red
2	Green
3	Black

Size Table
SizeID	Size
1	Large
2	Medium
3	Small

DetailProjectColor
DPCID	ProjectId	ColorId
1	1	1
2	1	2
3	2	3

DetailProjectSize
DPSID	DPCID	SizeId
1	1	1
2	2	1
3	1	2
4	2	2
5	1	3
6	2	3
7	3	1
8	3	3

Fifth Normal Form (5NF)

The normal forms discussed so far required that the given relation R if not in the given normal form be decomposed in two relations to meet the requirements of the normal form. In some rare cases, a relation can have problems like redundant information and update anomalies because of it but cannot be decomposed in two relations to remove the problems. In such cases it may be possible to decompose the relation in three or more relations using the 5NF.
The fifth normal form deals with join-dependencies which is a generalisation of the MVD. The aim of fifth normal form is to have relations that cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.
A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies at least one of the following holds.
(a) (R1, R2, ..., Rn) is a trivial join-dependency (that is, one of Ri is R) (b) Every Ri is a candidate key for R.
An example of 5NF can be provided by the example below.

Project Table
ProjectID	Project
1	P1
2	P2

Color Table
ColorID	Color
1	Red
2	Green
3	Black

Size Table
SizeID	Size
1	Large
2	Medium
3	Small

DetailProjectColor
DPCID	ProjectId	ColorId
1	1	1
2	1	2
3	2	3

DetailProjectSize
DPSID	ProjectId	SizeId
1	1	1
2	1	2
3	1	3
4	2	1
5	2	3

Detail Table
DetailId	DPCID	DPSID
1	1	1
2	2	1
3	1	2
4	2	2
5	1	3
6	2	3
7	3	4
8	3	5

==============================================
Maulik Bhatt |Senior QA Engineer

WebMingle Technology
Accelerated by knowledge. Driven by values.
www.webMingle.in

WebMingle

Tuesday, August 3, 2010

Relational Database Normalization Process