Tuesday, August 3, 2010

Relational Database Normalization Process

Illogically or inconsistently stored data can cause a number of problems. In a relational database, a logical and efficient design is just as critical. A poorly designed database may provide erroneous information, may be difficult to use, or may even ill to work properly.

Most of these problems are the result of two bad design features called: redundant data and anomalies. Redundant data is unnecessary reoccurring data (repeating groups of data). Anomalies are any occurrence that weakens the integrity of your data due to irregular or inconsistent storage (delete, insert
and update irregularity, that generates the inconsistent data).

Basically, normalisation is the process of efficiently organising data in a database. There are two main objectives of the normalization process: eliminate redundant data (storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table).
Both of these are valuable goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The process of designing a relational database includes making sure that a table contains only data directly related to the primary key, that each data field contains only one item of data, and that redundant
(duplicated and unnecessary) data is eliminated. The task of a database designer is to structure the data in a way that eliminates unnecessary duplication(s) and provides a rapid search path to all necessary information. This process of specifying and defining tables, keys, columns, and relationships in order
to create an efficient database is called normalization.

Normalisation is part of successful database design.


Without normalisation, database systems can be inaccurate, slow, and inefficient and they might not produce the data you expect.


We use the normalization process to design efficient and functional databases. By normalizing, we store data where it logically and uniquely belongs. The normalization process involves a few steps and each step is called a form. Forms range from the first normal form (1NF) to fifth normal form (5NF). There is also one higher level, called domain key normal form (DK/NF) and BCNF.

When normalising a database you should achieve four goals:

  1. Arranging data into logical groups such that each group describes a small part of the whole
  2. Minimizing the amount of duplicated data stored in a database
  3. Building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data storage
  4. Organising the data such that, when you modify it, you make the changes in only one place
Normalization is a complex process with many specific rules and different intensity levels. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.

A properly normalised design allows you to:

  • Use storage space efficiently
  • Eliminate redundant data
  • Reduce or eliminate inconsistent data
  • Ease the database maintenance burden
A bad database design usually include:

  • Repetition of information
  • Inability to represent certain information
  • Loss of information
  • Difficulty to maintain information
When you normalise a database, you start from the general and work towards the specific, applying certain tests (checks) along the way. Some users call this process  decomposition. It means decomposing (dividing/breaking down) a ‘big’ un-normalise table (file) into several smaller tables by:

  • Eliminating insertion, update and delete anomalies
  • Establishing functional dependencies
  • Removing transitive dependencies
  • Reducing non-key data redundancy


Un-normalised data

Un-normalised data have repeating groups, inconsistent data, delete and insert anomalies.
The following table is un-normalized:

Un-normalised data
Project Color Size
P1 Red Large
P1 Green Large
P1 Red Medium
P1 Green Medium
P1 Red Small
P1 Green Small
P2 Black Large
P2 Black Small


1NF meets the following criteria:

  1. ELIMINATE REPEATING GROUPS (make a separate table for each set of related attributes and give each table a primary key).
  2. A table is in 1NF if there are no duplicated rows in the table.
  3. Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows)
The following tables are in 1NF of above given un-normalized data :

Project Table
ProjectID Project Color
1 P1 Red
2 P1 Green
3 P2 Black


Detail Table
DetailId ProjectId Size
1 1 Large
2 2 Large
3 1 Medium
4 2 Medium
5 1 Small
6 2 Small
7 3 Large
8 3 Small



Second Normal Form (2NF)


ELIMINATE REDUNDANT DATA (if an attribute depends on only part of multi-valued key, remove it to a separate table) . A table is in 2NF if it met all database requirements for 1NF, and if each non-key attribute is fully functionally dependent on the whole primary key;
2NF meets the following criteria:
  1. Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows)
  2. Each table has all non-primary key attributes fully functionally dependant on the whole primary key
Project Table
ProjectID Project Color
1 P1 Red
2 P1 Green
3 P2 Black


Size Table
SizeID Size
1 Large
2 Medium
3 Small


Detail Table
DetailId ProjectId SizeID
1 1 1
2 2 1
3 1 2
4 2 2
5 1 3
6 2 3
7 3 1
8 3 3



Third Normal Form (3NF)


ELIMINATE COLUMNS NOT DEPENDANT ON KEY (if attributes do not contribute to a description of the key remove them to a separate table) . A table is in 3NF if it met all database requirements for both 1NF and 2NF, and if all transitive dependencies are eliminated (each column must depend directly on the primary key; all attributes that are not dependant upon the primary key must be eliminated (e.g. attributes that can be derived from data contained in other fields and tables must be removed)).
3NF meets the following criteria:
  1. Each table contains all-atomic data items, no repeating groups, and a designated primary key
  2. Each table has all non-primary key attributes fully functionally dependant on the whole primary key
  3. All transitive dependencies are removed from each table
Project Table
ProjectID Project
1 P1
2 P2


Color Table
ColorID Color
1 Red
2 Green
3 Black


Size Table
SizeID Size
1 Large
2 Medium
3 Small


Detail Table
DetailId ProjectId ColorId SizeId
1 1 1 1
2 1 2 1
3 1 1 2
4 1 2 2
5 1 1 3
6 1 2 3
7 2 3 1
8 2 3 3



Fourth Normal Form (4NF)


ISOLATE INDEPENDENT MULTIPLE RELATIONSHIPS (no table may contain two or more 1:M or M:M relationships that are not directly related) . This applies to only 1:M and M:M relationships. To be in Fourth Normal Form (4NF).
Project Table
ProjectID Project
1 P1
2 P2



Color Table
ColorID Color
1 Red
2 Green
3 Black


Size Table
SizeID Size
1 Large
2 Medium
3 Small


DetailProjectColor
DPCID ProjectId ColorId
1 1 1
2 1 2
3 2 3


DetailProjectSize
DPSID DPCID SizeId
1 1 1
2 2 1
3 1 2
4 2 2
5 1 3
6 2 3
7 3 1
8 3 3



Fifth Normal Form (5NF)


The normal forms discussed so far required that the given relation R if not in the given normal form be decomposed in two relations to meet the requirements of the normal form. In some rare cases, a relation can have problems like redundant information and update anomalies because of it but cannot be decomposed in two relations to remove the problems. In such cases it may be possible to decompose the relation in three or more relations using the 5NF.
The fifth normal form deals with join-dependencies which is a generalisation of the MVD. The aim of fifth normal form is to have relations that cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.
A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies at least one of the following holds.
(a) (R1, R2, ..., Rn) is a trivial join-dependency (that is, one of Ri is R) (b) Every Ri is a candidate key for R.
An example of 5NF can be provided by the example below.


Project Table
ProjectID Project
1 P1
2 P2


Color Table
ColorID Color
1 Red
2 Green
3 Black


Size Table
SizeID Size
1 Large
2 Medium
3 Small


DetailProjectColor
DPCID ProjectId ColorId
1 1 1
2 1 2
3 2 3


DetailProjectSize
DPSID ProjectId SizeId
1 1 1
2 1 2
3 1 3
4 2 1
5 2 3


Detail Table
DetailId DPCID DPSID
1 1 1
2 2 1
3 1 2
4 2 2
5 1 3
6 2 3
7 3 4
8 3 5



==============================================
Maulik Bhatt |Senior QA Engineer

WebMingle Technology
Accelerated by knowledge. Driven by values.
www.webMingle.in

1 comment: