Normalization is a database design technique, which is used to design a
relational database table up to higher normal form. The process is progressive, and a higher level of database normalization cannot be achieved unless the previous levels have been satisfied. That means that, having data in
unnormalized form (the least normalized) and aiming to achieve the highest level of normalization, the first step would be to ensure compliance to
first normal form, the second step would be to ensure
second normal form is satisfied, and so forth in order mentioned above, until the data conforms to
sixth normal form. However, normal forms beyond
4NF are mainly of academic interest, as the problems they exist to solve rarely appear in practice.
The data in the following example was intentionally designed to contradict most of the normal forms. In practice it is often possible to skip some of the normalization steps because the data is already normalized to some extent. Fixing a violation of one normal form also often fixes a violation of a higher normal form. In the example, one table has been chosen for normalization at each step, meaning that at the end, some tables might not be sufficiently normalized. Initial data Let a database table exist with the following structure: This table is in
4NF, but the Supplier ID is equal to the join of its projections:
{{Supplier ID, Title}, {Title, Franchisee ID}, {Franchisee ID, Supplier ID}}. No component of that join dependency is a
superkey (the sole superkey being the entire heading), so the table does not satisfy the
ETNF and can be further decomposed:
Satisfying DKNF Let's have a look at the
Book table from previous examples and see if it satisfies the
domain-key normal form: Logically,
Thickness is determined by number of pages. That means it depends on
Pages which is not a key. Let's set an example convention saying a book up to 350 pages is considered "slim" and a book over 350 pages is considered "thick". This convention is technically a constraint but it is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key constraints to keep the data integrity. In other words – nothing prevents us from putting, for example, "Thick" for a book with only 50 pages – and this makes the table violate
DKNF. To solve this, a table holding enumeration that defines the
Thickness is created, and that column is removed from the original table: That way, the domain integrity violation has been eliminated, and the table is in
DKNF. Normalization does not prevent all cases of impossible/conflicting/unpredictable output. In this example, Min/Max pages of 1/350, 200/999,999,999,999 would lead to unpredictable results. It would therefore be better to specify and use only Min pages.
Satisfying 6NF A simple and intuitive definition of the
sixth normal form is that
"a table is in 6NF when the row contains the Primary Key, and at most one other attribute". That means, for example, the
Publisher table designed while
creating the 1NF: needs to be further decomposed into two tables: |} The obvious drawback of 6NF is the proliferation of tables required to represent the information on a single entity. If a table in 5NF has one primary key column and N attributes, representing the same information in 6NF will require N tables; multi-field updates to a single conceptual record will require updates to multiple tables; and inserts and deletes will similarly require operations across multiple tables. For this reason, in databases intended to serve
online transaction processing (OLTP) needs, 6NF should not be used. However, in
data warehouses, which do not permit interactive updates and which are specialized for fast query on large data volumes, certain DBMSs use an internal 6NF representation – known as a
columnar data store. In situations where the number of unique values of a column is far less than the number of rows in the table, column-oriented storage allow significant savings in space through data compression. Columnar storage also allows fast execution of range queries (e.g., show all records where a particular column is between X and Y, or less than X.) In all these cases, however, the database designer does not have to perform 6NF normalization manually by creating separate tables. Some DBMSs that are specialized for warehousing, such as
Sybase IQ, use columnar storage by default, but the designer still sees only a single multi-column table. Other DBMSs, such as Microsoft SQL Server 2012 and later, let you specify a "columnstore index" for a particular table. == See also ==