Drizzle | Fourth Normal Form (4NF)
This guide assumes familiarity with:
  • You should be familiar with BCNF

The Fourth Normal Form (4NF) is a level of database normalization where a table is already in BCNF, and for every non-trivial multivalued dependency, the determinant must be a super key. The goal is to eliminate redundant data and maintain data consistency.

Key concepts

A Multi-valued Dependency (MVD) is a type of dependency that exists in a relation with at least three attributes A, B, and C if, for each value of A, the set of possible values of B associated with A and the set of possible values of C associated with A are independent of each other. This means that the values of B associated with A are not determined by the values of C associated with A, and vice versa. Essentially, for a given A, you can combine any value from the set of B values with any value from the set of C values.

Multi-valued Dependency X ->> Y in a relation R is considered trivial if Y is a subset of X or X and Y together make up all the attributes of the relation.

Example

We have a table courses_instructors_textbooks with the following schema:

schema.ts
schema.sql
schema.txt
import { pgTable, primaryKey, varchar } from "drizzle-orm/pg-core";

export const coursesInstructorsTextbooks = pgTable("courses_instructors_textbooks", {
  courseName: varchar("course_name", { length: 255 }).notNull(),
  instructor: varchar("instructor", { length: 255 }).notNull(),
  textbook: varchar("textbook", { length: 255 }).notNull(),
}, (t) => [
  primaryKey({ columns: [t.courseName, t.instructor, t.textbook],
    name: "pk_courses_instructors_textbooks",
   }),
]);
  1. Course can be taught by multiple instructors.
  2. Each course can have multiple textbooks.
  3. The set of instructors assigned to teach a specific course is independent of the set of textbooks required for that same course. Knowing an instructor for a course tells you nothing specific about which textbook is used (beyond the list of all possible textbooks for that course), and vice-versa.

Multi-valued Dependencies

  1. course_name ->> instructor
  2. course_name ->> textbook

Candidate keys

course_name, instructor, textbook is the only candidate key. We need all three to uniquely identify a row because a course can have multiple instructors and textbooks.

BCNF Analysis

The table is in BCNF because there are no non-trivial FDs where the determinant is not a superkey.

4NF Analysis

Table is not in 4NF because the multi-valued dependencies are not trivial. The determinant course_name is not a super key.

This leads to:

  1. Information duplication.
  2. Insertion anomalies. If a new instructor or textbook needs to be added for a course, we would need to insert multiple rows to account for all the combinations of instructors and textbooks for that course.
  3. Deletion anomalies. If an instructor stops teaching a course, multiple rows (one for each textbook associated with the course) must be deleted.

4NF Decomposition

We need to split the table into two tables: courses_instructors and courses_textbooks:

schema.ts
schema.sql
schema.txt
import { pgTable, primaryKey, varchar } from "drizzle-orm/pg-core";

export const coursesInstructors = pgTable("courses_instructors", {
  courseName: varchar("course_name", { length: 255 }).notNull(),
  instructor: varchar("instructor", { length: 255 }).notNull(),
}, (t) => [
  primaryKey({ columns: [t.courseName, t.instructor] }),
]);

export const coursesTextbooks = pgTable("courses_textbooks", {
  courseName: varchar("course_name", { length: 255 }).notNull(),
  textbook: varchar("textbook", { length: 255 }).notNull(),
}, (t) => [
  primaryKey({ columns: [t.courseName, t.textbook] }),
]);
IMPORTANT

You should also add foreign key constraints on the course_name column in both the courses_instructors and courses_textbooks tables. These constraints should reference the course_name primary key column in a courses table (or equivalent table defining courses).

Now, the tables are free from the non-trivial multi-valued dependencies that violated 4NF in the original table. We have achieved 4NF by storing each independent relationship separately.