You should also add foreign key constraints on the course_name
column in both the courses_instructors
and courses_textbooks
tables. These constraints should reference the course_name
primary key column in a courses
table (or equivalent table defining courses).
- You should be familiar with BCNF
The Fourth Normal Form (4NF) is a level of database normalization where a table is already in BCNF, and for every non-trivial multivalued dependency, the determinant must be a super key. The goal is to eliminate redundant data and maintain data consistency.
Key concepts
A Multi-valued Dependency
(MVD) is a type of dependency that exists in a relation with at least three attributes A, B, and C if, for each value of A, the set of possible values of B associated with A and the set of possible values of C associated with A are independent of each other. This means that the values of B associated with A are not determined by the values of C associated with A, and vice versa. Essentially, for a given A, you can combine any value from the set of B values with any value from the set of C values.
Multi-valued Dependency
X ->> Y
in a relation R
is considered trivial if Y
is a subset of X
or X
and Y
together make up all the attributes of the relation.
Example
We have a table courses_instructors_textbooks
with the following schema:
import { pgTable, primaryKey, varchar } from "drizzle-orm/pg-core";
export const coursesInstructorsTextbooks = pgTable("courses_instructors_textbooks", {
courseName: varchar("course_name", { length: 255 }).notNull(),
instructor: varchar("instructor", { length: 255 }).notNull(),
textbook: varchar("textbook", { length: 255 }).notNull(),
}, (t) => [
primaryKey({ columns: [t.courseName, t.instructor, t.textbook],
name: "pk_courses_instructors_textbooks",
}),
]);
- Course can be taught by multiple instructors.
- Each course can have multiple textbooks.
- The set of instructors assigned to teach a specific course is independent of the set of textbooks required for that same course. Knowing an instructor for a course tells you nothing specific about which textbook is used (beyond the list of all possible textbooks for that course), and vice-versa.
Multi-valued Dependencies
course_name ->> instructor
course_name ->> textbook
Candidate keys
course_name, instructor, textbook
is the only candidate key. We need all three to uniquely identify a row because a course can have multiple instructors and textbooks.
BCNF Analysis
The table is in BCNF because there are no non-trivial FDs where the determinant is not a superkey.
4NF Analysis
Table is not in 4NF because the multi-valued dependencies are not trivial. The determinant course_name
is not a super key.
This leads to:
- Information duplication.
- Insertion anomalies. If a new instructor or textbook needs to be added for a course, we would need to insert multiple rows to account for all the combinations of instructors and textbooks for that course.
- Deletion anomalies. If an instructor stops teaching a course, multiple rows (one for each textbook associated with the course) must be deleted.
4NF Decomposition
We need to split the table into two tables: courses_instructors
and courses_textbooks
:
import { pgTable, primaryKey, varchar } from "drizzle-orm/pg-core";
export const coursesInstructors = pgTable("courses_instructors", {
courseName: varchar("course_name", { length: 255 }).notNull(),
instructor: varchar("instructor", { length: 255 }).notNull(),
}, (t) => [
primaryKey({ columns: [t.courseName, t.instructor] }),
]);
export const coursesTextbooks = pgTable("courses_textbooks", {
courseName: varchar("course_name", { length: 255 }).notNull(),
textbook: varchar("textbook", { length: 255 }).notNull(),
}, (t) => [
primaryKey({ columns: [t.courseName, t.textbook] }),
]);
Now, the tables are free from the non-trivial multi-valued dependencies that violated 4NF in the original table. We have achieved 4NF by storing each independent relationship separately.