Mastering Slowly Changing Dimensions (SCD): A Guide Using Vendor Feeds
PermalinkWhat is Slowly Changing Dimensions (SCD)?
So, let’s say you’re dealing with a data warehouse—a massive storehouse of information that helps businesses make sense of complex data. Now, Slowly Changing Dimensions (SCD) is a method used in data warehousing to handle data that changes gradually over time. Think of it like managing a company’s history and making sure you can track how things evolve.
For instance, imagine you work in the financial domain, dealing with data feeds like Bloomberg for market data or GLEIF for Legal Entity Identifiers (LEIs). These sources are crucial because they help businesses manage key information.
Now, here's where the “slowly changing” part comes in. Vendor data doesn't change every day. For example, a company's address might change once in a while, or maybe a merger happens and the company’s name changes. But these changes need to be captured and tracked. Let’s dive right in.
PermalinkTypes of SCD and Real-Life Scenarios
Below, we’ll explore SCD Types 0, 1, 2, 3, and 4 using the following a Vendor Table structure.
PermalinkTable Structure:
CREATE TABLE VENDOR (
Vendor_ID INT PRIMARY KEY,
Vendor_Name VARCHAR(100),
Address VARCHAR(150),
LEI VARCHAR(20),
Start_Date DATE,
End_Date DATE,
Is_Current BOOLEAN,
Previous_Address VARCHAR(150)
);
For each type, we’ll cover:
Insert, Update, and Delete SQL queries.
A database table example with changes highlighted.
Pros and cons of the approach.
PermalinkSCD Type 0: Retain Original Data
PermalinkBehavior:
- Data is static—changes are ignored. No updates occur in the table.
PermalinkUse Case:
- Regulatory reports where data is captured once (e.g., baseline snapshots).
PermalinkScenario:
Vendor updates the party address from “123 Wall St” to “456 Main St”, but we don’t update the table.
PermalinkSQL Queries:
PermalinkInsert:
INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date)
VALUES (1, 'XYZ Corp', '123 Wall St', '12345LEI', '2023-01-01');
PermalinkTable (Static Data):
Vendor_ID | Vendor_Name | Address | LEI | Start_Date |
1 | XYZ Corp | 123 Wall St | 12345LEI | 2023-01-01 |
PermalinkPros:
Simple to implement.
No overhead for tracking changes.
PermalinkCons:
- No history or updates—useful only for static reports.
PermalinkSCD Type 1: Overwrite (No History)
PermalinkBehavior:
- Changes overwrite old data. Historical data is lost.
PermalinkUse Case:
- Non-critical data where only the latest state is needed (e.g., live dashboards).
PermalinkScenario:
Vendor updates the party address from “123 Wall St” to “456 Main St”.
PermalinkSQL Queries:
PermalinkInsert:
INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date)
VALUES (1, 'XYZ Corp', '123 Wall St', '12345LEI','2023-01-01');
PermalinkUpdate:
UPDATE Vendor
SET Address = '456 Main St'
WHERE Vendor_ID = 1;
PermalinkTable Before Update:
Vendor_ID | Vendor_Name | Address | LEI | Start_Date |
1 | XYZ Corp | 123 Wall St | 12345LEI | 2023-01-01 |
PermalinkTable After Update (Overwrite):
Vendor_ID | Vendor_Name | Address | LEI | Start_Date |
1 | XYZ Corp | 456 Main St | 12345LEI | 2023-01-01 |
PermalinkPros:
- Simple, efficient, and storage-friendly.
PermalinkCons:
- No history of changes.
PermalinkSCD Type 2: Add Historical Rows
PermalinkBehavior:
- Mechanism for tracking changes in dimension tables by preserving the history of data changes. Instead of overwriting existing data (as in Type 1), It inserts a new row for each change.
PermalinkUse Case:
- Regulatory compliance or audit trails requiring full history.
PermalinkScenario:
Vendor updates the party address from “123 Wall St” to “456 Main St”. In SCD Type-2, the existing record is marked as inactive or has an end date indicating when the record became obsolete.
PermalinkSQL Queries:
PermalinkMark old record as historical:
UPDATE Vendor
SET End_Date = '2023-06-01', Is_Current = FALSE
WHERE Vendor_ID = 1 AND Is_Current = TRUE;
PermalinkInsert new record:
INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date, Is_Current)
VALUES (1, 'XYZ Corp', '456 Main St', '12345LEI', '2023-06-01', TRUE);
PermalinkTable After Update:
Vendor_ID | Vendor_Name | Address | LEI | Start_Date | End_Date | Is_Current |
1 | XYZ Corp | 123 Wall St | 12345LEI | 2023-01-01 | 2023-06-01 | FALSE |
1 | XYZ Corp | 456 Main St | 12345LEI | 2023-06-01 | NULL | TRUE |
PermalinkPros:
- Full historical tracking.
PermalinkCons:
- Higher storage requirements.
PermalinkSCD Type 3: Partial History Maintenance
PermalinkBehavior:
- This Allows us to track a limited amount of historical data by storing both the current and the previous value of a dimension attribute in the same row. This type is useful when you only need to retain a small number of historical values (typically the previous value) and don’t require a full history like in SCD Type 2.
PermalinkUse Case:
- Limited history (e.g., tracking only the previous state).
PermalinkScenario:
Vendor updates the party address from “123 Wall St” to “456 Main St”. SCD Type 3 updates the existing record with a new value and keeps the previous value in a separate column. Like CurrentAddress
and PreviousAddress
.
PermalinkSQL Queries:
PermalinkInsert:
INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, Previous_Address, LEI)
VALUES (1, 'XYZ Corp', '123 Wall St', NULL, '12345LEI');
PermalinkUpdate:
UPDATE Vendor
SET Previous_Address = Address,
Address = '456 Main St'
WHERE Vendor_ID = 1;
PermalinkTable After Update:
Vendor_ID | Vendor_Name | Address | Previous_Address | LEI |
1 | XYZ Corp | 456 Main St | 123 Wall St | 12345LEI |
PermalinkPros:
- Efficient for limited history.
PermalinkCons:
- Cannot track multiple changes.
PermalinkSCD Type 4: Separate Historical Table
PermalinkBehavior:
- This is a hybrid approach that involves separating the historical data from the current data. Unlike SCD Type 2, which keeps historical and current data in the same table, SCD Type 4 stores the current data in the main dimension table and moves historical data to a separate historical table.
PermalinkUse Case:
- Complex systems where current data is accessed frequently, but full history is needed.
PermalinkScenario:
Vendor updates the party address from “123 Wall St” to “456 Main St”. SCD Type 4 maintains Current Dimension Table that stores only the most recent records with current data and Historical Table that Stores all previous versions of records, with Start_Date
and End_Date
used to track when the record was valid.
PermalinkSQL Queries:
PermalinkMove old record to history table:
INSERT INTO Vendor_History (Vendor_ID, Vendor_Name, Address, LEI, Change_Date)
SELECT Vendor_ID, Vendor_Name, Address, LEI, CURRENT_DATE
FROM Vendor_Current
WHERE Vendor_ID = 1;
PermalinkUpdate current table:
UPDATE Vendor_Current
SET Address = '456 Main St'
WHERE Vendor_ID = 1;
PermalinkConclusion
SCD Types are essential for managing evolving data in systems where information changes gradually over time.
Use the right type based on your system's requirements:
Type 1 for simplicity.
Type 2 for full history.
Type 3 for limited history (store current and previous values).
Type 4 for separate historical records.