Mastering Slowly Changing Dimensions (SCD): A Guide Using Vendor Feeds


What is Slowly Changing Dimensions (SCD)?

So, let’s say you’re dealing with a data warehouse—a massive storehouse of information that helps businesses make sense of complex data. Now, Slowly Changing Dimensions (SCD) is a method used in data warehousing to handle data that changes gradually over time. Think of it like managing a company’s history and making sure you can track how things evolve.

For instance, imagine you work in the financial domain, dealing with data feeds like Bloomberg for market data or GLEIF for Legal Entity Identifiers (LEIs). These sources are crucial because they help businesses manage key information.

Now, here's where the “slowly changing” part comes in. Vendor data doesn't change every day. For example, a company's address might change once in a while, or maybe a merger happens and the company’s name changes. But these changes need to be captured and tracked. Let’s dive right in.


Types of SCD and Real-Life Scenarios

Below, we’ll explore SCD Types 0, 1, 2, 3, and 4 using the following a Vendor Table structure.

Table Structure:

CREATE TABLE VENDOR (
    Vendor_ID INT PRIMARY KEY,
    Vendor_Name VARCHAR(100),
    Address VARCHAR(150),
    LEI VARCHAR(20),
    Start_Date DATE,
    End_Date DATE,
    Is_Current BOOLEAN,
    Previous_Address VARCHAR(150)
);

For each type, we’ll cover:

  1. Insert, Update, and Delete SQL queries.

  2. A database table example with changes highlighted.

  3. Pros and cons of the approach.


SCD Type 0: Retain Original Data

Behavior:

  • Data is static—changes are ignored. No updates occur in the table.

Use Case:

  • Regulatory reports where data is captured once (e.g., baseline snapshots).

Scenario:

Vendor updates the party address from “123 Wall St” to “456 Main St”, but we don’t update the table.

SQL Queries:

Insert:

INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date)
VALUES (1, 'XYZ Corp', '123 Wall St', '12345LEI', '2023-01-01');

Table (Static Data):

Vendor_IDVendor_NameAddressLEIStart_Date
1XYZ Corp123 Wall St12345LEI2023-01-01

Pros:

  • Simple to implement.

  • No overhead for tracking changes.

Cons:

  • No history or updates—useful only for static reports.

SCD Type 1: Overwrite (No History)

Behavior:

  • Changes overwrite old data. Historical data is lost.

Use Case:

  • Non-critical data where only the latest state is needed (e.g., live dashboards).

Scenario:

Vendor updates the party address from “123 Wall St” to “456 Main St”.

SQL Queries:

Insert:

INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date)
VALUES (1, 'XYZ Corp', '123 Wall St', '12345LEI','2023-01-01');

Update:

UPDATE Vendor
SET Address = '456 Main St'
WHERE Vendor_ID = 1;

Table Before Update:

Vendor_IDVendor_NameAddressLEIStart_Date
1XYZ Corp123 Wall St12345LEI2023-01-01

Table After Update (Overwrite):

Vendor_IDVendor_NameAddressLEIStart_Date
1XYZ Corp456 Main St12345LEI2023-01-01

Pros:

  • Simple, efficient, and storage-friendly.

Cons:

  • No history of changes.

SCD Type 2: Add Historical Rows

Behavior:

  • Mechanism for tracking changes in dimension tables by preserving the history of data changes. Instead of overwriting existing data (as in Type 1), It inserts a new row for each change.

Use Case:

  • Regulatory compliance or audit trails requiring full history.

Scenario:

Vendor updates the party address from “123 Wall St” to “456 Main St”. In SCD Type-2, the existing record is marked as inactive or has an end date indicating when the record became obsolete.

SQL Queries:

Mark old record as historical:

UPDATE Vendor
SET End_Date = '2023-06-01', Is_Current = FALSE
WHERE Vendor_ID = 1 AND Is_Current = TRUE;

Insert new record:

INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, LEI, Start_Date, Is_Current)
VALUES (1, 'XYZ Corp', '456 Main St', '12345LEI', '2023-06-01', TRUE);

Table After Update:

Vendor_IDVendor_NameAddressLEIStart_DateEnd_DateIs_Current
1XYZ Corp123 Wall St12345LEI2023-01-012023-06-01FALSE
1XYZ Corp456 Main St12345LEI2023-06-01NULLTRUE

Pros:

  • Full historical tracking.

Cons:

  • Higher storage requirements.

SCD Type 3: Partial History Maintenance

Behavior:

  • This Allows us to track a limited amount of historical data by storing both the current and the previous value of a dimension attribute in the same row. This type is useful when you only need to retain a small number of historical values (typically the previous value) and don’t require a full history like in SCD Type 2.

Use Case:

  • Limited history (e.g., tracking only the previous state).

Scenario:

Vendor updates the party address from “123 Wall St” to “456 Main St”. SCD Type 3 updates the existing record with a new value and keeps the previous value in a separate column. Like CurrentAddress and PreviousAddress .

SQL Queries:

Insert:

INSERT INTO Vendor (Vendor_ID, Vendor_Name, Address, Previous_Address, LEI)
VALUES (1, 'XYZ Corp', '123 Wall St', NULL, '12345LEI');

Update:

UPDATE Vendor
SET Previous_Address = Address,
    Address = '456 Main St'
WHERE Vendor_ID = 1;

Table After Update:

Vendor_IDVendor_NameAddressPrevious_AddressLEI
1XYZ Corp456 Main St123 Wall St12345LEI

Pros:

  • Efficient for limited history.

Cons:

  • Cannot track multiple changes.

SCD Type 4: Separate Historical Table

Behavior:

  • This is a hybrid approach that involves separating the historical data from the current data. Unlike SCD Type 2, which keeps historical and current data in the same table, SCD Type 4 stores the current data in the main dimension table and moves historical data to a separate historical table.

Use Case:

  • Complex systems where current data is accessed frequently, but full history is needed.

Scenario:

Vendor updates the party address from “123 Wall St” to “456 Main St”. SCD Type 4 maintains Current Dimension Table that stores only the most recent records with current data and Historical Table that Stores all previous versions of records, with Start_Date and End_Date used to track when the record was valid.

SQL Queries:

Move old record to history table:

INSERT INTO Vendor_History (Vendor_ID, Vendor_Name, Address, LEI, Change_Date)
SELECT Vendor_ID, Vendor_Name, Address, LEI, CURRENT_DATE
FROM Vendor_Current
WHERE Vendor_ID = 1;

Update current table:

UPDATE Vendor_Current
SET Address = '456 Main St'
WHERE Vendor_ID = 1;

Conclusion

  • SCD Types are essential for managing evolving data in systems where information changes gradually over time.

  • Use the right type based on your system's requirements:

    • Type 1 for simplicity.

    • Type 2 for full history.

    • Type 3 for limited history (store current and previous values).

    • Type 4 for separate historical records.