Tuesday, 22 October 2013

Implementing SCD type 2 logic when loading a dimension table

To know about SCD, read this:

http://en.wikipedia.org/wiki/Slowly_changing_dimension


Consider the case where we have to implement SCD type 2 on an employee dimension.


The employee attributes are: code, name, .spouse name, designation, manager name.


Any change in designation or manager name, triggers a new record in the employee dimension.


The MERGE statement can be used to implement SCD 2. 


drop table scd2_emp_dim; -- employee dimension table

drop table scd2_emp_dim_stg; -- stage table

create table scd2_emp_dim (

    emp_wid numeric IDENTITY primary key, -- surrogate key
    emp_code varchar(100) not null,  -- natural key
    emp_name varchar(100) not null, 
    emp_spouse_name varchar(100) not null,
    emp_desig varchar(100) not null,
    emp_mgr_name  varchar(100) not null,
    constraint scd2_emp_dim_uk unique (emp_code, emp_desig, emp_mgr_name) -- to ensure one record per emp code+scd columns
);

create table scd2_emp_dim_stg (

    emp_code varchar(100) primary key,
    emp_name varchar(100) not null,
    emp_spouse_name varchar(100) not null,
    emp_desig varchar(100) not null,
    emp_mgr_name  varchar(100) not null
);

delete from scd2_emp_dim_stg;

insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e1','e1','s1','d1','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e2','e2','s2','d2','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e3','e3','s3','d3','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e4','e4','s4','d4','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e5','e5','s5','d5','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e6','e6','s6','d6','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e7','e7','s7','d7','boss');


merge into scd2_emp_dim t1 using scd2_emp_dim_stg t2 on (t1.emp_code = t2.emp_code AND t1.emp_desig = t2.emp_desig AND t1.emp_mgr_name = t2.emp_mgr_name)

when matched then
    update set t1.emp_name = t2.emp_name, t1.emp_spouse_name = t2.emp_spouse_name
when not matched then
    insert (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) 
    values (t2.emp_code, t2.emp_name, t2.emp_spouse_name, t2.emp_desig, t2.emp_mgr_name);

select * from scd2_emp_dim;


Since the dimension was empty so far, all records got inserted.


delete from scd2_emp_dim_stg;

insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e2','e2','s2','d2','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e3','e3 new','s3 new','d3','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e4','e4','s4','d4 new','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e5','e5','s5','d5','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e6','e6','s6','d4 new','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e7','e7 new','s7 new','d7 new','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e8','e8','s8','d8','boss');

merge into scd2_emp_dim t1 using scd2_emp_dim_stg t2 on (t1.emp_code = t2.emp_code AND t1.emp_desig = t2.emp_desig AND t1.emp_mgr_name = t2.emp_mgr_name)

when matched then
    update set t1.emp_name = t2.emp_name, t1.emp_spouse_name = t2.emp_spouse_name
when not matched then
    insert (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) 
    values (t2.emp_code, t2.emp_name, t2.emp_spouse_name, t2.emp_desig, t2.emp_mgr_name);

select * from scd2_emp_dim;

    
The new records got inserted. Existing employee records got a new entry if any of the SCD columns changed (designation, manager name). Existing employee records which changed in non-SCD columns like name or spouse name, were simply updated.


This is a minimalistic SCD 2 scheme. We can add current indicator column to this scheme fairly easily. We can also add effective start date and end date to this easily. Also, records which were not there in the stage but in the dimension table were left untouched. MERGE Statement can be enhanced to mark those soft-deleted.

No comments:

Post a Comment