Modelling Change In Your Dimensions

Money

You use dimension tables to hold the descriptive attributes that you want to use to analyse your measures.

This is fine but what do you do when this information changes? There are three common ways that you can handle changes in your dimensional model.

Type 1 – Overwrite

In this method when a change happens you simply overwrite the dimension record with the new values. This is nice and simple however, the old values are gone.

In this example a saleswoman has recorded some sales.
SalesPersonDimension

SalesPersonKey SalesPersonID SalesPersonName
1 000001 Mary Jones
2 000002 John Smith

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00

Of course I would never store a date in a fact table, I would use a foreign key to a date dimension. I am trying to keep the example as concise as possible.

Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes.
SalesPersonDimension

SalesPersonKey SalesPersonID SalesPersonName
1 000001 Mary Smith
2 000002 John Smith

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00
1 1-Mar-2008 £11,000.00
2 1-Mar-2008 £12,500.00

As you can see Mary’s name has been simply overwritten with the new value. However if I look back at February’s sales figures, Mary Smith had sales, but she was actually Mary Jones in February.

When using type 1 dimensions you need to be sure that you don’t mind the values being changed for all time!

Type 2 – Versioning

In this method your requirements state that you must accurately report on the information as it existed at the time. In order to do this, when the data changes you must keep the old version and then create a new version of the record with the updated values. In order to get best use from this I would always add three additional columns to my dimension: ValidFrom, ValidTo and CurrentFlag.

SalesPersonDimension

SalesPersonKey SalesPersonID SalesPersonName ValidFrom ValidTo CurrentFlag
1 000001 Mary Jones 1-Jan-2005 31-Dec-9999 Y
2 000002 John Smith 1-Mar-2006 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00

Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes.
SalesPersonDimension

SalesPersonKey SalesPersonID SalesPersonName ValidFrom ValidTo CurrentFlag
1 000001 Mary Jones 1-Mar-2006 1-Mar-2008 N
2 000002 John Smith 1-Mar-2006 31-Dec-9999 Y
3 000001 Mary Smith 1-Mar-2008 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00
3 1-Mar-2008 £11,000.00
2 1-Mar-2008 £12,500.00

There are a few things to note here:

Dimension record #1 Mary’s name has not been changed but it has been end-dated to mark the period when this name was valid.

Dimension record #3 has been created with her new details and again you can see the period that this record is valid for.

Mary’s sales in the fact table, in February point to dimension #1, Mary Jones and in March point to dimension #3, Mary Smith.

Type 1 & Type 2 – The best of both worlds!

Depending on the attributes you are capturing, some of them may be very dynamic and change constantly, likewise there may be attributes that you do want to track changes in, and some that you do not. This can be catered for by using both of the schemes described above. Rather than considering the whole dimension to be type 1 or type 2, the attrubutes within the dimension will be of type 1 or type 2.

In this method your requirements state that you must accurately report on the information as it existed at the time. In order to do this, when the data changes you must keep the old version and then create a new version of the record with the updated values. In order to get best use from this I would

SalesPersonDimension

Sales
Person
Key
Sales
Person
ID
Sales
Person
Name
Mobile
Tel
No
Valid
From
Valid
To
Current
Flag
1 000001 Mary Jones 07771 123 4567 1-Jan-2005 31-Dec-9999 Y
2 000002 John Smith 07772 321 7654 1-Mar-2006 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00

Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes. At the same time John gets a new mobile phone and his number changes.
SalesPersonDimension

Sales
Person
Key
Sales
Person
ID
Sales
Person
Name
Mobile
Tel
No
Valid
From
Valid
To
Current
Flag
1 000001 Mary Jones 07771 123 4567 1-Mar-2006 1-Mar-2008 N
2 000002 John Smith 07775 987 2323 1-Mar-2006 31-Dec-9999 Y
3 000001 Mary Smith 07771 123 4567 1-Mar-2008 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00
3 1-Mar-2008 £11,000.00
2 1-Mar-2008 £12,500.00

There are a few things to note here:

Mary’s changes have been recorded and tracked as it is important that we can see when the values changed.

The update to John’s mobile telephone number has simply been done “in place”, we don’t care what his number was last month, we just need to know what it’s current value is.

This is the most common implementation; with a dimension table holding a mixture of type 1 and type 2 attributes.

Type 3 – “Soft” or “Limited” Versioning

With type 2 changes there is a definite milestone event that causes the data to change. The data can change multiple times, maybe an unlimited number of times.

Another typical change is speculative reorganisation. “What If: I reorganised my sales team, how would that affect my numbers?”. You may not wish to “commit” these changes but you have to save them to be able to report on them.
SalesPersonDimension

Sales
Person
Key
Sales
Person
ID
Sales
Person
Name
Mobile
Tel
No
Sales
Region
New
Region
Valid
From
Valid
To
Current
Flag
1 000001 Mary Jones 07771 123 4567 South 1-Jan-2005 31-Dec-9999 Y
2 000002 John Smith 07772 321 7654 North 1-Mar-2006 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00

Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes. At the same time John gets a new mobile phone and his number changes. The sales director also wants to try reallocating his staff to different regions to see what impact this will have on the sales figures.
SalesPersonDimension

Sales
Person
Key
Sales
Person
ID
Sales
Person
Name
Mobile
Tel
No
Sales
Region
New
Region
Valid
From
Valid
To
Current
Flag
1 000001 Mary Jones 07771 123 4567 South 1-Mar-2006 1-Mar-2008 N
2 000002 John Smith 07775 987 2323 North South 1-Mar-2006 31-Dec-9999 Y
3 000001 Mary Smith 07771 123 4567 South North 1-Mar-2008 31-Dec-9999 Y

SalesFactTable

SalesPersonKey SalesDate SalesAmount
1 1-Feb-2008 £10,000.00
2 1-Feb-2008 £9,500.00
3 1-Mar-2008 £11,000.00
2 1-Mar-2008 £12,500.00

There are a few things to note here:

Two new columns have been added, SalesRegion and NewSalesRegion. SalesRegion will hold the “current” value and NewRegion will hold the proposed new value. Both of these columns are type 1 as we are not tracking changes to them. We could of course have created OldRegion, CurrentRegion and NewRegion. I’m sure you can see the problem, the more history we want to track the more columns we have to add and the more maintenance that has to be done to them: Old=North, Current=South becomes: Old=North, Current=South, New=West becomes Old=South, Current=West, New=East and so on.

The onus is on the report writer to choose which value they want to report on.

Of the three; type 3 is the least common but there are certain requirements where it is definitely the “correct” choice.

3 thoughts on “Modelling Change In Your Dimensions”

  1. There is another method to handle the changes to Dimension values. The lineage of the dimension values would be tracked using the attributes “Valid From? and “Valid To?, which would suffice to indicate the validity of the value at a point of time.

    The reports would use these two date attributes to select the appropriate value, albeit without a flag. A “31-Dec-999? in the “Valid To? column would indicate the record to be ‘Current’.

    Concisely, this would be Versioning without the flag.

  2. You are quite correct in what you say.
    What you describe is Type-2 Slowly Changing Dimensions.
    You are also correct that you do not need the CurrentFlag attribute that it can be completely replaced with the predicate ValidTo=to_date(’31-Dec-9999′, ‘dd-Mon-YYYY’).

    The reason that I tend to add this simple flag, one byte per record, is for simple ease of use. If all users will be using Business Objects then I would replace the physical attribute with a logical one that used the logic you described. If however users will be using other query tools such as SAS or even SQL then they have to “know” which date to use.

    I would never question the competency of the people that will be querying these systems but the easier that we can make it the better!

    Ross

Leave a Reply

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)