Monday, April 28, 2014

Generated Key in the Table Comparison Transform

When I first learnt how to use Data Services, I knew that I was supposed to set the Generated Key Column in the table comparison transform, but I never really thought about I why I was supposed to do that.

So lets take a look at the example above. In this example I have a dimension for Customer with a data warehouse key of CUSTOMER_KEY. This is my generated key column. I also have a field for the natural key that comes from the source system and this is CUSTOMER_NK.

I have been maintaining history on this table, so individual CUSTOMER_NK values can appear in there multiple times. Once for each row of history maintained. So if a customer has lived in more than one city, there will be a row in the customer table for each city the customer has lived in.

So data in the table might look something like this:


If a new row for John Smith arrives, lets say he moves to San Antonio, how will Data Services know which of the 3 rows with a CUSTOMER_NK of 'A' to compare against? This is where the Generated Key Column comes in. Data Services looks for the row with the maximum generated key column value and just compares against that row. So for John Smith, DS will only do the compare against the row with CUSTOMER_KEY = 3.

If you didn't set the generated key column value, then DS would compare against all 3 rows for John Smith instead of just the one with the maximum CUSTOMER_KEY.

The history preservation transform grabs the 1 update row from the table comparison transform and generates an update to close off the old row, and issues an insert for the new row for John's new address in San Antonio.

If you are not using history preservation then the Generated Key Column is not that important, but if you are doing type 2 history, then its essential that you set it for the history preservation to work correctly.



No comments:

Post a Comment