SSIS Pivot Transformation with Multiple Set Keys

When you pivot data, there are three dimensions that you need to set: the Set Keys, the Pivot Keys and the Pivot Values (to use the SSIS terms). In Excel it is very easy to select multiple Set Keys (ROWS), multiple Pivot Keys (COLUMNS) and even multiple Pivot Values (Values). However, using the SSIS Pivot Transformation editor one can only choose one Set Key, one Pivot Key and one Pivot Value. However, you can have multiple Set Key columns by using the Advanced Editor.

The Pivot key in the SSIS Pivot Transformation is the column that contains the values that are going to define the new columns. The Pivot Value is the value that is going to be inserted into these new columns. Basically, you map a value from the Pivot Key column to a new output column. And SSIS will put the value from the Pivot Value column into those output columns. The Set Key identifies the column combination that defines the rows.

Why do we need multiple Set keys? In many situations, what defines a unique row is not a single column. For example, one might be loading data from a source keyed to the state and county. For example the following data set.

SUMLEV	STNAME	CTYNAME	Year	Population
50	Alabama	Washington County	2010	17610
50	Arkansas	Washington County	2010	204026
50	Colorado	Washington County	2010	4801
50	Alabama	Washington County	2011	17336
50	Arkansas	Washington County	2011	207882
50	Colorado	Washington County	2011	4809

If I were to use the county name as the Set Key, then I would have duplicates that could cause the Pivot transformation to fail. For this small set, you need the STNAME and the CTYNAME to distinguish the rows.

The District of Columbia introduces a different twist. District of Columbia is the name of the county and the name of the state. The SUMLEV distinguishes this usage for each row (50 means County level, 40 means State level).

SUMLEV	STNAME	CTYNAME	Year	Population
50	District of Columbia	District of Columbia	2010	605126
40	District of Columbia	District of Columbia	2011	620472
50	District of Columbia	District of Columbia	2011	620472
40	District of Columbia	District of Columbia	2010	605126

Never fear, it is possible to have multiple columns in your Set key. You just have to use the Advanced editor. Let’s start.

First create the package (I have a finished example at the end). Add the flat file source to get the data. Then add a Sort transformation to sort by SUMLEVEL, STNAME, and CTYNAME. Finally, add the Pivot Transformation and configure it like the following:

You will need to identify the Pivot Key values that you need to pivot on and paste them into the Generate Pivot output columns from values textbox. Then click the “Generate Columns Now” button. This will generate the Column names in the Existing Pivoted output columns textbox (you can rename these columns in the advanced editor).

The package will run at this point. However, it doesn’t show whether the row is about the STATE or about a COUNTY in the STATE (SUMLEV). Also, it doesn’t identify the STATE that each COUNTY is associated with. I want to see the SUMLEV and STNAME for each row and I want to guarantee that SSIS groups by SUMLEV, STNAME and CTYNAME, not just CTYNAME.

To do that you right click the Pivot transformation and select Advanced Editor:

Then go to the Input Columns tab.

Select all of the columns that you want to output, including those you don’t want to include in the Set Key. In this example, I have only selected the Set Key columns.
Next, go to the Input and Output Properties tab. You will need to do a couple things here.

Note the LineageID property. You will need that property value when you create a Pivot Default Output column corresponding to this column. Next, change PivotUsage from 0 to 1 (leave this value 0 if you want to simply pass the column through, without including it in the Set key).

Next, go to the Pivot Default Output and add a new Output Column. Add a column and give a name (could be the same name as the Input column). In the SourceColumn, set the value the same as the LineageID property from the corresponding Input column. This will set the DataType property for this output.

If you want to simply pass through the columns, leave the PivotUsage equal to 0 in the Input Columns. Then create the corresponding column in the Output Columns and set its SourceID equal to the LineageID of the Input Column.

SSIS allows you to do a lot more than shows on the surface. The Advanced editor provides some interesting modifications.

An example of using the Pivot transformation can be downloaded from here: Pivot transformation Example.

SSIS Pivot Transformation with Multiple Set Keys

Like this:

Related

Leave a ReplyCancel reply

Share this:

Like this:

Related

Leave a ReplyCancel reply