SSIS Pivot Transformation with Multiple Set Keys

When you pivot data, there are three dimensions that you need to set: the Set Keys, the Pivot Keys and the Pivot Values (to use the SSIS terms). In Excel it is very easy to select multiple Set Keys (ROWS), multiple Pivot Keys (COLUMNS) and even multiple Pivot Values (Values). However, using the SSIS Pivot Transformation editor one can only choose one Set Key, one Pivot Key and one Pivot Value. However, you can have multiple Set Key columns by using the Advanced Editor.

The Pivot key in the SSIS Pivot Transformation is the column that contains the values that are going to define the new columns. The Pivot Value is the value that is going to be inserted into these new columns. Basically, you map a value from the Pivot Key column to a new output column. And SSIS will put the value from the Pivot Value column into those output columns. The Set Key identifies the column combination that defines the rows.

Why do we need multiple Set keys? In many situations, what defines a unique row is not a single column. For example, one might be loading data from a source keyed to the state and county. For example the following data set.

SUMLEV STNAME CTYNAME Year Population
50 Alabama Washington County 2010 17610
50 Arkansas Washington County 2010 204026
50 Colorado Washington County 2010 4801
50 Alabama Washington County 2011 17336
50 Arkansas Washington County 2011 207882
50 Colorado Washington County 2011 4809

If I were to use the county name as the Set Key, then I would have duplicates that could cause the Pivot transformation to fail. For this small set, you need the STNAME and the CTYNAME to distinguish the rows.

The District of Columbia introduces a different twist. District of Columbia is the name of the county and the name of the state. The SUMLEV distinguishes this usage for each row (50 means County level, 40 means State level).

SUMLEV STNAME CTYNAME Year Population
50 District of Columbia District of Columbia 2010 605126
40 District of Columbia District of Columbia 2011 620472
50 District of Columbia District of Columbia 2011 620472
40 District of Columbia District of Columbia 2010 605126

Never fear, it is possible to have multiple columns in your Set key. You just have to use the Advanced editor. Let’s start.

First create the package (I have a finished example at the end). Add the flat file source to get the data. Then add a Sort transformation to sort by SUMLEVEL, STNAME, and CTYNAME. Finally, add the Pivot Transformation and configure it like the following:
04 Pivot With Primary After Generate

You will need to identify the Pivot Key values that you need to pivot on and paste them into the Generate Pivot output columns from values textbox. Then click the “Generate Columns Now” button. This will generate the Column names in the Existing Pivoted output columns textbox (you can rename these columns in the advanced editor).

The package will run at this point. However, it doesn’t show whether the row is about the STATE or about a COUNTY in the STATE (SUMLEV). Also, it doesn’t identify the STATE that each COUNTY is associated with. I want to see the SUMLEV and STNAME for each row and I want to guarantee that SSIS groups by SUMLEV, STNAME and CTYNAME, not just CTYNAME.

To do that you right click the Pivot transformation and select Advanced Editor:

06 Advanced Editor
Then go to the Input Columns tab.
07 Advanced Add Additional Columns

Select all of the columns that you want to output, including those you don’t want to include in the Set Key. In this example, I have only selected the Set Key columns.
Next, go to the Input and Output Properties tab. You will need to do a couple things here.

09 Set PivotUsage

Note the LineageID property. You will need that property value when you create a Pivot Default Output column corresponding to this column. Next, change PivotUsage from 0 to 1 (leave this value 0 if you want to simply pass the column through, without including it in the Set key).

Next, go to the Pivot Default Output and add a new Output Column. Add a column and give a name (could be the same name as the Input column). In the SourceColumn, set the value the same as the LineageID property from the corresponding Input column. This will set the DataType property for this output.

14 Add SUMLevel

If you want to simply pass through the columns, leave the PivotUsage equal to 0 in the Input Columns. Then create the corresponding column in the Output Columns and set its SourceID equal to the LineageID of the Input Column.

SSIS allows you to do a lot more than shows on the surface. The Advanced editor provides some interesting modifications.

An example of using the Pivot transformation can be downloaded from here: Pivot transformation Example.

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, Pivot, SSIS. Bookmark the permalink.

Leave a Reply