SSIS Flexible UnPivot

One of the problems I have encountered is trying to unpivot a file that has a variable number of columns. There are a set of columns that are fixed, that represent the Row keys. And then there are variable numbers of columns. This is especially true when your columns are based on dates. This sample is based on a question that was raised on the MSDN forums (https://social.technet.microsoft.com/Forums/sqlserver/en-US/cdd2cbd6-bed1-482a-be1c-f4cf434ed1ba/how-to-create-a-ssis-package-dynamically?forum=sqlintegrationservices&prof=required.

The UnPivot transformation, like most SSIS transformations is rigid as to the expected meta data. It expects all of the columns to have the same types, that there are exactly the same number of columns and they have the same name from run to run. This problem causes havoc with SSIS.

Product,2015Jan,2015Feb,2015Mar,2016Jul,2016Aug
0000001 - Product 0000001,52.31,48.97,47.94,48.47,49.52

I want to output this as:

Product,Year,Month,Margin
0000001 - Product 0000001,2015,Jan,52.31
0000001 - Product 0000001,2015,Feb,48.97

I might not start with column 2015Jan. I might start all of a sudden with 2016Jan. And I am likely to add columns at the end.

The Script Component provides the flexibility to handle this. First you create a File Connection to put the location for the file you are connecting to. Next, you create a Data Flow and add a Script Component to the Data Flow, setting its type to Source. Then you configure the Script Component. You add the Connection Manager to the Script:

Connection Manager

The SSIS Script component Connection Manager


You add the output columns:
Out Columns

The SSIS Script Component Out Columns

.
Then you can edit the script:

  int rowNum = 0;
  public override void CreateNewOutputRows()
  {
    // Open the file for reading
    StreamReader rdr = File.OpenText(this.Connections.SourceFile.ConnectionString);
    string[] months = new string[1]; // Place holder for the months and years
    int[] years = new int[1];
    while (rdr.EndOfStream == false)
    {
    string line = rdr.ReadLine();
    string[] cols = line.Split(',');
    if (cols.Length > 1)
    {
      rowNum++; // Add one to the row number
      if (rowNum++ > 1) // This is not the first rows
      {
        for (int i = 1; i < cols.Length; i++)
        {
        Output0Buffer.AddRow();
        Output0Buffer.Product = cols[0];
        Output0Buffer.Year = years[i - 1];
        Output0Buffer.Month = months[i - 1];
        Output0Buffer.Margin = decimal.Parse(cols[i]);
        }
      }
      else
      {
        months = new string[cols.Length - 1];
        years = new int[cols.Length - 1];

        for (int i = 1; i < cols.Length; i++)
        {
        months[i - 1] = cols[i].Substring(4);
        years[i - 1] = int.Parse(cols[i].Substring(0, 4));
        }
      }
    }
 }

The script does the following. I need to know the rownum. The first row I treat differently from the rest. I gather from the first row the identifiers that I am going to use. That is the else clause in the rownum test. I simple put the values into a months and years array. For the other rows, I take the value from the first column (cols[0]) and then create a row for each column after the first column.

I have not included error handling. You can download the source at FlexibleUnpivot.

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, Pivot, SSIS, SSIS Script. Bookmark the permalink.

Leave a Reply