SSIS Script Component – Last Row

One problem that you might encounter with data is where you are gathering information from multiple lines and want to output only one line. If you are going to simply aggregate the results, then you can use the Aggregate transformation. But what do you do if you want to create simply append one row’s value to another?
For example, what if I want to take the following and output one row per state:

State City Population
Texas Houston 2239558
Texas San Antonio 1436697
Texas Dallas 1281057
Oklahoma Oklahoma City 610613
Oklahoma Tulsa 398121
Oklahoma Norman 118197

Let’s say that my output is simply:

State Cities
Texas Houston=2239558;San Antonio=1436697;Dallas=1281057
Oklahoma Oklahoma City=610613;Tulsa=398121;Norman=118197

The way that I handle this is to have one variable to store the key (in this case the state) that I am grouping by and another that I use to gather the text. In SSIS I usually override the ProcessInputRow method for the input buffer.

public override void Input0_ProcessInputRow(Input0Buffer Row)

Basically the way that this works is that I have a variable that identifies the key for the output. In this case, I need to store a string: state. Then I have a second variable to store the text.
The program is structured as follows:
If Row.key != key, then output the data that was in the variables. Set the variables equal to the values from the Row.
If Row.key == key then append the text to the text variable.
This is the code:

string state = string.Empty;
bool firstRow = true;
string cities = string.Empty;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (Row.State == state)
{
cities += string.Format(";{0}:{1}", Row.City, Row.Population);
}
else
{
if (!firstRow)
{
CitiesBuffer.AddRow();
CitiesBuffer.State = state;
CitiesBuffer.Cities = cities;
}
state = Row.State;
cities = string.Format("{0}:{1}", Row.City, Row.Population);
}
firstRow = false;
}

But here’s the problem. When I run this, it outputs the data for the first state up until the last state. But it leaves off the last state. The problem is that I need to output once I have finished loading the last row. But the Row doesn’t tell you when you have hit the last row.
I tried using the PostExecute method, but it appears that the buffers are all closed down when the PostExecute fires.
I found that I needed to use the ProcessInput method associated with my input. Using the ProcessInput I actually use the code to walk through all of the rows in the input. The Buffer.NextRow() will get the very next row and if there are no other rows, it will return false.


public override void Input0_ProcessInput(Input0Buffer Buffer)
{
// base.Input0_ProcessInput(Buffer);
string state = string.Empty;
bool firstRow = true;
string cities = string.Empty;

while (Buffer.NextRow())
{
if (Buffer.State == state)
{
cities += string.Format(";{0}:{1}", Buffer.City, Buffer.Population);
}
else
{
if (!firstRow)
{
CitiesBuffer.AddRow();
CitiesBuffer.State = state;
CitiesBuffer.Cities = cities;
}
state = Buffer.State;
cities = string.Format("{0}:{1}", Buffer.City, Buffer.Population);
}
firstRow = false;
}
if (!firstRow)
{
CitiesBuffer.AddRow();
CitiesBuffer.State = state;
CitiesBuffer.Cities = cities;
}
}

As soon as the routine runs out of rows from the input, I can then output what is in the buffer that I had been building.

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, SSIS Script. Bookmark the permalink.

Leave a Reply