SSIS Flexible UnPivot

One of the problems I have encountered is trying to unpivot a file that has a variable number of columns. There is a fixed set of columns that represent the row keys, and then a variable number of additional columns. This is especially common when the columns are based on dates. This sample is based on a question raised on the MSDN forums (https://social.technet.microsoft.com/Forums/sqlserver/en-US/cdd2cbd6-bed1-482a-be1c-f4cf434ed1ba/how-to-create-a-ssis-package-dynamically?forum=sqlintegrationservices&prof=required).

The UnPivot transformation, like most SSIS transformations, is rigid about the expected metadata. It expects all of the columns to have the same types, to be exactly the same in number, and to have the same names from run to run. A file like the following causes havoc with SSIS:

Product,2015Jan,2015Feb,2015Mar,2016Jul,2016Aug
0000001 - Product 0000001,52.31,48.97,47.94,48.47,49.52

I want to output this as:

Product,Year,Month,Margin
0000001 - Product 0000001,2015,Jan,52.31
0000001 - Product 0000001,2015,Feb,48.97

The file might not start with the column 2015Jan; it might suddenly start with 2016Jan. And columns are likely to be added at the end.

The Script Component provides the flexibility to handle this. First, you create a File Connection Manager that points to the file you are reading. Next, you create a Data Flow and add a Script Component to it, setting its type to Source. Then you configure the Script Component. You add the Connection Manager to the script:

The SSIS Script Component Connection Manager

You add the output columns:
The SSIS Script Component Out Columns

Then you can edit the script:

  // Requires: using System.IO; (add it to the usings at the top of the script)
  int rowNum = 0;

  public override void CreateNewOutputRows()
  {
    // Open the file for reading
    StreamReader rdr = File.OpenText(this.Connections.SourceFile.ConnectionString);
    string[] months = new string[1]; // Placeholders for the months and years from the header row
    int[] years = new int[1];

    while (rdr.EndOfStream == false)
    {
      string line = rdr.ReadLine();
      string[] cols = line.Split(',');
      if (cols.Length > 1)
      {
        rowNum++; // Add one to the row number
        if (rowNum > 1) // This is not the header row
        {
          // Create one output row per value column
          for (int i = 1; i < cols.Length; i++)
          {
            Output0Buffer.AddRow();
            Output0Buffer.Product = cols[0];
            Output0Buffer.Year = years[i - 1];
            Output0Buffer.Month = months[i - 1];
            Output0Buffer.Margin = decimal.Parse(cols[i]);
          }
        }
        else
        {
          // Header row: capture the year and month from each column name (e.g. 2015Jan)
          months = new string[cols.Length - 1];
          years = new int[cols.Length - 1];

          for (int i = 1; i < cols.Length; i++)
          {
            months[i - 1] = cols[i].Substring(4);
            years[i - 1] = int.Parse(cols[i].Substring(0, 4));
          }
        }
      }
    }
    rdr.Close();
  }

The script does the following. I track the row number because the first row is treated differently from the rest. From the first row (the header) I gather the identifiers that I am going to use; that is the else clause in the rowNum test. I simply put the values into the months and years arrays. For the other rows, I take the value from the first column (cols[0]) and then create an output row for each column after the first.

I have not included error handling. You can download the source at FlexibleUnpivot.

Posted in ETL, Pivot, SSIS, SSIS Script | Leave a comment

SSIS Removing Duplicate Rows Using Conditions

Sometimes your data source has duplicates. SSIS has an out-of-the-box tool to remove them: the Sort transformation has a "Remove rows with duplicate sort values" property (https://www.mssqltips.com/sqlservertip/3036/removing-duplicates-rows-with-ssis-sort-transformation/). But it has a problem: it is essentially random about which of the duplicate rows it removes. How do you specify which row to remove?

One scenario you might encounter is bringing in data from multiple sources: if a row exists in the first source, use it; otherwise use the row from the second source. I have built a simple SSIS package that you can download from Sort Remove Duplicates Package. It has the following data flow:

SSIS Delete Duplicate DataFlow

The data sources are both Script components.

 public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
        int i = 0;
        int s = 1; // Primary Data Set

        // Seven rows: IDs 1 through 7, all in set 1, all with the same value
        for (int row = 0; row < 7; row++)
        {
            Output0Buffer.AddRow();
            Output0Buffer.Set = s;
            Output0Buffer.ID = ++i;
            Output0Buffer.Val = 100;
        }

    }

The script for the Secondary Data Set is roughly the same. I set the s variable to 2 for the Secondary Data Set so that its rows always sort after the corresponding Primary Data Set row when one exists. I start the variable i at 4 so that the first rows (IDs 1 – 4) only have set 1 values, the following 3 rows have both set 1 and set 2 rows, and the last rows are exclusively set 2. I also assign a different value to the set 2 output.
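
For reference, here is a minimal sketch of what the Secondary Data Set script might look like. The Val of 200 is simply my stand-in for "a different value" and may not match the downloadable package:

    public override void CreateNewOutputRows()
    {
        int i = 4; // Start at 4: IDs 5-7 overlap set 1, IDs 8-11 exist only in set 2
        int s = 2; // Secondary Data Set

        for (int row = 0; row < 7; row++)
        {
            Output0Buffer.AddRow();
            Output0Buffer.Set = s;
            Output0Buffer.ID = ++i;
            Output0Buffer.Val = 200; // Assumed value; the point is only that it differs from set 1
        }
    }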

The next transformation is a Sort. The sort key is ID, then Set. I want one row for each ID, and the Set column determines which row comes first for that ID.

Finally, I have a Script component that outputs only the first row encountered for each ID. A few elements need configuring. The first thing I do is select the ID and Set columns as Input columns, so that I can refer to them in the script.

I also have to provide some way to exclude and include rows. To do that I use Exclusion Groups.

SSIS Script Component Exclusion Group

You set the ExclusionGroup to some number other than 0, which allows the script to direct a particular row to that exclusion group. Also note the SynchronousInputID; it should have a value that points to the component's only input.

The script is rather simple. You override the Input0_ProcessInputRow method:

    int lastID = -1;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.ID != lastID)
        {
            Row.DirectRowToOutput0();
            lastID = Row.ID;
        }
    }

Note the class-level variable lastID. Every row executes this code. If the ID column value (Row.ID) is not the same as lastID, the row is sent on with DirectRowToOutput0() and lastID is set to Row.ID for the next pass. The effect is that the first row for each new ID is passed through; every other row for that ID simply disappears.

Please let me know if there is a simpler way to do this. Also, let me know how I can improve this script.

Posted in ETL, SSIS, SSIS Script | Leave a comment

Reordering Columns Using PowerShell

One of the regular issues discussed on the SSIS forum is what to do when a source file changes the order of its columns. You build your SSIS package to handle Col1, Col2, Col3 and then receive files ordered Col2, Col3, Col1. Or worse, Col2, Col1. How do you handle this case? It usually involves some nasty .Net code.

But why not process the file using PowerShell? In previous posts (listed at the end), I suggest that you can use PowerShell to preprocess a comma delimited file: sorting the output, removing columns and filtering the rows. The same technique can reorder the columns, and even add a column.

Here is my base file.

ID,Name,State,Country
1,Russ,Sandys,Bermuda
2,Katie,Texas,USA
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA

I expect to find columns in the following order: ID, Name, State, Country.

But sometimes the process that produces the file provides you with the following format:

Country,ID,Name,State
Bermuda,1,Russ,Sandys
USA,2,Katie,Texas
Bermuda,3,Gail,Sandys
USA,4,Melanie,Oregon
USA,5,Stephanie,Arizona

Or worse:

ID,Name,State
1,Russ,Sandys
2,Katie,Texas
3,Gail,Sandys
4,Melanie,Oregon
5,Stephanie,Arizona

Either of these will break the import.

Using PowerShell, I can very easily reformat the data to the expected format:

cd C:\MyFolder\
Import-Csv -Path SampleData.csv `
  | SELECT  ID, Name, State, Country `
  | ConvertTo-CSV -NoTypeInformation `
  | % {$_ -replace  `
  '\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
  ,'${start}${output}'} `
  | Out-File SampleDataSorted.csv -fo -en ascii ; 

In Removing quotes from CSV created by PowerShell I explain why I use the regular expression replace.

The SELECT line of the PowerShell script is what determines the order of the columns. Interestingly, if a column does not exist in the input, it simply adds the column with an empty value.

This very simple script will always put the columns in the expected order, producing an empty column when the input does not have a value for it. By the way, while this may prevent the ETL process from failing, it can leave you with invalid data.
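
If you want to see the column-adding behavior in isolation, a quick check along these lines works (SampleDataThreeColumns.csv is a hypothetical copy of the file containing only ID, Name and State):

# Country is not in the file; SELECT still emits the column, with an empty value
Import-Csv -Path SampleDataThreeColumns.csv `
  | SELECT ID, Name, State, Country `
  | ConvertTo-CSV -NoTypeInformation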

Let me know if you have questions about this process. Can you see any use for this in your environment?

Previous posts:

Posted in ETL, PowerShell, SSIS | Leave a comment

Removing quotes from CSV created by PowerShell

In a previous post, I demonstrated how to reshape a comma delimited file (CSV) using PowerShell. I noted that the result puts all of the column values in double quotes:

"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

I used this code to produce this result:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv `
  -NoTypeInformation; 

In this post, I remove the double quotes.

A Microsoft blog suggests one way to remove these double quotes: use the ConvertTo-CSV cmdlet rather than Export-CSV and do a string replace of the double quotes with an empty string (code from the article).

dir c:\fso -Filter *.csv | ? {$_.basename -like 'users?'} `
 | Import-Csv `
 |  sort lname,fname `
 | convertto-csv -NoTypeInformation `
 | % { $_ -replace '"', ""} `
 | out-file c:\fso\usersconsolidated.csv -fo -en ascii

I convert my command to:

cd c:\MyFolder
Import-Csv -Path SampleData.csv `
  | Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  | Sort-Object State `
  | ConvertTo-CSV -NoTypeInformation `
  | % {$_ -replace '"',""} `
  | Out-File SampleDataSorted.csv -fo -en ascii
  ; 
Import-Csv -Path SampleData.csv                Read the source csv
Where {$_.Country -eq "USA"}                   Filter rows
SELECT ID, Name, State                         Set output columns
ConvertTo-CSV -NoTypeInformation               Create CSV output
% {$_ -replace '"',""}                         For each row, replace double quotes with empty string
Out-File SampleDataSorted.csv -fo -en ascii    Output to file

This produces the results I expect.

ID,Name,State
5,Stephanie,Arizona
4,Melanie,Oregon
2,Katie,Texas

But here is the problem: ETL scenarios can be messy, and CSV files frequently need the double quotes. When a column value contains a comma, you need to put quotes around that value. For example, a file like this is problematic.

ID,Name,State,Country
1,Russ,Sandys,Bermuda
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA
6,"Katie, ""Scott"", Arianna",Texas,USA

I want to keep the quotes around the second column in that last row.

Regular expressions to the rescue. The -replace operator can do regular expression replacement without any additional work. That isn't to say that building the regular expression isn't painful.

cd c:\MyFolder
Import-Csv -Path SampleData.csv `
  | Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  | Sort-Object State `
  | ConvertTo-CSV -NoTypeInformation `
  | % {$_ -replace  `
  '\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
  ,'${start}${output}'} `
  | Out-File SampleDataSorted.csv -fo -en ascii ; 

Let's break down the expression. The first expression (which I will explain in a moment) is the pattern to be matched. The second ('${start}${output}') is what each match is replaced with. I use named captures (start, output) to identify what I want the match replaced with. One way to view named captures is as variables: if the pattern is matched, the matching text is assigned to that variable. Note that the single quotes around the replacement expression are important (http://www.johndcook.com/blog/powershell_perl_regex/#capture). So every match that is found will be replaced with the capture called "start" followed by the capture called "output."
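
As a throwaway illustration of named captures (not the full pattern, just the idea), this strips the quotes from one simple quoted field:

# "start" captures the leading comma, "output" captures the text inside the quotes;
# the replacement keeps both and drops the surrounding quotes, returning ,Texas
',"Texas"' -replace '(?<start>^|,)"(?<output>[^,"]*?)"', '${start}${output}'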

The first expression is interpreted as follows.

\G
Must occur where the previous match ended. That means the expression won't match if there isn't an uninterrupted sequence of matches, one right after the other.

(?<start>^|,)
This defines the "start" capture, which is either the beginning of the string (^) or a comma.

((…)|(…))
Try to match the first expression and, if it fails, try the second. The pipe | between the two parenthesized expressions means try the first pattern and, as soon as it fails, try the second pattern.

("(?<output>[^,"]*?)"(?=,|$))
  ("(?<output>[^,"]*?)"(?=,|$))
Starts and ends with double quote
  ("(?<output>[^,"]*?)"(?=,|$))
After the second double quote, a comma must be present or it must be the end of the string. The construction ?= is not part of the match. That is important since I need to use that comma in the next match.
  ("(?<output>[^,"]*?)"(?=,|$))
This is what will be put in the “output” capture. The way I understand this is that there is a variable called “output.” If this pattern matches, then the “output” variable will be assigned the text that is found (represented by the pattern in the rest of the parentheses). The quotes will not be included in my replace statement above if this pattern is matched because they are outside of the “output” parentheses.
  ("(?<output>[^,"]*?)"(?=,|$))
This is a sequence of zero or more characters that are not comma or double quote. The * means to get zero or more. The ? after the * prevents the match from going to the end of the string. This is what is assigned to the “output” capture (variable) if the pattern is matched.

(?<output>".*?(?<!")("")*?"(?=,|$))
This is the pattern that is matched if the first fails. Note that the whole pattern, including the surrounding double quotes, is in the “output” capture and thus will be output by the replace statement.
  (?<output>".*?(?<!")("")*?"(?=,|$))
Match every character. It will stop if it finds a double quote followed by the comma or end of string.
  (?<output>".*?(?<!")("")*?"(?=,|$))
This handles the case where there are two double quotes before a comma. I want to leave them alone. If there are an even number of double quotes before the comma then continue looking for the end of the column.
  (?<output>".*?(?<!")("")*?"(?=,|$))
Double quotes cannot precede the next pattern. What I am trying to avoid is interpreting exactly two (or any even number of) double quotes preceding a comma being interpreted as the end of the match. If there are two quotes, then treat the comma as part of the current column. If there is one quote, then treat the comma as the break between two columns.
  (?<output>".*?(?<!")("")*?"(?=,|$))
This represents a sequence of two double quotes. Basically, if there are an even number of double quotes before a comma then treat that comma as part of the current column and continue looking for the next sequence of double quotes followed by comma to end the column.

Let me know if I can improve my regular expression. And definitely post any questions.

Posted in ETL, PowerShell | Leave a comment

Shaping a comma delimited file with PowerShell

Here is the use case I am thinking of. You have a comma delimited file that you need to load into SQL Server. It is a very large file that you need to join to an equally large table in a database. For example, at one place I worked we had a set of account balances that we needed to join to a customer table with 15 million rows. We used a Lookup, but it took about 20 minutes to load the customer table into the cache. My thought at the time was whether we could instead use two data sources, both sorted on the customer id, and join them. Could that have sped up the process?

There would be a problem if you are using a comma delimited file: the sort would have to happen inside the SSIS process. But what if the comma delimited file had already been sorted before it reached the data source? You could simply set the IsSorted flag on the source output and use a Merge Join.

But to avoid putting that load on the SSIS server, why not offload the sort to a less busy server? Sort the file, then transfer it to the SSIS server (or read it from a shared drive). And if that server is only going to sort a file, why install SSIS on it? Why not use a lighter tool?

Import-Csv -LiteralPath SampleData.csv|Sort-Object Country|Export-Csv SampleDataSorted.csv
Posted in ETL, PowerShell | Leave a comment

Using PowerShell to shape comma delimited file

PowerShell has the power to handle many ETL tasks (see PowerShell as an ETL tool). In this post I illustrate how you can use PowerShell to shape a comma delimited file: that is, how to remove columns, filter rows and sort the file output.

Let’s start with a simple file (SampleData.csv):

ID,Name,State,Country
1,Russ,Sandys,Bermuda
2,Katie,Texas,USA
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA

I want to reduce my set to only people in the USA. I want to remove the Country column and I want to sort by State. This is my intended output (SampleDataSorted.csv).

ID,Name,State
5,Stephanie,Arizona
4,Melanie,Oregon
2,Katie,Texas

This would be very easy to do using SSIS. I would need a Flat File Source to read my csv file. I would need a Conditional Split transformation to remove the rows where the Country is not USA. And I would require a Sort transformation. Finally, I would use a Flat File Destination to output only the three columns.

I can do this with PowerShell as well (for the reasons why, see PowerShell as an ETL tool).

So here is my ETL package:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv ; 
SSIS Equivalent          Task                 PowerShell
Flat File Source         Read source          Import-Csv -Path SampleData.csv
Conditional Split        Filter               Where {$_.Country -eq "USA"}
Flat File Destination    Reduce the columns   SELECT ID, Name, State
Sort Transformation      Sort                 Sort-Object State
Flat File Destination    Output CSV           Export-Csv SampleDataSorted.csv

But it has a problem. This is the output.

#TYPE System.Management.Automation.PSCustomObject
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

It would be a major pain to handle the #TYPE header. How do I get rid of it? And how do I remove the double quotes? I type this to learn more about the Export-CSV command:

help Export-CSV

That gives a little information, but the response suggests that I try this:

Get-help Export-Csv -Online

That brings me to a web page with all kinds of information. I learn that the command has a -NoTypeInformation switch that removes the #TYPE line.

Unfortunately, you can't remove the quotes from the Export-CSV output. This is a minor problem; I will examine solutions for it in future posts.

So the final command and its output are:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv `
  -NoTypeInformation; 
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"
Posted in ETL, PowerShell | Leave a comment

PowerShell as an ETL tool

"If your only tool is a hammer, then every problem looks like a nail." My hammer has been SSIS for most of the last 8 years. I have reluctantly used languages like C# and VB.Net, and even VBA and (don't choke) Visual Basic 6.0. But I think I have found a screwdriver for the problems that really aren't nails. That screwdriver is PowerShell. I am going to look at some interesting uses for PowerShell in the next few posts.

One of the challenges of using SSIS is licensing. You have to pay megabucks to install SQL Server on a machine if you are going to use SSIS. But what if the tasks you are working on don't require the power of SSIS? What if all a particular server needs to do is download a file from an FTP server? Or shape a file for loading: removing columns and rows, or sorting it? If you can do this on a server separate from your normal SSIS server, you can reserve the SSIS server for the heavy lifting of data validation and data loading.
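
As a taste of how little code the FTP scenario needs, here is a bare-bones sketch; the server name, credentials and paths are invented placeholders, and a real job would add error handling:

# Download a file from an FTP server using the .NET WebClient class
$client = New-Object System.Net.WebClient
$client.Credentials = New-Object System.Net.NetworkCredential("ftpUser", "ftpPassword")
$client.DownloadFile("ftp://ftp.example.com/exports/SampleData.csv", "C:\MyFolder\SampleData.csv")
$client.Dispose()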

PowerShell can handle quite a few simple ETL tasks. It may not be the fastest tool, but if the task is not in the critical path, perhaps it can be done offline on a second server.

Have you had any experience using PowerShell for your ETL tasks? Please share your stories.

Posted in ETL, PowerShell | Leave a comment

What to include in source view

In a previous post (ETL Source: full query or view) I suggest that you create a view (or function) to use as the source for your data load. In this post, I ask what you should include in that source view. I argue that in this case more really is more.

There is a lot to be said for including in your view just the columns and rows needed for the load. For example, when I am loading my fact table, I will need to find the data mart dimension keys. One of the design principles the Kimball group advocates is using surrogate keys for your dimension tables and relating your fact tables to the dimension tables using those surrogate keys (http://www.kimballgroup.com/1998/05/surrogate-keys/). The source table will have business keys. I just need to join the business keys to the dimension tables to get the dimension surrogate keys.

Here’s the rub. I really don’t need the business key, right? So why show that column? I really don’t intend to insert rows that don’t have a corresponding dimension row, so why not use an inner join to exclude rows without matching dimension rows? I may also be filtering out rows from the staging table. Why not filter those rows out in the view? And if I don’t load those filter columns into the fact table, I certainly want to exclude those columns from the view output.

But then certain rows are not inserted into the fact table. What happened? If I have a lean view, I have to deconstruct the view to try to identify the points where those rows were excluded. It could be an element of the filter that inexplicably filters out the row. Or it could be that the business key is not in the dimension table.

If I write the view using a left outer join and include the business key in the view output, I can find the rows where the surrogate key is missing. If I don't put the filter in the view, I can run the query without the filter. Maybe there is an unexpected business key value, or maybe the filter excludes the row. Either way, I can find the row that I expect to find and determine what causes it to be excluded.

When I query the view for the actual load, I apply the appropriate filters to get the results I want: a not-null filter on the surrogate key to exclude rows without matching dimension rows, plus the business filters on the output. And I simply don't select the columns that are not necessary.
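
A stripped-down sketch of the idea, with invented table and column names:

-- The "wide" debugging-friendly view: left outer join, business key kept, no filters
create view dbo.SalesFactSource
as
select  d.CustomerKey,            -- dimension surrogate key (null when there is no match)
        s.CustomerBusinessKey,    -- business key, kept so missing matches are easy to spot
        s.SaleDate,
        s.SaleAmount,
        s.SaleStatus              -- filter column, kept so filtered rows can be inspected
from    staging.Sales s
left outer join dbo.DimCustomer d
        on d.CustomerBusinessKey = s.CustomerBusinessKey;
go

-- The load query applies the filters the view deliberately leaves out
insert dbo.FactSales (CustomerKey, SaleDate, SaleAmount)
select  CustomerKey, SaleDate, SaleAmount
from    dbo.SalesFactSource
where   CustomerKey is not null        -- only rows with a matching dimension row
  and   SaleStatus = 'Complete';       -- the business filter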

Posted in ETL | Leave a comment

ETL Source: full query or view

When copying data, you generally need to shape the source data before loading it. This may be as simple as joining business keys to dimension tables to get dimension keys. Or the query can be quite complex, involving multiple joins, derived tables, etc. Do you embed the full query in the stored procedure (or data source) to load your target table? Or do you use a view or table valued function? There is almost a religious fervor associated with each position.

I suggest that for the purposes of debugging, using a view/function makes sense.

Here is the situation that I encountered. A stored procedure copied data from staging to our data mart. There were multiple joins between different staging tables, dimension tables and other fact tables. We had to use common table expressions embedded in common table expressions. At the end of this post is a majorly simplified outline of the actual stored procedure.

I want to be able to view results without changing the data. In addition, I would like to be able to compare the results of the old code with the new. And I would like to easily incorporate the revisions that I make in the debug process.

If I replace the hard coded SQL with a table valued function, I can create a test version of the function:

create function MyFunction_test ... as (same code as the original)

Now that I have multiple functions, I can run various selects.

select * from MyFunction (@param1, @param2)
except
select * from MyFunction_test (@param1, @param2)

I keep changing MyFunction_test until I get the results I expect. Then I alter MyFunction and put the new code into source control.

Alter function MyFunction ... as (revised code)

It would be challenging to do that with hard coded SQL. What I actually did was comment out the logging, comment out the create proc line, convert the parameters to variables, and comment out various begin and end statements. Then, when I finally got the query returning the right results, I had to paste the code back into the right place.

Create proc myproc 
Parameters AS
Calculate variables
Log start
Begin try
Begin transaction
With cte1 as (
Select x, y, z, sum(a) as TotalA
  , Sum(case 
    when b = 1 then c 
    when d = 2 then e 
    else 0 end) as TotalCE
From tbl1
Where f = @param and g = @variable
Group by x, y, z
), cte2 as (
Select x, sum(amt) as totalAmount from stagingtable
Where f = @param
Group by x
)
Insert facttable
Select d.surrkey
 , cte1.y, cte1.z
 , cte1.TotalA, cte1.TotalCE
 , cte2.TotalAmount
From cte1 inner join cte2 on cte1.x = cte2.x
Inner join dimtable d on cte1.x = d.x
Left outer join facttable f on d.surrkey = f.dimkey
Where f.dimkey is null;
Commit transaction
End try
Begin catch; Rollback; End catch;

Posted in ETL, SSIS | Leave a comment

Another argument for using command source in SSIS

You are building an SSIS package and create a data flow with a source that connects to a SQL Server table. You simply take the defaults in the OleDB source and select a table name. Then some slacker changes the table definition: not removing or adding columns, just carelessly renaming a column by changing the case of some of its letters. What happens? As Nakul Vachhrajani points out (http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/136082/), you get the dreaded VS_NEEDSNEWMETADATA error.

To summarize Nakul's point: if you point an OleDB source to a table with a column called IsOrderAGift on your development server, and then deploy the package/project to production where that column is named IsOrderagift, SSIS will think you have a new column. The problem, as Nakul explains, is that SSIS is case sensitive while SQL Server is normally configured to be case insensitive.

Here is where using a SQL command comes to the rescue. If, rather than selecting a table as the source, you use a SQL command, you can easily avoid the problem with the following SQL command:

SELECT IsOrderAGift From tablename

When you move the package to production, SQL Server happily finds the IsOrderagift column and returns it under the name you wrote, IsOrderAGift. SSIS is happy, and you are happy not to get that last-minute error right after deployment.

Posted in SSIS | Comments Off on Another argument for using command source in SSIS