Reordering Columns Using PowerShell

One of the regular issues that is discussed on the SSIS forum is what to do when a source file changes the order of columns. You build your SSIS package to handle Col1, Col2, Col3 and you get the files with the following order Col2, Col3, Col1. Or worse, you get Col2, Col1. How do you handle this case? It usually involves some nasty .Net code.

But why not process the file using PowerShell? In previous posts (listed at the end), I suggest that one can use PowerShell to preprocess a comma delimited file, sorting the result, removing columns and filtering the results. The same technique can reorder the columns, even add a column.

Here is my base file.

ID,Name,State,Country
1,Russ,Sandys,Bermuda
2,Katie,Texas,USA
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA

I expect to find columns in the following order: ID, Name, State, Country.

But sometimes the downstream process provides you with the following formats:

Country,ID,Name,State
Bermuda,1,Russ,Sandys
USA,2,Katie,Texas
Bermuda,3,Gail,Sandys
USA,4,Melanie,Oregon
USA,5,Stephanie,Arizona

Or worse:

ID,Name,State
1,Russ,Sandys
2,Katie,Texas
3,Gail,Sandys
4,Melanie,Oregon
5,Stephanie,Arizona

Either of these will break the import.

Using PowerShell, I can very easily reformat the data to expected format:

cd C:\MyFolder\
Import-Csv -Path SampleData.csv `
  | SELECT  ID, Name, State, Country `
  | ConvertTo-CSV -NoTypeInformation `
  | % {$_ -replace  `
  '\G(?<start>^|,)(("(?<output>[^,"]*?)"(?=,|$))|(?<output>".*?(?<!")("")*?"(?=,|$)))' `
  ,'${start}${output}'} `
  | Out-File SampleDataSorted.csv -fo -en ascii ; 

In Removing quotes from CSV created by PowerShell I explain the why I use the regular expression replace.

The SELECT line of the PowerShell script is what determines the order of the columns. Interestingly, if the column does not exist in the input, it simple adds the column with a empty value.

This very simple script will always put the columns in the expected order. It will produce an empty column if the column does not have a value. By the way, while this may prevent the ETL process from failing it can leave you with invalid data.

Let me know if you have questions about this process. Can you see any use for this in your environment?

Previous posts:

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, PowerShell, SSIS. Bookmark the permalink.

Leave a Reply