PowerShell has power to handle many ETL tasks (PowerShell as an ETL tool). In this post I illustrate how you can use PowerShell to shape a comma delimited file. That is, how to remove columns, filter rows and sort the file output.
Let’s start with a simple file (SampleData.csv):
ID,Name,State,Country 1,Russ,Sandys,Bermuda 2,Katie,Texas,USA 3,Gail,Sandys,Bermuda 4,Melanie,Oregon,USA 5,Stephanie,Arizona,USA
I want to reduce my set to only people in the USA. I want to remove the Country column and I want to sort by State. This is my intended output (SampleDataSorted.csv).
ID,Name,State 5,Stephanie,Arizona 4,Melanie,Oregon 2,Katie,Texas
This would be very easy to do using SSIS. I would need a Flat File Source to read my csv file. I would need a Conditional Split transformation to remove the rows where the Country is not USA. And I would require a Sort transformation. Finally, I would use a Flat File Destination to output only the three columns.
I can do this with PowerShell as well (for reasons why PowerShell as an ETL tool).
So here is my ETL package:
cd c:\MyFolder Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} ` | SELECT ID, Name, State ` |Sort-Object State ` |Export-Csv SampleDataSorted.csv ;
SSIS Equivalent | Task | PowerShell | |
Flat File Source | Read source | Import-Csv -Path SampleData.csv | |
Conditional Split | Filter | Where {$_.Country -eq “USA”} | |
Flat File Destination | Reduce the columns | SELECT ID, Name, State | |
Sort Transformation | Sort | Sort-Object State | |
Flat File Destination | Output CSV | Export-Csv SampleDataSorted.csv |
But it has a problem. This is the output.
#TYPE System.Management.Automation.PSCustomObject "ID","Name","State" "5","Stephanie","Arizona" "4","Melanie","Oregon" "2","Katie","Texas"
It would be a major pain to handle the #TYPE header. How do I get rid of this? And how do I remove the double quotes? I type this to learn more about the Export-CSV command:
help Export-CSV
That gives a little information, but the response suggests that I try this:
Get-help Export-Csv -Online
This brings me to a web page with all kinds of information. I learn that the command has a “-NoTypeInformation” that will remove the #TYPE line.
Unfortunately, you can’t remove quotes from the Export-CSV output. This is a minor problem, I will examine solutions for this problem in future posts.
So the final command and its output is:
cd c:\MyFolder Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} ` | SELECT ID, Name, State ` |Sort-Object State ` |Export-Csv SampleDataSorted.csv ` -NoTypeInformation; "ID","Name","State" "5","Stephanie","Arizona" "4","Melanie","Oregon" "2","Katie","Texas"