Using PowerShell to shape comma delimited file

PowerShell has power to handle many ETL tasks (PowerShell as an ETL tool). In this post I illustrate how you can use PowerShell to shape a comma delimited file. That is, how to remove columns, filter rows and sort the file output.

Let’s start with a simple file (SampleData.csv):

ID,Name,State,Country
1,Russ,Sandys,Bermuda
2,Katie,Texas,USA
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA

I want to reduce my set to only people in the USA. I want to remove the Country column and I want to sort by State. This is my intended output (SampleDataSorted.csv).

ID,Name,State
5,Stephanie,Arizona
4,Melanie,Oregon
2,Katie,Texas

This would be very easy to do using SSIS. I would need a Flat File Source to read my csv file. I would need a Conditional Split transformation to remove the rows where the Country is not USA. And I would require a Sort transformation. Finally, I would use a Flat File Destination to output only the three columns.

I can do this with PowerShell as well (for reasons why PowerShell as an ETL tool).

So here is my ETL package:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv ; 
SSIS Equivalent Task PowerShell
Flat File Source Read source Import-Csv -Path SampleData.csv
Conditional Split Filter Where {$_.Country -eq “USA”}
Flat File Destination Reduce the columns SELECT ID, Name, State

Sort Transformation Sort Sort-Object State
Flat File Destination Output CSV Export-Csv SampleDataSorted.csv

But it has a problem. This is the output.

#TYPE System.Management.Automation.PSCustomObject
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

It would be a major pain to handle the #TYPE header. How do I get rid of this? And how do I remove the double quotes? I type this to learn more about the Export-CSV command:

help Export-CSV

That gives a little information, but the response suggests that I try this:

Get-help Export-Csv -Online

This brings me to a web page with all kinds of information. I learn that the command has a “-NoTypeInformation” that will remove the #TYPE line.

Unfortunately, you can’t remove quotes from the Export-CSV output. This is a minor problem, I will examine solutions for this problem in future posts.

So the final command and its output is:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv `
  -NoTypeInformation; 
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

Leave a Reply

%d