Using PowerShell to shape comma delimited file

PowerShell has power to handle many ETL tasks (PowerShell as an ETL tool). In this post I illustrate how you can use PowerShell to shape a comma delimited file. That is, how to remove columns, filter rows and sort the file output.

Let’s start with a simple file (SampleData.csv):

ID,Name,State,Country
1,Russ,Sandys,Bermuda
2,Katie,Texas,USA
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA

I want to reduce my set to only people in the USA. I want to remove the Country column and I want to sort by State. This is my intended output (SampleDataSorted.csv).

ID,Name,State
5,Stephanie,Arizona
4,Melanie,Oregon
2,Katie,Texas

This would be very easy to do using SSIS. I would need a Flat File Source to read my csv file. I would need a Conditional Split transformation to remove the rows where the Country is not USA. And I would require a Sort transformation. Finally, I would use a Flat File Destination to output only the three columns.

I can do this with PowerShell as well (for reasons why PowerShell as an ETL tool).

So here is my ETL package:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv ; 
SSIS Equivalent Task PowerShell
Flat File Source Read source Import-Csv -Path SampleData.csv
Conditional Split Filter Where {$_.Country -eq “USA”}
Flat File Destination Reduce the columns SELECT ID, Name, State

Sort Transformation Sort Sort-Object State
Flat File Destination Output CSV Export-Csv SampleDataSorted.csv

But it has a problem. This is the output.

#TYPE System.Management.Automation.PSCustomObject
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

It would be a major pain to handle the #TYPE header. How do I get rid of this? And how do I remove the double quotes? I type this to learn more about the Export-CSV command:

help Export-CSV

That gives a little information, but the response suggests that I try this:

Get-help Export-Csv -Online

This brings me to a web page with all kinds of information. I learn that the command has a “-NoTypeInformation” that will remove the #TYPE line.

Unfortunately, you can’t remove quotes from the Export-CSV output. This is a minor problem, I will examine solutions for this problem in future posts.

So the final command and its output is:

cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
  | SELECT ID, Name, State `
  |Sort-Object State `
  |Export-Csv SampleDataSorted.csv `
  -NoTypeInformation; 
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, PowerShell. Bookmark the permalink.

Leave a Reply