In a previous post, I demonstrate how to reshape a comma delimited file (CSV) using PowerShell. I noted that the results put all of the column values in double quotes:
"ID","Name","State"
"5","Stephanie","Arizona"
"4","Melanie","Oregon"
"2","Katie","Texas"
I used this code to produce this result:
cd c:\MyFolder
Import-Csv -Path SampleData.csv|Where {$_.Country -eq "USA"} `
| SELECT ID, Name, State `
|Sort-Object State `
|Export-Csv SampleDataSorted.csv `
-NoTypeInformation;
In this post, I remove the double quotes.
A Microsoft blog suggests one way to remove these double quotes. They suggest using the ConvertTo-CSV cmdlet rather than Export-CSV and doing a string replace of the double quotes with empty string (code from the article).
dir c:\fso -Filter *.csv | ? {$_.basename -like 'users?'} `
| Import-Csv `
| sort lname,fname `
| convertto-csv -NoTypeInformation `
| % { $_ -replace '"', ""} `
| out-file c:\fso\usersconsolidated.csv -fo -en ascii
I convert my command to:
cd c:\MyFolder
Import-Csv -Path SampleData.csv `
| Where {$_.Country -eq "USA"} `
| SELECT ID, Name, State `
| Sort-Object State `
| ConvertTo-CSV -NoTypeInformation `
| % {$_ -replace '"',""} `
| Out-File SampleDataSorted.csv -fo -en ascii
;
Import-Csv -Path SampleData.csv |
Read source csv |
Where {$_.Country -eq "USA"} |
Filter rows |
SELECT ID, Name, State |
Set output columns |
ConvertTo-CSV -NoTypeInformation |
Create CSV output |
% {$_ -replace '"',""} |
For each row, replace double quotes with empty string |
Out-File SampleDataSorted.csv -fo -en ascii |
Output to file |
This produces the results I expect.
ID,Name,State
5,Stephanie,Arizona
4,Melanie,Oregon
2,Katie,Texas
But here is the problem. ETL scenarios can be messy. CSV files frequently need the double quotes. When a column contains a comma, you need to put the quotes around the column value. For example a file like this is problematic.
ID,Name,State,Country
1,Russ,Sandys,Bermuda
3,Gail,Sandys,Bermuda
4,Melanie,Oregon,USA
5,Stephanie,Arizona,USA
6,"Katie, ""Scott"", Arianna",Texas,USA
I want to keep the quotes around the second column that last row.
Regular expressions to the rescue. The replace command can do regular expression replacement without any additional work. That isn’t to say that building the regular expression isn’t painful.
cd c:\MyFolder
Import-Csv -Path SampleData.csv `
| Where {$_.Country -eq "USA"} `
| SELECT ID, Name, State `
| Sort-Object State `
| ConvertTo-CSV -NoTypeInformation `
| % {$_ -replace `
'\G(?^|,)(("(?