Shaping a comma delimited file with PowerShell

Here is a use case for what I am thinking of. You have a comma delimited file that you need to load into SQL Server. It is a very large file that you need to join to an equally large table in a database. For example, at one place I worked we had a set of account balances that we needed to join to a customer table that had 15 million rows in it. We used a Lookup, but it took about 20 minutes to load the customer table into the cache. My thought at the time was whether we could instead use two data sources, both sorted on the customer id, and join them with a Merge Join. Could that have sped up the process?

There would be a problem if you are using a comma delimited file: sorting the file would have to happen inside the SSIS data flow. But what if the comma delimited file had already been sorted before it ever reached the data source? You could simply set the IsSorted flag on the source output (and the SortKeyPosition on the key column) and feed it straight into the Merge Join.

But to avoid putting extra load on the SSIS server, why not offload that work to a less busy server? Sort the file there, then transfer it to the SSIS server (or let SSIS read it from a shared drive). And if that server is only going to sort a file, why install SSIS on it at all? Why not use a lighter tool, such as PowerShell?

Import-Csv -LiteralPath SampleData.csv | Sort-Object Country | Export-Csv SampleDataSorted.csv -NoTypeInformation
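
The -NoTypeInformation switch matters here: in Windows PowerShell, Export-Csv otherwise writes a #TYPE line as the first row of the file, which a flat file source would trip over.

Going one step further, here is a minimal sketch of the whole off-load idea: sort on the join key on the quieter server, then copy the result to the SSIS server (or to a share it can read from). The AccountBalances file names, the CustomerId column, and the \\SSISServer\Staging path are placeholders for illustration, not anything from the project above.

$source      = 'D:\Staging\AccountBalances.csv'                  # hypothetical input file
$sorted      = 'D:\Staging\AccountBalancesSorted.csv'            # sorted copy written locally
$destination = '\\SSISServer\Staging\AccountBalancesSorted.csv'  # hypothetical share on the SSIS server

# Sort-Object has to hold every row in memory before it can emit output,
# so this is the work we keep off the SSIS box.
Import-Csv -LiteralPath $source |
    Sort-Object { [int]$_.CustomerId } |    # numeric sort on the join key (assumes CustomerId is numeric)
    Export-Csv -LiteralPath $sorted -NoTypeInformation

# Hand the sorted file to the SSIS server (or let SSIS read it from the share).
Copy-Item -LiteralPath $sorted -Destination $destination

One caveat: the sort order PowerShell produces has to match the sort order the Merge Join expects for the column's data type, which is why the sketch casts the key to [int]. If the key will be loaded as a string, drop the cast so the text sort orders line up.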
