Shaping a comma-delimited file with PowerShell

Here is a use case for what I am thinking of. You have a comma-delimited file that you need to load into SQL Server. It is a very large file that you need to join to an equally large table in a database. For example, at one place I worked we had a set of account balances that we needed to join to a customer table with 15 million rows in it. We used a Lookup transformation, but it took about 20 minutes to load the customer table into cache. My thought at the time was: could we use two data sources, both sorted on the customer ID, and use a join instead? Would that have sped up the process?

There would be a problem if you are using a comma-delimited file: sorting the file would have to happen inside the SSIS data flow. But what if the comma-delimited file had already been sorted before it reached the data source? Then you could simply set the IsSorted flag on the source's output (and SortKeyPosition on the sort column) and feed it straight into a Merge Join.

But to avoid putting that load on the SSIS server, why not offload the sort to a less busy server? Sort the file there, then transfer it to the SSIS server (or read the file from a shared drive). And if that server is only going to sort a file, why install SSIS on it at all? Why not use a lighter tool? PowerShell can do it in one line:

Import-Csv -LiteralPath SampleData.csv | Sort-Object Country | Export-Csv SampleDataSorted.csv -NoTypeInformation
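For the account-balance scenario above, a slightly fuller sketch might look like the following. The file names and the CustomerID column are assumptions for illustration; the -NoTypeInformation switch keeps Windows PowerShell's Export-Csv from writing a #TYPE header line that would confuse the SSIS flat file source.

```powershell
# A sketch, assuming hypothetical file names and a CustomerID sort key.
# Note: Import-Csv materializes every row in memory, so this approach
# suits files that fit comfortably in RAM on the sorting server.
$inFile  = 'AccountBalances.csv'        # assumed input path
$outFile = 'AccountBalancesSorted.csv'  # assumed output path

Import-Csv -LiteralPath $inFile |
    Sort-Object { [int]$_.CustomerID } |   # cast for a numeric sort; omit the cast to sort as text
    Export-Csv -LiteralPath $outFile -NoTypeInformation
```

One caution: the Merge Join requires both inputs to be sorted the same way, so make sure the sort order PowerShell produces (numeric versus string, and string collation) matches the ORDER BY on the database side. Sorting on a numeric key, as above, sidesteps most collation mismatches.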

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca-Cola Enterprises. Currently he is developing a data warehouse for a large health care organization.