Selective text qualifiers in SSIS

One of the things that messes up processing delimited files is when the delimiter character is actually part of the data rather than acting as a delimiter. So look at the following:

Russ,Loski,Southlake,TX
John,Smith,Jr.,Dallas,TX

The second line has 4 commas but the first has only 3 commas. The second comma is actually part of the data:

First Name:  John
Last Name: Smith, Jr.
City: Dallas
State: TX

One convention to address this issue is to wrap the text with what SSIS calls a “text qualifier.” A text qualifier is normally quotes or double quotes. SSIS can very naturally produce the following:

"Russ","Loski","Southlake","TX"
"John","Smith,Jr.",Dallas","TX"

This format works very nicely with most tools that process delimited text. But frequently the users who view the results only want text qualifiers around columns that have commas in them. They want something like:

Russ,Loski,Southlake,TX
John,"Smith,Jr.",Dallas,TX

I have not found a setting in SSIS that will produce this result. So how do you achieve this goal? You can do this with a script component transformation using regular expressions.

Before I provide the script code to address this, let me provide some additional information about text qualifiers. The text qualifier is put at the beginning and end of a data column that contains the delimiter. If the delimiter is a comma, the column will be surrounded by double quotes if it contains a comma. However, the data in a column may contain double quotes, which might be interpreted as a text qualifier rather than as data. To distinguish double quotes that are text qualifiers from data double quotes, the data double quotes are escaped, normally by repeating the double quotes. ” becomes “”.

If you are trying to produce the file in SSIS, you would need to add a flat file connection manager. Configure the connection manager with no text qualifier. You are going to handle the text qualifier in the data flow.

Add a script component to your data flow. Add the columns that you wish text qualify. They need to be added as read/write columns. Then in the code, you will add two elements:

  • A class variable to hold a regular expression
  • A method that will use this regular expression to add the text qualifiers and replace the double quotes with two double quotes

This is the code that I used:

  System.Text.RegularExpressions.Regex reg = 
        new Regex("((?<quote>\")|(?<comma>,))", RegexOptions.Compiled);
    public string TextQualify(string inp)
    {
        string ret = inp;

        if (reg.IsMatch(inp))
        {
            ret = "\"" + reg.Replace(inp, "${quote}${quote}${comma}") + "\"";
        }
       return ret;

    }

The regex object looks for either double quotes or commas. The double quotes is put into a named group called “quote” and the comma into a name group called “comma.”

The TextQualify method takes a string as its argument. If a double quote or comma is found, the comma is replaced with a comma and the double quotes are doubled. Then the results are wrapped in double quotes.

I found that this produced the results I needed.

Let me know if there is a better way to achieve this.

Leave a Reply