Shredding multiline column using xml

I had an ETL project to copydata from one SQL Server to another. The conversion rules, the data constraints and other metadata were in an Excel spreadsheet. Many of the cells consisted of multiple rows of data. But how to extract the rows? I was able to shred these cells into SQL rows by using the SQL Server XML XQuery methods. I convert each line in the cell into an XML, then I use the SQL XML nodes method to shred each element into separate rows.

Before we can work with the xml, we need to do a little bit of housekeeping. We need to convert certain characters to ones that work with XML. For example XML doesn’t work with &, <, and >.  Then you append the opening and closing tag for each row (ie <row>…</row>) around the entire text of the column. Next you replace the end of line character with the closing tag followed by the opening tag (</row><row>). You would end up with a column value that looks like:

<row>First row</row><row>More Rows</row>

Finally you convert the text to the XML data type. You can use the .nodes method to shred the XML into separate records. Finally, you get the value for each row using the .value method.

Here is the script.

declare @tbl table (
col nvarchar(1000)
);

declare @nvalue nvarchar(1000);
declare @i int = 0;

while @i < 10
begin
set @nvalue = N'Row ' + cast(@i as nvarchar(2)) + nchar(13)+ nchar(10)
 + N'Date ' + cast(sysdatetime() as nvarchar(40)) + nchar(13)+ nchar(10)
 + N'Random ' + cast (rand() as nvarchar(40)) + nchar(13)+ nchar(10)
 + N'Control characters & < >';

 insert @tbl values (@nvalue);

set @i += 1;
end

; with xmlRows as(
select t.*, 
cast ( (N'<row>' +
Replace(Replace(Replace(Replace(col, N'&', '&amp;')
, '<', '&lt;') , '>', '&gt;')
, nchar(13)+ nchar(10) , N'</row><row>')
+ N'</row>')
as xml ) as xml 
from @tbl  t

)
select x.*, 
row.data.value ('text()[1]', 'nvarchar(1000)') as rowValue
from xmlRows x 
cross apply x.xml.nodes ('row') as row(data);

This technique will work with other delimiters including tab, comma and pipe.

Leave a Reply