ETL Source: full query or view

When copying data, you generally need to shape the source data before loading it. This may be as simple as joining business keys to dimension tables to get dimension keys. Or the query can be quite complex, involving multiple joins, derived tables, etc. Do you embed the full query in the stored procedure (or data source) to load your target table? Or do you use a view or table valued function? There is almost a religious fervor associated with each position.

I suggest that for the purposes of debugging, using a view/function makes sense.

Here is the situation that I encountered. A stored procedure copied data from staging to our data mart. There were multiple joins between different staging tables, dimension tables and other fact tables. We had to use common table expressions embedded in common table expressions. At the end of this post is a majorly simplified outline of the actual stored procedure.

I want to be able to view results without changing the data. In addition, I would like to be able to compare the results of the old code with the new. And I would like to easily incorporate the revisions that I make in the debug process.

If I replace the hard coded SQL with a table valued function, I could create a new version of the function:

create function MyFunction_test ... as (same code are original)

Now that I have multiple functions, I can run various selects.

select * from MyFunction (@param1, @param2)
select * from MyFunction_test (@param1, @param2)

I keep changing MyFunction_test until I get the results I expect. Then I would alter MyFunction and put the new code into source control.

Alter function MyFunction ... as (revised code)

It would be challenging to do that with hard coded SQL. What I did was to comment out the logging. I commented out the create proc line and converted the parameters to variables. I commented out various begin and end statements. Then when I finally got the query to return the results right I had to paste the code in the right place.

Create proc myproc 
Parameters AS
Calculate variables
Log start
Begin try
Begin transaction
With cte1 as (
Select x, y, z, sum(a) as TotalA
  , Sum(case 
    when b = 1 then c 
    when d = 2 then e 
    else 0 end) as TotalCE
From tbl1
Where f = @param and g = @variable
Group by x, y, z
), cte2 as (
Select x, sum(amt) as totalAmount from stagingtable
Where f = @param
Group by x
Insert facttable
Select d.surrkey
 , cte1.y, cte1.z
 , cte1.TotalA, cte1.TotalCE
 , cte2.TotalAmount
From cte1 inner join cte2 on cte1.x = cte2.x
Inner join dimtable d on cte1.x = d.x
Left outer join facttable f of d.surrkey = f.dimkey
Where f.dimkey is null;
Commit transaction
End try
Begin catch; Rollback; End catch;

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in ETL, SSIS. Bookmark the permalink.

Leave a Reply