SSIS: Using Lookup to not only find records, but add them too

One of the things that I like about providing training is the kind of questions that come up.  Many times they are issues that I have thought about but been just plain too lazy to examine.  This week one of my students asked if it were possible to lookup at record in another table and if the value was not found in that table to create a new record and to then get the surrogate key that SQL generated for that new record.  I had heard somewhere that it was possible to do this with the Lookup transformation (using Partial Cache), but I can’t find out where I had heard this.  So I decided to write my own.

Let’s set the stage.  I have a data source that that has product data.  The table includes two columns for the subcategory:  the key for the subcategory and the subcategory name.  I populate two tables as the target for my data load:  DimProduct and DimProductSubCategory.   When I load the DimProduct table, I need to put the surrogate key for the subcategory into the ProductSubCategoryKey column.

DimProduct DimProductSubCategory
ProductKey ProductSubCategoryKey
ProductAltKey SubCategoryAltKey
ProductSubCategoryKey SubCategoryName

The following query of the Product and ProductSubCategory tables in the AdventureWorks2012 database illustrates this source.

SELECT p.ProductID, p.Name
, p.ProductSubcategoryID, psc.Name AS ProductSubCategory
FROM Production.Product AS p 
 Production.ProductSubcategory AS psc
 ON p.ProductSubcategoryID = psc.ProductSubcategoryID

The primary keys of both of the AdventureWorks2012 tables (ProductID and ProductSubcategoryID) I put into the AltKey columns (ProductAltKey and SubCategoryAltKey) of my new Dim tables.  The Key columns (ProductKey and ProductSubCategoryKey) for the Dim tables are identity columns.  The ProductSubCategoryKey columns in both tables form a foreign key relationship.

Here is the problem.  If I lookup the record in the DimProductSubCategory table and there is no record corresponding to the ProductSubCategoryID and ProductSubCategoryName in the DimProductSubCategory table, how can I insert the record and at the same time get the identify value just populated to use when inserting the DimProduct record?  To do this, I use a little known feature of the Lookup transformation.

The package I created to achieve this has the following Data Flow:

Data Flow for the Lookup update

Data flow

The Data Flow gets the records based on the query above, then uses the Lookup to not only find existing records in the DimProductSubCategory table, but to actually insert new records.  The Lookup provides as output the ProductSubCategoryKey that is inserted into the DimProduct table.

The Columns Tab for the Lookup

The Columns Tab for the Lookup

From looking at this task, it looks like it is going to simply find existing records and be done with it. However, I have made some changes to some of the defaults so that this component no longer just finds records that exist, but rather can add new records to the lookup if the records don’t exist.

First, I set the Cache Mode property on the Lookup to Partial Cache.  What this means is that if a record is found in cache then SSIS will return the values I am seeking.  However, if it does not find the value, it will normally query the SQL Server table directly to find the record.  But here is the magic.  I can change the query that SSIS uses to find the record.  I go to the Advanced tab for Lookup and check the Modify the SQL Statement check box in the Custom query section of the page.  I can then change the SQL Statement used to find the record.

Advanced tab for the Lookup transformation

Advanced Tab

The query will have parameters (marked by question marks “?”) as placeholders and then mapped to the search columns in the Parameters dialog, accessed from the button of that name.

I used the following query:

Declare @SubCategoryAltKey int = ? ;
Declare @SubCategoryName nvarchar(200) = ?;
Declare @SubCategoryKey int;

if @SubCategoryAltKey is not null and @SubCategoryName is not null 
select @SubCategoryKey = max(ProductSubCategoryKey)
from [dbo].[DimProductSubCategory] 
where  [SubCategoryAltKey] = @SubCategoryAltKey
  and  [SubCategoryName] = @SubCategoryName

 if @SubCategoryKey is null
   insert into DimProductSubCategory 
      (SubCategoryAltKey, SubCategoryName)
   output inserted.*
   values (@SubCategoryAltKey , @SubCategoryName)
 select * from [dbo].[DimProductSubCategory] 
 where [ProductSubCategoryKey] = @SubCategoryKey


 Select * from [dbo].[DimProductSubCategory] where 1 = 0;  

The flow of this is that I first get the parameters and put them into SQL variables.  The variable names match the column names that I am using.  Next, I test to see if the columns are both not null.  If either is null, I return an empty result.  However, if both are not null, then I try to find the ProductSubCategoryKey in the DimProductSubCategory table.  If I don’t find it (the @SubCategoryKey is null), then I insert the record using the values I passed in. I return the inserted values (the Output clause of the insert statement).  If I find the ProductSubCategoryKey value, I use that value to return the record that has that value as its key.

A couple caveats.  This strategy will not work well if there is a large number of different lookup values.

I have included the source code for this here.  Let me know if you have questions.

AddWithLookup Source Files

About Russ

Russ Loski is a data warehouse developer focusing on SSIS and SQL Server development. He has worked with SQL Server since 1998. He has used various technologies to move data from one source to others. He has worked for well-known companies such as the Bank of America, the Dallas Cowboys Football Club, T-Mobile and Coca Cola Enterprises. Currently he is developing a Data Warehouse for a large health care organization.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply