One of the things that I like about providing training is the kind of questions that come up. Many times they are issues that I have thought about but been just plain too lazy to examine. This week one of my students asked if it were possible to lookup at record in another table and if the value was not found in that table to create a new record and to then get the surrogate key that SQL generated for that new record. I had heard somewhere that it was possible to do this with the Lookup transformation (using Partial Cache), but I can’t find out where I had heard this. So I decided to write my own.
Let’s set the stage. I have a data source that that has product data. The table includes two columns for the subcategory: the key for the subcategory and the subcategory name. I populate two tables as the target for my data load: DimProduct and DimProductSubCategory. When I load the DimProduct table, I need to put the surrogate key for the subcategory into the ProductSubCategoryKey column.
The following query of the Product and ProductSubCategory tables in the AdventureWorks2012 database illustrates this source.
SELECT p.ProductID, p.Name , p.ProductSubcategoryID, psc.Name AS ProductSubCategory FROM Production.Product AS p INNER JOIN Production.ProductSubcategory AS psc ON p.ProductSubcategoryID = psc.ProductSubcategoryID
The primary keys of both of the AdventureWorks2012 tables (ProductID and ProductSubcategoryID) I put into the AltKey columns (ProductAltKey and SubCategoryAltKey) of my new Dim tables. The Key columns (ProductKey and ProductSubCategoryKey) for the Dim tables are identity columns. The ProductSubCategoryKey columns in both tables form a foreign key relationship.
Here is the problem. If I lookup the record in the DimProductSubCategory table and there is no record corresponding to the ProductSubCategoryID and ProductSubCategoryName in the DimProductSubCategory table, how can I insert the record and at the same time get the identify value just populated to use when inserting the DimProduct record? To do this, I use a little known feature of the Lookup transformation.
The package I created to achieve this has the following Data Flow:
The Data Flow gets the records based on the query above, then uses the Lookup to not only find existing records in the DimProductSubCategory table, but to actually insert new records. The Lookup provides as output the ProductSubCategoryKey that is inserted into the DimProduct table.
From looking at this task, it looks like it is going to simply find existing records and be done with it. However, I have made some changes to some of the defaults so that this component no longer just finds records that exist, but rather can add new records to the lookup if the records don’t exist.
First, I set the Cache Mode property on the Lookup to Partial Cache. What this means is that if a record is found in cache then SSIS will return the values I am seeking. However, if it does not find the value, it will normally query the SQL Server table directly to find the record. But here is the magic. I can change the query that SSIS uses to find the record. I go to the Advanced tab for Lookup and check the Modify the SQL Statement check box in the Custom query section of the page. I can then change the SQL Statement used to find the record.
The query will have parameters (marked by question marks “?”) as placeholders and then mapped to the search columns in the Parameters dialog, accessed from the button of that name.
I used the following query:
Declare @SubCategoryAltKey int = ? ; Declare @SubCategoryName nvarchar(200) = ?; Declare @SubCategoryKey int; if @SubCategoryAltKey is not null and @SubCategoryName is not null begin select @SubCategoryKey = max(ProductSubCategoryKey) from [dbo].[DimProductSubCategory] where [SubCategoryAltKey] = @SubCategoryAltKey and [SubCategoryName] = @SubCategoryName if @SubCategoryKey is null begin insert into DimProductSubCategory (SubCategoryAltKey, SubCategoryName) output inserted.* values (@SubCategoryAltKey , @SubCategoryName) end else begin select * from [dbo].[DimProductSubCategory] where [ProductSubCategoryKey] = @SubCategoryKey end end else begin Select * from [dbo].[DimProductSubCategory] where 1 = 0; end
The flow of this is that I first get the parameters and put them into SQL variables. The variable names match the column names that I am using. Next, I test to see if the columns are both not null. If either is null, I return an empty result. However, if both are not null, then I try to find the ProductSubCategoryKey in the DimProductSubCategory table. If I don’t find it (the @SubCategoryKey is null), then I insert the record using the values I passed in. I return the inserted values (the Output clause of the insert statement). If I find the ProductSubCategoryKey value, I use that value to return the record that has that value as its key.
A couple caveats. This strategy will not work well if there is a large number of different lookup values.
I have included the source code for this here. Let me know if you have questions.