The task I had on my hands was simple. I had to perform lookup on a target table with my source data and get all non matched data. The data that I was fetching for the lookup from target had additional string manipulation done to obtain the required lookup data. I then went on to write a query to fetch that data and testing on a sample subset (using TOP 10) to check for it’s correctness and went on to implement.
I set everything up and when I ran the package the lookup was just now working. The rows that I expected to have a match were simply getting redirected to no-match. When I started debugging, it then became apparent that the query that I had used to retrieve the dataset in the lookup was an incorrect one. This time around I ran the query without the subset and the query failed to get executed showing me the actual error.
Moral of the story is one should not rely on ‘Preview’ results offered from the ‘Lookup’ transform or use a subset when sampling the data. It should be checked to see if the query works for the entire set.
Let me illustrate this with an example.
Below is a sample of Source Data –
FileName Test1.txt Test3.txt
Here is the data that I am doing a lookup on from a table say dbo.LookupTable (a simple Id and FilePath column) –
The string manipulation query for this is as you might have already guessed, to get FileName from the ‘FilePath’.
select Id,RIGHT(FilePath, CHARINDEX('\', REVERSE(FilePath)) -1) as 'FileName' from dbo.LookupTable;<strong> </strong>
I have a simple Data Flow Task which does the following –
- OLE_SRC – Get Source Data – Gets the source data as shown above.
- LKP – Get Id – Using the query above, it fetches the Id and FileName. Note that the moment you put that query and go to ‘Columns’, it will throw up an error. For the purpose of this illustration just ignore it and do the mapping.
- Trash Destination – A priceless open-sourced transformation from Konesans and a must have development aid.
Here is how the package looks like after running it-
As can be seen only 1 row gets shown as matched even though there are 2 matching rows.
The data in this sample illustration is very small but the original data that I had about 100k records and it was difficult to debug on why this error occurred.