SSIS Lookup transformations have this nice feature called caching. This has the potential to give your package a nice performance boost, especially if the values you need to lookup are often repeated in the input (typical for things like codes, …). The type of caching can be (from BOL)
- full – the complete reference dataset is read before the input is processed.
- partial – the transformation limits the size of the cache to a specified size
- none – each lookup results in a new query to the database
Everything perfect so far: switching between the different caching modes is nothing more than just selecting the type of caching you want. However … it seems that the behavior of the Lookup depends on which caching mode is selected. See this entry in Kristian Wedberg’s blog for a full description, but the short of it is that there’s a difference in the way things are compared in SSIS (full caching) and SQL Server (no or partial
caching). Two known situations in which this bug occurs are
varcharstrings – if the string contains spaces at the end
- empty strings
And to make things worse, I seem to be bitten by another variant of this ‘feature’: when I was testing my code to upload a flat-file to the database, the code ran much slower than I expected. When I did some profiling on the database (with SQL Server profiler, nice tool btw), I noticed a lot of identical queries being fired. This kind of surprised me, as I was sure I had enabled caching on my lookup transformations. So, when I double-checked this, I found out that the caching type was set to ‘partial’ because
I had specified an upper limit on the amount of ram the cache could use. Simple enough: I just removed that restriction to switch to full caching et voila … things would be much faster. Alas! Instead of being faster, things now don’t work anymore: the first lookup fails and redirects all rows to its error handler (as designed). I’ve spent the better part of yesterday afternoon trying to find out what is causing this, and have had no luck so far. Maybe looking at it with a fresh head tomorrow will do the trick…
After some further investigation, I seem to have located the problem: according to the post on Kristian Wedberg’s blog, the difference between full and no/partial caching only occurs for padded varchar (and empty strings), but for some reason, I’m getting the same strange behavior with chars! After switching the types of the columns in my database to varchar and then rtrimming the data, my cached lookups now work! I’m gonna download the example package from the blog and see if
I can reproduce my problems there.
The cache type does seem to have impact on char columns also! In my case, some of the columns I’m reading from a fixed-width flat file, need to be transformed by a lookup. The data coming from the file does not contain any spaces, but the lookup table fields are chars (not of the same length as the data in the file). Setting the cache type to ‘full’ causes the lookups to fail (presumably because SSIS is now comparing the data instead of SQL Server, and it uses other ‘equality’
I do find it pretty strange that Microsoft does not consider this to be a bug. Oh well, I’ll just convert all my lookup fields (in the database) to varchars and make sure the data is rtrimmed.