SQL


I’ve just read this on Ayende’s blog and I’m baffled! I’ve spent quite some time with SSIS, and the most frustrating part is the complete lack of ‘support’ for storing the .dtsx files in a source control system (like Subversion). At first glance, everything seems to be OK, as the dtsx files are actually just XML (and thus text), which should be diff’able. However, in practice, it’s a nightmare: every time the file is changed (however small the change), a large part of the contents changes, and even worse: the changes are clearly not local, as they occur all over the place! To make it even better, someone at MS decided it was probably a good idea to store both the ‘code’ (control/data flow logic, …) and the visual representation in the same file! Now even moving a box by one pixel messes up the complete file.

And no, the traditional (MS) answer that you should do a reserved checkout simply doesn’t work, for two reasons:

1/ having used the ‘edit-merge-commit’ pattern for most of my professional life, I just *refuse* to go back to a checkout-edit-checkin way of working (for one project, I was forced to use VSS, and, to put it nicely, I didn’t enjoy it). This is 2008, you know?!

2/ What about branches and merging? Nope, thought so!

I just cannot imagine that they do not have ‘decent support for teamwork’ as one of their top (non-functional) requirements!! Even the best product (and SSIS clearly is not, IMHO) would be close to useless if it didn’t support this!

To check whether a table already contains an identity column (mostly useful in change scripts), use the following SQL code:


SELECT *
FROM information_schema.tables
WHERE table_name = 'your_table_name'
  AND OBJECTPROPERTY(OBJECT_ID(table_name), 'TableHasIdentity') = 1
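
In a change script, you would typically wrap this in an existence test; a minimal sketch, where your_table_name and the added id column are just placeholders:

-- Hypothetical change script: only add an identity column if the table
-- doesn't already have one ('your_table_name' and 'id' are placeholders).
IF NOT EXISTS (
    SELECT *
    FROM information_schema.tables
    WHERE table_name = 'your_table_name'
      AND OBJECTPROPERTY(OBJECT_ID(table_name), 'TableHasIdentity') = 1
)
BEGIN
    ALTER TABLE your_table_name ADD id int IDENTITY(1, 1)
END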

On my current project, we’ve been running out of disk space on our development servers. The culprit is SQL Server’s tempdb: during installation, it was left on the C: drive, which is of course our smallest disk. User databases can easily be moved via detach/attach, but this doesn’t work for tempdb, which needs a different approach. First, use


use tempdb
go
sp_helpfile
go

to see where the tempdb files are currently located (you’ll need these paths in a moment; the logical file names, shown in the name column, are what the ALTER DATABASE statements below refer to). Move them to another place with


use master
go
Alter database tempdb modify file (name = tempdev, filename = 'd:\databases\tempdb.mdf')
go
Alter database tempdb modify file (name = templog, filename = 'e:\database_logs\templog.ldf')
go

and restart SQL Server. However, I noticed that even after the restart, the C: disk was still being used for the tempdb! It turns out the old files are not removed automatically: the solution was to delete the old tempdb.mdf and templog.ldf files on the C: disk (at the paths returned by sp_helpfile before the move).
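
To double-check the move after the restart, you can rerun the sp_helpfile check from above; a minimal sketch (the service name MSSQLSERVER assumes a default instance):

-- Restart the SQL Server service, e.g. from a command prompt
-- (MSSQLSERVER is the default instance; named instances use MSSQL$<name>):
--   net stop MSSQLSERVER
--   net start MSSQLSERVER

-- Then confirm that the tempdb files now live at the new location:
use tempdb
go
sp_helpfile
go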

After being hit with numerous collation problems on a previous project, I’ve been very careful about never explicitly stating the collation in my SQL scripts. I just make sure that the database default collation is set correctly, and all tables automatically get the correct collation. Since then, I’ve managed to avoid any collation problems. Until yesterday, that is! Some background: I managed to mess up the collation when I was installing SQL Server 2005 on my laptop some months ago. As a result, most of my databases have a different collation than the server they are running on. Normally, this isn’t a problem, but you should be careful when you’re working with temp tables.

When you explicitly create the temp table

CREATE TABLE #MyTempTable (
name varchar(30),
firstname varchar(20)
)

INSERT INTO #MyTempTable(name, firstname)
SELECT name, firstName
FROM somewhere

the table’s columns will use the collation of tempdb (where the table is actually created), not that of your current database! Typically, the collation of tempdb is that of the database server, so in my case, where the database collation differs from the server collation, trying to insert entries from the temp table into another table:

INSERT INTO Person(name, firstName)
SELECT name, firstName
FROM #MyTempTable
EXCEPT
SELECT name, firstName
FROM Person

results in the following error:

Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the EXCEPT operation.

The easiest way to fix this is to let SQL Server create the temp table for you via a SELECT INTO statement:

SELECT name, firstName
INTO #MyTempTable
FROM somewhere

The temp table will still be created in tempdb, but its columns will take their collation from the source (in this case, the columns of the somewhere table).

Of course, the SELECT INTO statement has the additional benefit that you don’t have to keep the temp table structure up-to-date manually!
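
That said, if you do need an explicit CREATE TABLE (for example, to control the exact column definitions), a common alternative, not used above, is to pin the character columns to the current database’s collation with the COLLATE clause; a minimal sketch based on the table above:

-- Force the temp table's character columns to follow the current database,
-- not tempdb, so later comparisons don't raise collation conflicts.
CREATE TABLE #MyTempTable (
    name varchar(30) COLLATE DATABASE_DEFAULT,
    firstname varchar(20) COLLATE DATABASE_DEFAULT
)

DATABASE_DEFAULT resolves to the collation of the database you’re running in rather than tempdb’s, so the temp table’s columns match the tables you compare them against.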

On my current project, we’re using SSIS quite extensively for uploading data (text files) from the mainframe to our database. Although the body of the work is mostly finished, I still had to bundle all the work in an easy-to-run package. Until now, I had always run all the different packages required for the upload manually. So last week, I spent some time creating a ‘master’ package that calls all the (child) packages in the correct order. I first edited all my child packages to use parent configurations: this way, I can specify the configuration at the top level, and all child packages pick up the setting automatically (and I can still run the packages by themselves if I want). I then created the parent package: it consists of a number of ExecutePackage tasks, which just pass control to the configured child package. I thought all of this could be completed in a few hours, but alas, when I tried to run the parent package, I almost immediately got the following error:

Error: Error 0xC0012050 while loading package file “<SomeChildPackage>.dtsx”. Package failed validation from the ExecutePackage task. The package cannot run. .

I hoped the child package’s output would show more information on what went wrong, but nope, nothing. After some experimenting, I found that I could get things to work by setting the ExecuteOutOfProcess property of the ExecutePackage task to true. This forces SSIS to create a separate process for each child package to run in. However, I noticed that these processes are not cleaned up quickly: several minutes after the parent package had completed, I still saw several dtshost.exe processes running, each taking a considerable amount of memory. Btw, I half expected this to be the case, as I remembered this post.

So, although I now had a workaround, I was still confused as to why my packages wouldn’t work when run in the same process as their parent. I didn’t want to spend more time on it, so I went on to create some other parent packages (for other parts of the project). I tried running one before I had set the ExecuteOutOfProcess property of each child package, and behold, some (but not all) of the child packages did run! I compared the properties of the child packages (both of the ExecutePackage task and of the child package itself), and found that the only difference between a working and a non-working child package was that the DelayValidation property of the non-working ones was set to true! Changing it to false solved the problem!

So, the short version: if you get the above error when running a child package with the ExecutePackage task, make sure that DelayValidation is set to false on the child package itself (the package, not the ExecutePackage task)!

I just found a cool new feature in SQL Server 2005: the OUTPUT clause. For example, the following query

INSERT INTO Transport(transportTypeCode, departureCodeWay, arrivalCodeWay, departureCodeBack, arrivalCodeBack)
OUTPUT INSERTED.transportId
VALUES ('vl', 'BRU', 'LAX', 'LAX', 'BRU')

returns:

transportId
-----------
1

So you no longer have to add a

SELECT SCOPE_IDENTITY()

to retrieve the id of the last inserted record (in the current scope)!
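
The OUTPUT clause also scales to multi-row inserts; a minimal sketch reusing the Transport table from above (the @ids table variable and the StagingTransport source table are made up for illustration):

-- Capture every generated id at once by routing OUTPUT into a table variable.
DECLARE @ids TABLE (transportId int)

INSERT INTO Transport(transportTypeCode, departureCodeWay, arrivalCodeWay, departureCodeBack, arrivalCodeBack)
OUTPUT INSERTED.transportId INTO @ids
SELECT transportTypeCode, departureCodeWay, arrivalCodeWay, departureCodeBack, arrivalCodeBack
FROM StagingTransport  -- hypothetical source table

SELECT transportId FROM @ids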

Today, I finally got around to fixing a bug that had been in our SSIS packages for a while. We’ve implemented a logging infrastructure based on this article from Jamie Thomson. In short, we use event handlers to log audit info and errors to our database. We also use a custom task to read some configuration settings from file, stuff like database connection strings and so on (the reason we use a custom task is that I didn’t really like the configuration features built into SSIS, but that’s a topic for another time). To make sure that the custom task was executed before any other task, I just added it to the pre-execute event of the package itself. Knowing that SSIS events ‘bubble up’ through the event-handler hierarchy, I put all of the other tasks in a Sequence Container and blocked the event bubbling at that level (by setting System::Propagate to false on the sequence container’s pre-execute event handler).

Based on the information I had, this should have done the trick. Unfortunately, when I ran my package, I noticed that my custom task was still called multiple times.

Today, I finally took some time to find out what was happening. I had always assumed that the System::Propagate variable wasn’t set correctly, or something like that. So I set up a small package to test the event propagation. The package consists of the following control flow:

[Control flow: a Sequence Container (MySequence) containing a script task (DummyScript)]

Two event handlers are defined: one on the package and one on the sequence container. Both handlers contain a script that just dumps the source name (i.e. the task that caused the event) to the output. The execution order I expected was:

1. Package starts
    EventHandlerTest.onPreExecute event-handler

2. Sequence container starts
    MySequence.onPreExecute event-handler
    EventHandlerTest.onPreExecute event-handler

3. DummyScript starts
    MySequence.onPreExecute event-handler (which stops the event from propagating to the package event-handler)

However, when I ran this package (with System::Propagate=false on the sequence container), I was surprised by the output:

1. EventHandlerTest.onPreExecute: EventHandlerTest
2. EventHandlerTest.onPreExecute: MySequence PreExecute
3. MySequence.onPreExecute: SEQC MySequence
4. EventHandlerTest.onPreExecute: MySequence PreExecute
5. MySequence.onPreExecute: Dummy Script

The first line is OK: the package event handler catches the pre-execute event of the package itself. The second line, however, is one I didn’t expect: it shows the package event handler handling the event thrown by the sequence container’s event handler! So the tasks defined in an event-handler also throw events! Although this makes perfect sense from a technological point of view (after all, an event-handler is just another task container), I still find it strange that the event-handler raises events of its own. Unfortunately, I have found no way to disable these events.

So, it turned out that System::Propagate was set correctly all along. The events raised by the event-handlers I had defined at the container level were bubbling up to the package’s pre-execute handler, causing the custom configuration task to be loaded multiple times. The workaround was to move the custom task into the main body of the control flow, but outside of the sequence container (to make sure the configuration is loaded before any of the sequence container’s event-handlers fire). Simple enough, it just took some time to figure this one out!
