I'm an idiot! I spent almost the entire day today screwing around with some multi-threaded code that was locking up on occasion. As everyone knows, debugging multi-threaded code - running inside of ASP.NET no less - is no trivial matter.
So I'm rebuilding the West Wind Web Connection ISAPI extension as an HttpHandler and a few HttpModules. The component is basically an application service that manages COM servers using a custom pool in order to pass messages to an existing application interface. The original extension is a nasty bit of ISAPI code that I always dread having to work on. Now with IIS 7 bringing the integrated pipeline that can run managed code, I figured it's finally time to move this server interface into a handler/module combination and take advantage of the better maintainability, plus a number of features I've been wanting to add to the server interface but just never wanted to attempt in C++ <s>... life's too short to code in C++.
Anyway, it took me about a day to duplicate the core functionality of the ISAPI extension as a handler. The manager handles incoming requests from IIS, packages up the request data in the format that West Wind Web Connection requires, and passes that raw request data to a COM server, which handles the request and returns a response.
The key to this is a pool of active server instances that stay persistent and can be managed - a classic thread pool scenario. All of this was super smooth and easy to do. Compared to the C++ code this was a piece of cake.
But as I started testing the server under load I ran into these lockup problems. And right away it got very hard <s>... So I started instrumenting the code in every possible way. Nothing - no clue. Well, not quite: as I was going through the code I realized there were a few state leaks, with class data bleeding across thread lines - uh, a few oversights there. But this wasn't what was causing these lockups.
The way the code works, it uses AutoResetEvent objects to notify a thread in the pool to resume execution. The server starts and immediately goes into a wait state until the server's thread gets signaled by a call to the server's Resume method, fired off the ASP.NET thread. When ServerManager.Resume() is called, the ServerInstance.RunInternal method that is running in a loop gets signaled:
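/// This method runs in a loop and waits for the wait handle to be set
protected void RunInternal()
{
    // *** Let Run() know that server is started
    // (the statement that followed this comment is missing from this listing)
    while (!this.Cancelled) // loop condition missing from this listing; Cancelled is an assumed name
    {
        processingHandle.WaitOne(); // wait to be signaled with "Resume"
        try
        {
            this.StartTickCount = Environment.TickCount;
            this.ResponseOutputString = new StringBuilder();
            // *** Check to release this thread and the server
            if (this.Cancelled) // condition missing from this listing; assumed
            {
                this.ErrorMessage = "Server thread cancelled. Shutting down thread.";
                continue; // And exit out!
            }
            // (the actual request processing is missing from this listing)
        }
        catch (Exception ex) { this.ErrorMessage = ex.Message; }
        // *** Signal completion
        callCompletionHandle.Set();
        // *** unsignal the wait handle
        processingHandle.Reset(); // the manual Reset() discussed below
    }
    this.Active = false;
}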
Note that there are two AutoResetEvent handles: processingHandle, which signals the main thread loop, and callCompletionHandle, which is used to let the calling thread know when the request is complete.
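To make the two-handle handshake concrete, here is a minimal sketch of how a caller and a worker thread can hand a request back and forth. This is not the actual West Wind code: the SketchServerInstance class, the string request and response fields, the Shutdown method, and the "Handled:" placeholder standing in for the COM call are all assumptions made for illustration. It also assumes only one caller uses a given instance at a time, which is what a pool would normally guarantee.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a pooled server instance (not the real implementation).
public class SketchServerInstance
{
    private readonly AutoResetEvent processingHandle = new AutoResetEvent(false);
    private readonly AutoResetEvent callCompletionHandle = new AutoResetEvent(false);
    private volatile bool cancelled;
    private string request;
    private string response;

    public SketchServerInstance()
    {
        // The instance owns one dedicated thread for its whole lifetime.
        var thread = new Thread(RunInternal) { IsBackground = true };
        thread.Start();
    }

    private void RunInternal()
    {
        while (true)
        {
            // Sleep until Resume() signals. The event resets itself
            // automatically when this wait is released.
            processingHandle.WaitOne();
            if (cancelled) return;

            response = "Handled: " + request; // placeholder for the real server call

            // Wake up the caller. Note: no manual Reset() on processingHandle.
            callCompletionHandle.Set();
        }
    }

    // Called on the request thread; blocks until the worker thread is done.
    public string Resume(string requestData)
    {
        request = requestData;
        processingHandle.Set();
        callCompletionHandle.WaitOne();
        return response;
    }

    public void Shutdown()
    {
        cancelled = true;
        processingHandle.Set(); // wake the worker so it can see the flag and exit
    }
}

public static class HandshakeDemo
{
    public static void Main()
    {
        var server = new SketchServerInstance();
        Console.WriteLine(server.Resume("first"));  // prints "Handled: first"
        Console.WriteLine(server.Resume("second")); // prints "Handled: second"
        server.Shutdown();
    }
}
```

The Set()/WaitOne() pairs also act as the synchronization points for the shared request and response fields, which is why the sketch gets away without extra locking as long as callers take turns.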
Now I used AutoResetEvent - which, as the name suggests, automatically resets itself. But at the end of the loop, right after signaling completion, I also called Reset() manually on processingHandle! That one innocuous line is enough to cause this problem that took nearly a day to fix.
The Reset itself normally has no real effect, since an AutoResetEvent is already reset automatically when WaitOne() releases the waiting thread. However, in some very rare circumstances, in the interval between setting callCompletionHandle and the call to processingHandle.Reset(), another request might have sneaked in and already set processingHandle for a new request. The Reset() then unsets that signal before the WaitOne() call at the top of the loop had a chance to pick it up - and bam, I have a hung server instance.
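The lost signal can be reproduced deterministically on a single thread by forcing the unlucky ordering: Set() first (the early request), then Reset() (the leftover call), then WaitOne() (the top of the loop). The sketch below does exactly that. The LostSignalDemo class and SignalSurvives method are hypothetical names, and the 100 ms timeout is only there so the demo reports the hang instead of actually hanging.

```csharp
using System;
using System.Threading;

public static class LostSignalDemo
{
    // Returns true if a signal sent before the optional Reset() is still
    // there when the worker gets around to waiting for it.
    public static bool SignalSurvives(bool callReset)
    {
        using (var processingHandle = new AutoResetEvent(false))
        {
            // The next request sneaks in and signals the worker early.
            processingHandle.Set();

            // The worker's leftover manual Reset() wipes that signal out.
            if (callReset)
                processingHandle.Reset();

            // The worker loops back to the top and waits. A timeout is used
            // here so the lost signal shows up as false rather than a hang.
            return processingHandle.WaitOne(100);
        }
    }

    public static void Main()
    {
        Console.WriteLine("Without Reset(): " + SignalSurvives(false)); // True
        Console.WriteLine("With Reset():    " + SignalSurvives(true));  // False
    }
}
```

In the real server the window between the two calls is tiny, which is why the hang only showed up occasionally and under load.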
Removing that line of code - which is totally unnecessary - solved the problem. Moral of the story - RTFM. Oh, and don't ever assume a damn thing about multi-threaded code. The timing of that particular placement of processingHandle.Reset() is so tight that I would not have suspected it for the failure. The reason I found it was actually that I reviewed the AutoResetEvent docs for any problems I might have overlooked (yeah - blame Microsoft - not <g>) when the 'AUTO' part jumped out at me.
After mucking with this code all day I'm relieved enough to have a good laugh at my own expense here <s>...
The good news is I have excellent logging throughout the handler now...
BTW, an interesting statistic: This implementation actually performs better than the original ISAPI extension, primarily because I was able to build a more efficient pool manager than the one I used in the C++ code. The pool basically creates a separate set of threads and keeps the servers running on these threads, so there's no thread switching and marshalling for the COM objects. The ISAPI code used various nasty COM thread marshalling APIs, which work but are fairly expensive because they have to synchronize threads.
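The pool idea described above can be sketched roughly as follows: each worker owns one long-lived thread, and every call to that worker runs on that same thread, so an object created there never has to be marshalled elsewhere. This is only an illustration under assumptions, not the actual pool manager: PooledWorker, WorkerPool, Execute and Process are hypothetical names, the "server" is reduced to a string operation, and COM apartment setup is left out entirely.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical worker that keeps one dedicated thread alive for all its calls.
public class PooledWorker
{
    private readonly AutoResetEvent processingHandle = new AutoResetEvent(false);
    private readonly AutoResetEvent callCompletionHandle = new AutoResetEvent(false);
    private string request;
    private string response;

    public PooledWorker(int id)
    {
        var thread = new Thread(Run) { IsBackground = true, Name = "server-" + id };
        thread.Start();
    }

    private void Run()
    {
        // A real host would create its server object here, once, on this thread.
        while (true)
        {
            processingHandle.WaitOne();
            response = request + " -> " + Thread.CurrentThread.Name;
            callCompletionHandle.Set();
        }
    }

    // Blocks the calling (request) thread until the worker thread has finished.
    public string Execute(string requestData)
    {
        request = requestData;
        processingHandle.Set();
        callCompletionHandle.WaitOne();
        return response;
    }
}

// Hypothetical pool manager: hands out idle workers and takes them back.
public class WorkerPool
{
    private readonly BlockingCollection<PooledWorker> idle =
        new BlockingCollection<PooledWorker>();

    public WorkerPool(int size)
    {
        for (int i = 0; i < size; i++)
            idle.Add(new PooledWorker(i));
    }

    public string Process(string requestData)
    {
        PooledWorker worker = idle.Take(); // blocks while every worker is busy
        try { return worker.Execute(requestData); }
        finally { idle.Add(worker); }      // return the worker to the pool
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        var pool = new WorkerPool(2);
        Console.WriteLine(pool.Process("request A"));
        Console.WriteLine(pool.Process("request B"));
    }
}
```

Each response names the pool thread that did the work, which is always one of the worker threads and never the calling thread. Because a worker is only handed to one caller at a time, the per-worker handles never see two overlapping requests.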
Yeah, I could have done the same thing in C++, but the prospect of building a thread pool in C++ has been - uh, not worth it in my mind. So I'm happy to see that this port is actually performing better than the ISAPI code, especially since I was somewhat worried about perf.