
Friday, May 04, 2007

I Didn't Actually Win

A while back I posted about how much one of the bugs on my list at work amused me. I never got around to posting what I found once I actually had a chance to investigate it.

It turned out to be a mixture of several problems. The program was crashing (a simple user-mode crash; nothing fancy). However, because no user-mode debugger was installed on that computer, the crash launched the kernel debugger (don't ask me why there was a kernel debugger but not a user-mode debugger; I don't know). The kernel debugger halts the entire system and stops at a breakpoint in kernel-mode code; debugging is then done by linking the computer to another computer (the one with the debugger client) with a serial cable. So, thanks to the kernel debugger getting invoked, a common crash got elevated to a complete system halt, complete with hosed hardware.

Annoyed, I installed WinDbg on the computer and tried again, hoping to find what was crashing. The cause immediately became clear, to my further annoyance: IsBadReadPtr was throwing an access violation. For those not familiar with this function, it sets up a structured exception handling frame, then reads from the supplied pointer. Normally, if the pointer is bad, the resulting access violation is caught by the exception handler and the function merely returns true. But in this case, something was catching the exception before the handler.

That something was AppVerifier - a program offered by MS to perform very strict code checks on a program. While these checks tend to whine a lot about stuff that isn't really a problem, they're helpful in that they can catch things that would normally result in a crash, often only in rare circumstances (which makes the crash very difficult to debug). In this case, AppVerifier was catching the exception too early, and making a fuss about something that couldn't possibly have resulted in a crash anyway.

Unfortunately, that wasn't the end of the matter. A quick look at the stack revealed that IsBadReadPtr was being called from an internal Windows function. That was probably Windows checking for an invalid parameter passed to an API function, which could mean that my program was the one passing the invalid parameter (bad), so I couldn't ignore it.

It turned out to be a bug in the GUI library our company wrote and uses (the author of that library is my arch-nemesis). The list view class contains two image list objects, used for checkbox and other icons. Because of the poor architecture of this library (which I fight with regularly), the list view class was being destroyed before the list view window itself (actually, all windows are like that in this library). That meant the destruction of all of its member objects, including the two image lists. Unfortunately, the list view window was still USING those image lists, as the class did not unselect them from the window before destroying them. When the dialog was closed, the list view window was destroyed and attempted to free the image lists (this is the default behavior for list view windows; you can set an option to not automatically free them), and of course the image list handles were by then invalid.
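For anyone who wants to see the ordering problem in isolation, here is a minimal sketch of it in portable C++. Nothing in it is real Win32 or our library's actual code: HandleTable, FakeListViewWindow, ListViewWrapper and window_frees_stale_handle are all names I made up for illustration, and the "window" is just a struct that remembers a handle and tries to free it when it is destroyed.

```cpp
#include <set>

// Hypothetical stand-in for the system's table of live image-list handles.
using Handle = int;

struct HandleTable {
    std::set<Handle> live;
    Handle next = 1;

    Handle create() { live.insert(next); return next++; }

    // Returns false when asked to free a handle that is no longer valid.
    bool destroy(Handle h) { return live.erase(h) == 1; }
};

// Hypothetical stand-in for the native list view window, which outlives
// the C++ object wrapping it.
struct FakeListViewWindow {
    HandleTable& table;
    Handle image_list = 0;            // 0 means "no image list selected"
    bool freed_stale_handle = false;

    explicit FakeListViewWindow(HandleTable& t) : table(t) {}

    // Models the default behavior: the window frees its image list
    // when the window itself is destroyed.
    void on_destroy() {
        if (image_list != 0 && !table.destroy(image_list)) {
            freed_stale_handle = true;  // the handle was already gone
        }
        image_list = 0;
    }
};

// Hypothetical stand-in for the library's list view class, which owns
// the image list and is destroyed before the window.
class ListViewWrapper {
public:
    ListViewWrapper(HandleTable& t, FakeListViewWindow& w, bool detach_on_destruct)
        : table_(t), window_(w), detach_(detach_on_destruct),
          image_list_(t.create()) {
        window_.image_list = image_list_;   // select it into the window
    }

    ~ListViewWrapper() {
        if (detach_) {
            window_.image_list = 0;         // unselect before freeing
        }
        table_.destroy(image_list_);
    }

private:
    HandleTable& table_;
    FakeListViewWindow& window_;
    bool detach_;
    Handle image_list_;
};

// Replays the sequence from the bug: wrapper dies first, window second.
// Returns true if the window ended up freeing an invalid handle.
bool window_frees_stale_handle(bool detach_on_destruct) {
    HandleTable table;
    FakeListViewWindow window(table);
    {
        ListViewWrapper wrapper(table, window, detach_on_destruct);
    }                                       // wrapper destroyed here
    window.on_destroy();                    // dialog closes afterwards
    return window.freed_stale_handle;
}
```

Calling window_frees_stale_handle(false) replays the bug: the window is left holding a handle the wrapper already freed. Passing true models unselecting the image lists first, so the window has nothing left to free. In real Win32 code I'd expect the equivalent to be either unselecting the image list from the control before destroying it or using the LVS_SHAREIMAGELISTS style so the control never frees it, but treat that as a pointer in the right direction rather than the exact fix that went into the library.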

Another day, another fixed bug, another few hundred calories burned laughing.
