Thursday, July 20, 2006

& Lessons from the Morning Meeting

Thou shalt not play with a screwdriver like a sword without verifying that the bit is locked, lest ye nail your boss on the other side of the room in the face with a 4", extra-long screwdriver bit.

Sunday, July 16, 2006

The Animation Control Incident

So, I'm going about my business at work, working on my project. What I'm working on, generally, is a configuration wizard for the product we make. It tries to detect as much information as possible, and then lets the user review/change the settings before configuring/activating the program. One of the neat features it has (which I may blog about more in depth) is a mechanism of asynchronous data prefetch. As some of the detection methods can take several seconds (or even as long as 15 seconds, if a computer it's trying to reach on the network is offline), this hides that time by allowing the user to use the wizard while info is gathered in the background, ideally preventing the user from ever knowing how much time was spent gathering information.

Anyway, I needed a "please wait" dialog for when the user flips to a page whose data isn't loaded yet (and also to use for times when it has to spend time validating the user's input). The company has a nice little AVI to use for such occasions, so I threw it on a dialog with an animation control (the standard Windows common control). However, testing this dialog revealed something odd - once you told it to play with Animate_Play/ACM_PLAY, it would show the first frame for several seconds, until it finally began to play the animation.

A ridiculous amount of fidgeting with the parameters (and the source to the company's Animation control wrapper class) later, I was still unable to find anything I was doing wrong to cause this (although I did manage to observe that it wasn't just the first time - any time the window was completely obscured, this lag occurred). I asked Skywing about it, and about another weird thing I'd seen in the animation control class the company has. He said he'd seen the delay before, too, and had looked into it a bit, but was unable to locate the cause (and he said I should try and figure it out). He said that it seemed to be completely hiding the first iteration of the AVI.

So I looked at the control. It was creating a decoding thread, which looked like this:

.text:5D0AEC6D ; DWORD __stdcall PlayThread(LPVOID)
.text:5D0AEC6D _PlayThread@4 proc near
.text:5D0AEC6D
.text:5D0AEC6D arg_4 = dword ptr 8
.text:5D0AEC6D
.text:5D0AEC6D mov edi, edi
.text:5D0AEC6F push ebp
.text:5D0AEC70 mov ebp, esp
.text:5D0AEC72 push esi
.text:5D0AEC73 mov esi, [ebp+arg_4]
.text:5D0AEC76 push 1
.text:5D0AEC78 push esi
.text:5D0AEC79 call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AEC7E push esi
.text:5D0AEC7F call _HandleTick@4 ; HandleTick(x)
.text:5D0AEC84 test eax, eax
.text:5D0AEC86 jz short loc_5D0AECB7
.text:5D0AEC88 push edi
.text:5D0AEC89 mov edi, 0FA0h
.text:5D0AEC8E mov ecx, [esi+5Ch]
.text:5D0AEC91 test ecx, ecx
.text:5D0AEC93 jz loc_5D0B8C48
.text:5D0AEC99 test eax, eax
.text:5D0AEC9B mov eax, [esi+44h]
.text:5D0AEC9E jl loc_5D0AF8BA
.text:5D0AECA4 push eax ; dwMilliseconds
.text:5D0AECA5 push ecx ; hHandle
.text:5D0AECA6 call ds:__imp__WaitForSingleObject@8 ; __declspec(dllimport) WaitForSingleObject(x,x)
.text:5D0AECAC push esi
.text:5D0AECAD call _HandleTick@4 ; HandleTick(x)
.text:5D0AECB2 test eax, eax
.text:5D0AECB4 jnz short loc_5D0AEC8E
.text:5D0AECB6 pop edi
.text:5D0AECB7 push 2
.text:5D0AECB9 push esi
.text:5D0AECBA call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AECBF xor eax, eax
.text:5D0AECC1 pop esi
.text:5D0AECC2 pop ebp
.text:5D0AECC3 retn 4
.text:5D0AECC3 _PlayThread@4 endp

.text:5D0AF8BA add eax, edi
.text:5D0AF8BC jmp loc_5D0AECA4


HandleTick is what draws each frame (we'll get back to this in a minute). It was trivial to determine that the problem was due to WaitForSingleObject spending several seconds waiting before timing out. But that didn't explain why, or what it was waiting on.

By stepping through the loop I observed that the 5D0AEC9E-5D0AF8BA-5D0AF8BC-5D0AECA4 route was being taken. The value loaded from the struct was 100 ms, and it was getting 4000 ms added to it. I got out my cell phone (with built-in stopwatch) and verified that the drawing delay was exactly 4.1 seconds - not the 2.8 seconds Skywing predicted (the length of the animation). However, as best I could tell, it was doing this every single iteration. So why was there only a single gap in the animation?

As you can see in the disassembly, the 4 s delay path is only taken when HandleTick returns a negative number (which it was, in this case, returning -1). Looking at this function revealed the following of interest (I'm not pasting the whole function, here):

.text:5D0AECE7 push eax ; lpCriticalSection
.text:5D0AECE8 mov [ebp+lpCriticalSection], eax
.text:5D0AECEB call ds:__imp__EnterCriticalSection@4 ; __declspec(dllimport) EnterCriticalSection(x)
.text:5D0AECF1 push dword ptr [esi] ; hWnd
.text:5D0AECF3 call ds:__imp__GetDC@4 ; __declspec(dllimport) GetDC(x)
.text:5D0AECF9 mov ebx, eax
.text:5D0AECFB lea eax, [ebp+var_10]
.text:5D0AECFE push eax ; LPRECT
.text:5D0AECFF push ebx ; HDC
.text:5D0AED00 call ds:__imp__GetClipBox@8 ; __declspec(dllimport) GetClipBox(x,x)
.text:5D0AED06 cmp eax, 1
.text:5D0AED09 jz loc_5D0AF8A5

.text:5D0AED43 push ebx ; hDC
.text:5D0AED44 push dword ptr [esi] ; hWnd
.text:5D0AED46 mov edi, eax
.text:5D0AED48 call ds:__imp__ReleaseDC@8 ; __declspec(dllimport) ReleaseDC(x,x)
.text:5D0AED4E push [ebp+lpCriticalSection] ; lpCriticalSection
.text:5D0AED51 call ds:__imp__LeaveCriticalSection@4 ; __declspec(dllimport) LeaveCriticalSection(x)
.text:5D0AED57 pop ebx

.text:5D0AF8A5 mov eax, [esi+50h]
.text:5D0AF8A8 mov [esi+48h], eax
.text:5D0AF8AB xor eax, eax
.text:5D0AF8AD cmp [esi+4Ch], edi
.text:5D0AF8B0 setnz al
.text:5D0AF8B3 neg eax
.text:5D0AF8B5 jmp loc_5D0AED43

The jump at 5D0AED09 was being taken, resulting in eax getting set to -1 ([esi+4Ch]/-1 != edi/0). That one REALLY threw me. At first I thought that GetClipBox was succeeding, and as a result HandleTick was failing. But in fact GetClipBox has the following return values:
#define ERROR 0
#define NULLREGION 1
#define SIMPLEREGION 2
#define COMPLEXREGION 3

And everything becomes clear. If the animation's window is completely covered by another window, GetClipBox returns NULLREGION. If GetClipBox returns NULLREGION, HandleTick returns -1. If HandleTick returns -1, the timeout duration gets 4 s added to it.
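The first two links of that chain can be written out in C as a small sketch. This is only a reading of the two listings above, not the control's source: the names clip_kind, takes_hidden_path and hidden_tick_result are invented for illustration, and the region constants are renamed copies of the values quoted above so they cannot clash with the real Windows headers.

```c
/* Region codes quoted in the post, renamed for this sketch. */
enum clip_kind {
    CLIP_ERROR = 0,
    CLIP_NULLREGION = 1,
    CLIP_SIMPLEREGION = 2,
    CLIP_COMPLEXREGION = 3
};

/* Hypothetical helper: nonzero when the tick would take the branch at
   5D0AED09 (cmp eax, 1 / jz), i.e. when nothing of the window is visible. */
int takes_hidden_path(enum clip_kind clip)
{
    return clip == CLIP_NULLREGION;
}

/* Hypothetical helper mirroring
   "xor eax,eax / cmp [esi+4Ch],edi / setnz al / neg eax":
   -1 when the two values differ, 0 when they are equal. */
int hidden_tick_result(int field_4c, int edi_value)
{
    return -(field_4c != edi_value);
}
```

With the values observed in the debugger ([esi+4Ch] holding -1 and edi holding 0), hidden_tick_result gives -1, which is the negative result that sends PlayThread down the slow path.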

Actually, there are two more pieces of information we still require to completely crack this case. The first was provided by Skywing and WinDbg - the event being waited on is actually a termination signal. When it's time for the animation control to stop playing, this event is set (among other things), short-circuiting the loop. This means that the value of [esi+44h] at 5D0AEC9B determines the rate at which frames are drawn. This is a bit of a deviation from the most common use of events, where the event being set - not the timeout expiring - is the expected result.

It seems likely that the 4s addition is a backoff case. If the animation control window becomes completely obscured, there's no need to draw the frame, and the rendering thread stalls itself (excessively, if you ask me; I probably wouldn't have given more than a 1 second timeout).
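As a rough C rendering of that backoff, here is how the timeout passed to WaitForSingleObject appears to be chosen in the PlayThread listing. It is a sketch under assumptions: next_wait_ms and BACKOFF_MS are invented names, and the path taken when the handle at [esi+5Ch] is zero is not modelled at all.

```c
/* Extra delay added on the negative-result path
   (0FA0h in the listing, which is 4000 ms). */
#define BACKOFF_MS 4000ul

/* Hypothetical helper: the timeout the loop would wait with, given
   HandleTick's return value and the frame interval read from [esi+44h].
   The loop exits instead of waiting when tick_result is 0, so 0 is not
   a meaningful input here. */
unsigned long next_wait_ms(int tick_result, unsigned long frame_interval_ms)
{
    if (tick_result < 0)
        return frame_interval_ms + BACKOFF_MS;  /* loc_5D0AF8BA: add eax, edi */
    return frame_interval_ms;                   /* jl at 5D0AEC9E not taken */
}
```

Calling next_wait_ms(-1, 100) gives 4100, which lines up with the 4.1 seconds measured with the stopwatch.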

So, now we know why the delay. That just leaves the question of why this timeout executes every time at the very beginning. After thinking about it a moment, the answer came to me: it's a consequence of how (or where) we're playing the animation - in the WM_CREATE (for windows) or WM_INITDIALOG (for dialogs) message handler. This is exactly where initialization that requires the window to already be created is supposed to go. Now, here's the trick: at this point, the window has been created - but it is still not visible (obviously - you want to do initialization BEFORE the window appears on screen). Since the rendering is done in a separate thread, this thread can execute concurrently with the UI thread. If the rendering thread gets to execute before the WM_CREATE/WM_INITDIALOG handler returns and the window is shown, GetClipBox reports NULLREGION for the still-invisible window, and the rendering thread goes into its 4-second backoff.

Thursday, July 13, 2006

& Poor Electrical Layout

We have a quote database set up, here at work. We enter humorous quotes various people in the company have said at one point or another, and Bugzilla shows a random quote on every page. This one just hit the database, during the cubicle reorganization today:
"That explains why that's so fucking hot all the time!" - Wayne, referring to his power strip at the end of five power strips daisy-chained together, with a total load of 14 computers, 10 monitors, and various network routing equipment (in other words, the entire developer area) plugged into a single 20-amp wall socket

Monday, July 10, 2006

Another One for the Blacklist

So, I was running IE to use the web-page-based version of our product (our product is actually pretty useful, although some of the horribly buggy internal testing tools we have to use give me nightmares, and because of such problems I can no longer run the standalone version of the program). Anyway, when I finished using IE and closed it, WinDbg popped up and notified me that IE just blew up.

However, the way in which it blew up was rather unusual: it ran out of stack space, before which it threw about 1200 exceptions. In fact, it seemed apparent that it was throwing exceptions from the exception handler, as the stack revealed an equal number of KiUserExceptionDispatcher stack frames, one on top of the next.

Consulting with Skywing revealed that this was the case, as the exception it was throwing repeatedly was STATUS_NONCONTINUABLE_EXCEPTION, indicating that the exception handler was trying to continue execution following a noncontinuable exception, resulting in another exception, which tries to continue execution, and so on, until the thread runs out of stack and goes boom.

However, looking waaaaaay back in the log, we can see that the original exception thrown is some unusual (not built-in) exception 0x0eedfade. As well, it appears that the original exception was thrown by RaiseException, which was called from comcasttoolbar.dll, meaning it was probably intentionally thrown, as a C++-style exception.

So, what exactly went wrong? Well, let's start by looking at the immediate cause: the exception handler that caught the exception.

00f4179d xor eax,eax
00f4179f push ebp
00f417a0 push 0xf418b8
00f417a5 push dword ptr fs:[eax]
00f417a8 mov fs:[eax],esp

That's the one, right there - 0xf418b8. However, looking in this function, it always returns ExceptionContinueSearch before the program goes boom (and in fact it stops getting called well before that); that's pretty much a positive indication that this isn't the exception handler we're looking for.

Okay, well, that wasn't quite what I was expecting (was expecting the immediate cause to be as idiotic and immediately obvious as the ultimate cause). Anyway, moving on up [the exception handler chain]... Breakpointing on the call to the handler in NTDLL reveals that the next handler is 0xf24515, which is... also not what we're looking for. Okay... fast forward a bit (9 exception handlers, to be precise)... and we come to 0xf24c64, which returns ExceptionContinueExecution; that's the one we're looking for. Let's have a look...

00f24c64 mov eax,[esp+0x4]
00f24c68 test dword ptr [eax+0x4],0x6
00f24c6f jne COMCAS_1+0x4cfe (00f24cfe)
00f24c75 cmp byte ptr [COMCAS_1!DllUnregisterServer+0xc76c8 (0108c028)],0x0
00f24c7c ja COMCAS_1+0x4c8d (00f24c8d)
00f24c7e lea eax,[esp+0x4]
00f24c82 push eax
00f24c83 call UnhandledExceptionFilter (00f21340)
00f24c88 cmp eax,0x0
00f24c8b jz COMCAS_1+0x4cfe (00f24cfe)
...do a bunch of stuff that checks for exception code 0x0eedfade, and results in the program being terminated with an error message...
00f24cfe xor eax,eax
00f24d00 ret

The branch at 00f24c8b gets taken, resulting in eax (the return value) getting set to 0 (ExceptionContinueExecution), causing Windows to attempt to continue execution. If I intercept that branch and cause it to not be taken, a message box that says "Runtime error 217 at 00F4182E" appears, and clicking on OK results in IE terminating relatively normally.

What this code is doing is checking what the default exception handler behavior is. UnhandledExceptionFilter returns EXCEPTION_EXECUTE_HANDLER if Windows has displayed an exception dialog. This is the default behavior most of the time; but when there is a just-in-time debugger installed (like on my computer), UnhandledExceptionFilter starts up the debugger and returns EXCEPTION_CONTINUE_SEARCH. In response to this, the exception handler is supposed to return ExceptionContinueSearch, so that the exception will be rethrown and caught by the debugger. But this function isn't returning ExceptionContinueSearch - it's returning ExceptionContinueExecution. Why? Well, let's take a look at what _except_handler returns:

typedef enum _EXCEPTION_DISPOSITION {
ExceptionContinueExecution = 0,
ExceptionContinueSearch = 1,
ExceptionNestedException = 2,
ExceptionCollidedUnwind = 3
} EXCEPTION_DISPOSITION;

And here's what UnhandledExceptionFilter returns:

#define EXCEPTION_EXECUTE_HANDLER 1
#define EXCEPTION_CONTINUE_SEARCH 0
#define EXCEPTION_CONTINUE_EXECUTION -1

It sure looks like the coder of this exception handler got the two mixed up, and returned ExceptionContinueExecution when they thought they were returning EXCEPTION_CONTINUE_SEARCH. However, that'd be a bit weird, as you normally do not code _except_handler functions at all - the compiler builds them for you, based on the Windows __try/__except/__finally syntax or C++ try/catch syntax.
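To make the suspected mix-up concrete, here is a small portable sketch. The two enums are renamed copies of the values listed above (renamed so they cannot collide with the real Windows headers), and both functions are invented for illustration; neither is taken from the toolbar or from any compiler runtime.

```c
/* Handler return values, copied from the listing above. */
enum disposition {
    DISP_CONTINUE_EXECUTION = 0,
    DISP_CONTINUE_SEARCH = 1
};

/* Filter return values, copied from the listing above. */
enum filter_result {
    FILTER_EXECUTE_HANDLER = 1,
    FILTER_CONTINUE_SEARCH = 0,
    FILTER_CONTINUE_EXECUTION = -1
};

/* The suspected confusion: a filter code handed back as if it were a
   handler disposition. */
int confused_disposition(int filter)
{
    return filter;
}

/* An explicit translation between the two numbering schemes.
   FILTER_EXECUTE_HANDLER has no direct disposition equivalent (a real
   handler would unwind and run its __except body), so this sketch
   simply keeps searching in that case. */
int translated_disposition(int filter)
{
    if (filter == FILTER_CONTINUE_EXECUTION)
        return DISP_CONTINUE_EXECUTION;
    return DISP_CONTINUE_SEARCH;
}
```

The interesting case is FILTER_CONTINUE_SEARCH: passed through unchanged it is 0, which the dispatcher reads as "continue execution", while the translated version yields 1 and lets the search carry on to the debugger.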

Doing some looking at the DLL in a hex editor suggests that it was written in Delphi (either that or Borland C++ Builder). Looking at other exception handlers, I'm really unsure whether or not this exception handler is kosher. There are three other exception handlers that call UnhandledExceptionFilter. They all start out with the following basic logic:

.text:00404A3C mov eax, [esp+4]
.text:00404A40 test dword ptr [eax+4], 6
.text:00404A47 jnz loc_404AF8
.text:00404A4D cmp dword ptr [eax], 0EEDFADEh
.text:00404A53 cld
.text:00404A54 call @System@_16542 ; System::_16542
.text:00404A59 jz short loc_404A82
...one of the handlers has some extra logic here...
.text:00404A5B cmp byte_56C02C, 0
.text:00404A62 jbe short loc_404A82
.text:00404A64 cmp byte_56C028, 0
.text:00404A6B ja short loc_404A82
.text:00404A6D lea eax, [esp+4]
.text:00404A71 push eax
.text:00404A72 call UnhandledExceptionFilter
.text:00404A77 cmp eax, 0
.text:00404A7A jz short loc_404AF8
.text:00404A7C mov eax, [esp+4]
.text:00404A80 jmp short loc_404A94
...handle Delphi exceptions here...
.text:00404AF8 mov eax, 1
.text:00404AFD retn

Notice what it's doing. It first checks the exception flags. If either of the two flags tested (mask 6) is set, it returns ExceptionContinueSearch; otherwise, it prepares to execute the handler (that's what the CLD and call to System::_16542 are), tries to crack the exception (with special care taken to look for the 0xEEDFADE code), and calls UnhandledExceptionFilter only if it doesn't think it's one of the Delphi exceptions. Not only that, but they all return ExceptionContinueSearch as the default return value. Now compare all of that behavior with our misbehaving handler encountered earlier.

So, in the end we end up not knowing much. Our misbehaving handler in some ways resembles the other handlers, but in other ways not. That's the evidence, but it's not enough to conclusively determine whether this was generated by the compiler or by a person. In either case, I'm fairly confident that it's wrong.

*ahem* Moving on... what do you suppose that exception was, and who threw it, anyway? Here's who (421829, to be precise):

.text:004217D1 mov ecx, eax
.text:004217D3 xor edx, edx
.text:004217D5 mov eax, ebx
.text:004217D7 call unknown_libname_85
.text:004217DC cmp dword ptr [ebx+4], 0
.text:004217E0 jge loc_421893
.text:004217E6 lea edx, [ebp+var_18]
.text:004217E9 mov eax, esi
.text:004217EB call @Sysutils@ExpandFileName$qqrx17System@AnsiString ; Sysutils::ExpandFileName(System::AnsiString)
.text:004217F0 mov eax, [ebp+var_18]
.text:004217F3 mov [ebp+var_14], eax
.text:004217F6 mov [ebp+var_10], 0Bh
.text:004217FA call GetLastError_0
.text:004217FF lea edx, [ebp+var_1C]
.text:00421802 call sub_40E28C
.text:00421807 mov eax, [ebp+var_1C]
.text:0042180A mov [ebp+var_C], eax
.text:0042180D mov [ebp+var_8], 0Bh
.text:00421811 lea eax, [ebp+var_14]
.text:00421814 push eax
.text:00421815 push 1
.text:00421817 mov ecx, off_56ED00
.text:0042181D mov dl, 1
.text:0042181F mov eax, ds:off_41B358
.text:00421824 call @Sysutils@Exception@$bctr$qqrp20System@TResStringRecpx14System@TVarRecxi ; Sysutils::Exception::Exception(System::TResStringRec *,System::TVarRec *,int)
.text:00421829 call sub_404B00
.text:0042182E jmp short loc_421893

The branch at 4217E0 (following the compare at 4217DC) seems to be not taken. Looks like unknown_libname_85 (among other things) stores its ecx param in its [eax+4], the latter being what gets checked at the compare in this block. This value comes from just above the previous block:

.text:004217B2 push 0
.text:004217B4 push 80h
.text:004217B9 push 2
.text:004217BB push 0
.text:004217BD push 0
.text:004217BF push 0C0000000h
.text:004217C4 mov eax, esi
.text:004217C6 call @System@@LStrToPChar$qqrv ; System::__linkproc__ LStrToPChar(void)
.text:004217CB push eax
.text:004217CC call CreateFileA_0

So... we have an exception getting thrown when CreateFile fails, where CreateFile tries to open the file for both reading and writing. Are you pondering what I'm pondering? Let's take a peek at what file it's failing to open, and find out if I'm right. Step into ExpandFileName and... "C:\Program Files\COMCASTTOOLBAR\Cache\COMBOSEARCH.acs". The shock is enough to kill you - it's a directory that limited users can only read from, and I'm a limited user.

So, in the end it turns out that we are indeed dealing with an idiot (or maybe more than one). Yeah, I know I never got around to the series on writing limited user-friendly programs (it's still on the todo list), but here's the gist of it: don't write outside the box, and Program Files sure isn't in your box (although I suppose that's not quite as bad as one guy at work I nailed for writing some temporary files in the Windows directory).
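For what "inside the box" might look like for a cache file such as this one, here is a minimal sketch that builds the path under a per-user directory instead of under Program Files. Everything in it is an assumption for illustration: build_cache_path is an invented helper, and the caller is assumed to have already looked up a writable per-user location (for example from the APPDATA environment variable).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<per_user_dir>\<app>\Cache\<file>" into out.
   Returns 0 on success, or -1 if an argument is missing or the result
   would not fit in the buffer. */
int build_cache_path(char *out, size_t out_size,
                     const char *per_user_dir,
                     const char *app, const char *file)
{
    int written;

    if (out == NULL || out_size == 0 || per_user_dir == NULL ||
        app == NULL || file == NULL || strlen(per_user_dir) == 0)
        return -1;

    written = snprintf(out, out_size, "%s\\%s\\Cache\\%s",
                       per_user_dir, app, file);
    if (written < 0 || (size_t)written >= out_size)
        return -1;   /* encoding error or truncated path */
    return 0;
}
```

A limited user can create and write files under their own profile, so a CreateFile call for read/write access on a path built this way would not run into the read-only Program Files problem described above.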

That leaves just one more question to answer (at least, one more to answer that we actually can answer): have they fixed this, yet? This Comcast toolbar was installed on this laptop when I got it from work. I have no idea how recent it may be. So let's see about downloading the latest version of the Comcast toolbar, and checking if it plays nicely, this time.

My version of the toolbar is 4.0.2.201. The most recent version is... 4.0.2.201. But just to make sure, let's run IE one last time, after installing the newest version. It looks like the toolbar failed to even start up, staying in some kind of "loading..." mode (though at least it didn't crash). I bet that's due to exactly the same problem, where it's trying to write to the Program Files folder the first time it's run after installation. Running it once as admin confirms this. And running it one more time as a limited user confirms that the obnoxious crash indeed remains in the latest version.

So, I give it a D+ for limited user compatibility.

Sunday, July 09, 2006

Limited User Compatibility Grade

For a while now (a minute or two here or there) I've been thinking about how to make a limited user compatibility grade scale, based on how well programs run under a limited (non-admin) account. Here's what I'm thinking at the moment:
A - program works perfectly (after installation)
B - program has tolerable bugs (bugs that can be worked around or ignored) in minor features
C - program has blocking bugs (bugs that pretty much prevent use of a feature) in minor features, or tolerable bugs in major features
D - program runs to some extent, but has blocking bugs in some major features
F - program does not run at all

Comments/ideas?

Saturday, July 08, 2006

& Backlog

Man, looking at some of the stuff I started on this blog, I've got a buttload of stuff I need to post. At least four series I started but never finished, and it's been aeons since I posted anything on LibQ (heck, it's been a few months since I've even worked on LibQ). I also have a couple of recent stories from work and some of the stuff I've done on my days off. As well, I might still post a few more case analyses of languages that weren't initially in the list, such as Chinese (Cantonese, to be specific) and a couple of extra-special mystery languages.

Also, I just thought I'd list some of the other blogs I read but aren't on the links to the left (not that I read all of those; some are just because they belong to friends), to go with the blogs I mentioned in the last post.

- Neo-neocon - an intellectual blog that often covers politics, and was known for having interesting (and lengthy) debates between those who agree and those who don't agree with the neocon perspective. It's been pretty boring since they banned all the "trolls" a week or so back, but a couple opponents have shown up the last day or two, and at least one thread has broken out into debate.
- Musings of a Palestinian Princess - life in the hot zone (the West Bank). Can be interesting when Israelis and Palestinians (and their supporters) get into debates.
- Baghdad Burning - life in the other hot zone. A bit on the cynical side. Unfortunately, no commenting allowed, although if you read a bit, you can probably guess why.
- Narges - life in the "axis of evil" (Iran; noticing a theme, here?). This one's not like the others in that she is a programmer who is going for her master's degree, and doing research on machine translation.
- Informed Comment - the blow by blow(-up) in Iraq. Middle-eastern news from a rather depressing, yet informative perspective.

As this is kind of a grab bag post, you shouldn't be surprised to see other things pop up on it, as I remember other things.

And Then There Were... Shoot, I've Lost Count

Skywing just created a blog, after much prodding by me and my boss. Unfortunately, he didn't join me on this blog, like I suggested (him and one other person I was looking to acquire). Also, my boss has had a blog for quite a while, although for some reason the contents don't interest me as much as some other blogs (or maybe I just like to write more than I like to read). Also, Ryan Govostes, one of the people I know on MSN, and who reads my blog, has had a blog for a while. I haven't looked at that one too much, but this most recent post (from a month ago) looks interesting. Too bad the blog sucks. And by that I mean he was too lazy to make any kind of a listing of recent posts or archive of old posts, and leaves you with "previous" and "next" on each post to navigate the posts.

While we're kinda-sorta on the topic of my job, here's a fact for you: 14% of all of the MS driver development MVPs in the world work at the company, and 50% of the people in the world who have gotten the award more than once.

Windows Structured Exception Handling

This post is a bit of a backgrounder for another post I've already written most of (but haven't posted yet). That post deals almost entirely with Windows Structured Exception Handling (SEH). As at least half of the people I know who read my blog are from inferior operating systems like Unix (half kidding) I thought I should start out by describing SEH.

C++ provides fairly robust error handling mechanisms. For example, you could do the following:
try
{
int *pInt = new int[some_number];
}
catch (const bad_alloc &)
{
cout << "Out of memory" << endl;
}

This would allow you to catch the out of memory exception and handle the condition. Of course, in C++ you could just as easily do:
int *pInt = (int *)malloc(some_number * sizeof(int));
if (pInt == NULL) printf("Out of memory\r\n");
In this case, you have options - you can either use the old C functions that indicate errors with special return values, or use the new C++ exception handling. However, there exists another class of exceptions - hardware exceptions. These include things like divide by zero, access violations, attempts to execute invalid instructions, etc. But neither C nor C++ provides facilities for handling these. POSIX provides a primitive method for handling these exceptions - when such an exception occurs, the OS transfers the thread in which the exception occurred to an exception handler function. This handler function can then display an error message and terminate the thread/program or attempt to recover using the tricky setjmp/longjmp.
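As a rough illustration of the POSIX approach, here is a minimal sketch that installs a signal handler and recovers with the signal-aware variants sigsetjmp/siglongjmp. The names run_guarded, simulate_fault and no_fault are invented for this example, and the "fault" is simulated with raise(SIGFPE) rather than a real divide by zero, so that the sketch does not depend on undefined behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf g_recover_point;
static volatile sig_atomic_t g_caught_signal;

/* Signal handler: note which signal arrived, then jump back to the
   recovery point instead of returning to the faulting code. */
static void on_fault(int signo)
{
    g_caught_signal = signo;
    siglongjmp(g_recover_point, 1);
}

/* Runs fn() with a handler installed for signo.
   Returns 0 if fn() finished normally, the signal number if the handler
   fired, or -1 if the handler could not be installed. */
int run_guarded(void (*fn)(void), int signo)
{
    struct sigaction sa;
    struct sigaction old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0)
        return -1;

    g_caught_signal = 0;
    if (sigsetjmp(g_recover_point, 1) == 0)
        fn();   /* normal path; a fault lands back at sigsetjmp instead */

    sigaction(signo, &old, NULL);   /* put the previous handler back */
    return (int)g_caught_signal;
}

/* Stand-in for a hardware fault. */
void simulate_fault(void) { raise(SIGFPE); }
void no_fault(void) { }
```

Calling run_guarded(simulate_fault, SIGFPE) returns SIGFPE, and run_guarded(no_fault, SIGFPE) returns 0. Note how much is left to the programmer: there is one global recovery point, nothing on the stack gets cleaned up on the way back, and nesting has to be managed by hand.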

Windows provides a dramatically superior method of handling hardware exceptions, and that is SEH. SEH is very similar to the C++ exception model: when a faulting instruction is executed, an exception is thrown, and the program searches for a suitable exception handler. If such a handler can't be found in the function that threw the exception, the calling function is checked; if not the calling function, the caller's calling function; and so on. Eventually either a suitable exception handler is found or the default handler is used; in either case, the stack is "unwound", destructing classes for each function, and executing cleanup code in each function in which a handler wasn't found, and the exception handler is ultimately executed, with the machine in a state as if the function containing the handler had jumped directly into the handler.
__try {
int *pInt = NULL;
*pInt = 0; // Boom
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
printf("Access violation");
}

The syntax of SEH exceptions (shown above) is almost identical to that of C++ exceptions, and, as a matter of fact, Windows C++ compilers use Windows SEH to implement C++ exceptions. One of the definitive guides on how SEH works is by (*shock and awe*) Matt Pietrek; but I'll give a brief summary of the process.

Each thread maintains a stack of exception handler records called EXCEPTION_REGISTRATIONs. This is a singly linked list of EXCEPTION_REGISTRATION structures constructed on the stack, where the head pointer is contained in an important thread-local storage structure called the Thread Environment Block (TEB). One of these exception handler records is created by the __try statement, and destroyed after the __except block. Each of the links contains a pointer to an _except_handler function - the exception handler for that block.

When an exception occurs, it is caught by the Windows kernel exception trap. Windows checks for certain types of exceptions, such as accessing a page of memory that is allocated, but currently paged to disk, and may handle the exception right then (in that example, Windows would respond by reading the page off disk, then retrying the instruction that caused the exception).

If it's not one of the special case exceptions, Windows transitions to user mode, and starts walking the list of exception handlers for the thread, calling each one with detailed information about the exception, looking for one that will handle the particular exception. The return value of the _except_handler tells Windows what to do next. ExceptionContinueSearch tells Windows that the exception handler does not want to handle the exception, and that Windows should keep looking for a handler. ExceptionContinueExecution tells Windows to resume execution of the thread; this may or may not be at the same place that caused the exception, as the exception handler can alter the registers and memory of the thread.
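The walk over the handler list can be modelled in a few lines of portable C. This is a simplified sketch, not the real Windows structures or dispatcher: the struct, the MODEL_ constants and the function names are invented, handlers here receive only an exception code rather than the full exception information, and the end of the list is marked with NULL purely for convenience.

```c
#include <stddef.h>

enum model_disposition {
    MODEL_CONTINUE_EXECUTION = 0,
    MODEL_CONTINUE_SEARCH = 1
};

typedef int (*model_handler)(unsigned long code);

/* One link in the per-thread chain of handler records. */
struct model_registration {
    struct model_registration *prev;  /* next-older record, NULL at the end */
    model_handler handler;
};

/* Walks from the newest record outward, asking each handler in turn.
   Returns how many handlers were consulted before one accepted the
   exception, or -1 if every handler declined. */
int model_dispatch(const struct model_registration *head, unsigned long code)
{
    const struct model_registration *rec;
    int consulted = 0;

    for (rec = head; rec != NULL; rec = rec->prev) {
        consulted++;
        if (rec->handler(code) == MODEL_CONTINUE_EXECUTION)
            return consulted;
    }
    return -1;
}

/* Example handlers for the sketch. */
int decline_everything(unsigned long code)
{
    (void)code;
    return MODEL_CONTINUE_SEARCH;
}

int accept_access_violation(unsigned long code)
{
    return code == 0xC0000005ul ? MODEL_CONTINUE_EXECUTION
                                : MODEL_CONTINUE_SEARCH;
}
```

With an inner record that declines everything chained to an outer record that accepts access violations, dispatching 0xC0000005 consults two handlers, and any other code falls off the end of the list.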

A third option is that the exception handler calls the Windows RtlUnwind function. This function unwinds the stack to get to the desired stack frame - the stack frame of the function with the __except block that is going to get executed. Once this process is completed, the exception handler can JMP to the code in the desired __except block. Unwinding works by walking the exception handler stack from the top, calling each handler with the STATUS_UNWIND "exception code". This instructs the exception handler to destruct any stack variables that require destruction, or cleanup code that should be executed (think how C++ exceptions can be rethrown after doing some cleanup work). This allows Windows to ensure that the stack is cleaned up, without ever knowing anything about what's on the stack or how to clean it up. This also implies that "exception handler" functions aren't just to catch exceptions - there must also be such a function for every function that has destructible variables and in which an exception can be thrown (there is a VC++ project setting which tells the compiler to assume every function can throw SEH exceptions, or to assume only C++ exceptions can be thrown by some functions).
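Unwinding can be sketched the same way. The model below is an illustration only, with invented names: each frame carries a cleanup callback standing in for "the handler called with the unwind code", and model_unwind pops frames from the newest one down to (but not including) the frame that will run its __except block.

```c
#include <stddef.h>

struct model_frame {
    struct model_frame *prev;   /* next-older frame, NULL at the end */
    void (*cleanup)(void);      /* stand-in for the handler's unwind work */
};

/* Pops every frame newer than target, running its cleanup first.
   Returns how many frames were unwound; *head ends up at target
   (or NULL if target was not on the list). */
int model_unwind(struct model_frame **head, const struct model_frame *target)
{
    int unwound = 0;

    while (*head != NULL && *head != target) {
        struct model_frame *frame = *head;
        if (frame->cleanup != NULL)
            frame->cleanup();
        *head = frame->prev;
        unwound++;
    }
    return unwound;
}

/* Example cleanup that just counts how often it ran. */
static int g_cleanups_run;
void count_cleanup(void) { g_cleanups_run++; }
int cleanups_run(void) { return g_cleanups_run; }
```

The point the model makes is the same one as in the text: the code doing the unwinding never needs to know what each frame holds, only that each frame supplied a function that knows how to clean it up.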

If no exception handler in the program handles the exception, the outermost exception handler will. This handler is located in the thread/process start stub in Kernel32.dll, where Windows wraps the call to your thread/process start function in a __try block. The __except block simply calls UnhandledExceptionFilter, which, by default, displays the "this program has performed an illegal operation" dialog you've no doubt seen before, and terminates the program.

Lastly, SEH exceptions are not strictly compatible with C++ exceptions. You cannot catch SEH exceptions with a C++ catch block (and in fact you can use SEH exception handling in a vanilla C app), although you can catch C++ exceptions with an __except block using certain compiler settings. I don't know about other compilers, but VC++ lets you register a translator function (_set_se_translator) that the compiler-generated exception handlers call to convert SEH exceptions into true C++ exceptions, which can then be caught with C++ catch blocks. I haven't actually used this before, but that sounds pretty sweet.

Tuesday, July 04, 2006

Letters, Words, and Big Numbers, Oh My!

So, I was bored and... decided to play around with license keys some. Everybody hates license keys, right? Especially in cases like Neverwinter Nights, where they have 35 alphanumeric characters (!). We're currently in the process of making some changes to how the system works at work, and one of the results is that we'll now be using license keys for something. So, what can we do to make them more friendly?

Let's apply some basic psychology to the problem: the brain only has so many registers (bits of information that it can hold readily available without requiring effort to memorize the information; I think the exact number is like 5), with each one able to hold a symbol (remember that the brain's processing abilities are highly symbolic). For a random alphanumeric string, each character must be stored in a separate register (if you think that's dehumanizing, you should see the occasional debates me, DB, and BZ - all of us programmers, but all of us also interested in biology - have on the topic of reproduction and evolutionary psychology; and yes, I am aware that "me" isn't technically a subjective pronoun). For a string of, say, numbers where a pattern is evident, the brain can apply compression to the string, and store a larger number of numbers in its registers than it would be able to if there was no pattern.

So, my idea is simply this: use symbols larger than letters and numbers in the license key. This one dictionary I picked for this purpose has 16,100 English words of 7 characters or less (a somewhat arbitrary limitation I established), and that's not even including conjugations of most verbs.
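One way to picture the idea is to treat the key as a number and write it in base 16,100, with each digit being an index into the word list. The sketch below assumes that scheme; it is not a description of any actual key format. The function names are invented, and a 64-bit integer stands in for the key (a real key of NWN size would need big-number arithmetic).

```c
#include <stddef.h>
#include <stdint.h>

#define DICT_WORDS 16100u   /* dictionary size mentioned above */

/* Splits key into base-DICT_WORDS digits, least significant first.
   Each digit is an index into the word list. Returns the number of
   indices written, or 0 if the buffer is too small. */
size_t key_to_word_indices(uint64_t key, unsigned indices[], size_t capacity)
{
    size_t count = 0;

    do {
        if (count == capacity)
            return 0;
        indices[count++] = (unsigned)(key % DICT_WORDS);
        key /= DICT_WORDS;
    } while (key != 0);
    return count;
}

/* Inverse of key_to_word_indices. */
uint64_t word_indices_to_key(const unsigned indices[], size_t count)
{
    uint64_t key = 0;

    while (count > 0)
        key = key * DICT_WORDS + indices[--count];
    return key;
}
```

For example, the key 48307 (3 * 16100 + 7) comes out as word 7 followed by word 3, and feeding those two indices back in recovers 48307.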

Let's see how well this works. If we take the NWN key, the maximum number of theoretically possible keys is about 3*10^54 (36^35; quiz: how many bits is that?). I can remember about 5 characters at a time, meaning I will have to look at the key 7 times to type it in, if I don't type as I'm reading it (which is reasonable for people who aren't secretaries or programmers). It would take 13 words to represent this key (log(36^35) / log (16,100)). I can remember about 4 of these words at a time, meaning I can type them in 3-4 groups, almost half as many as for the string of random characters.
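The word count above can be checked without a calculator. This is a small sketch with an invented helper name; it uses a double for the keyspace, which is accurate enough here because the answers are nowhere near a rounding boundary, and it avoids the math library by dividing repeatedly instead of taking logarithms.

```c
/* Smallest number of symbols, drawn from an alphabet of size `base`,
   needed to cover every one of alphabet^length possible keys. */
unsigned symbols_needed(unsigned alphabet, unsigned length, double base)
{
    double keyspace = 1.0;
    unsigned count = 0;
    unsigned i;

    for (i = 0; i < length; i++)
        keyspace *= alphabet;        /* alphabet^length */

    while (keyspace > 1.0) {         /* one division per symbol */
        keyspace /= base;
        count++;
    }
    return count;
}
```

symbols_needed(36, 35, 16100.0) gives 13, matching the figure above, and symbols_needed(36, 35, 2.0) answers the quiz: 181 bits.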

However, there's one more thing to consider, which is actually a facet of something already mentioned: most people aren't secretaries or programmers. They probably can't type even half as quickly as I can. In addition to the previously stated implications, this also means that they're going to have to hold the words/characters in memory for at least twice as long as I do, and 13 words would be a total of about 75 characters (average word length of 5.75 letters), more than twice the 35 characters of the original key. And unlike registers in a CPU, the brain's registers lose the information they were holding fairly quickly. Unfortunately, I don't really have any way of quantifying this effect, the way I was able to benchmark how much I could remember and type in, so this really remains an unknown.

Sunday, July 02, 2006

& Linguistics - Case - Japanese 2

Anyway, getting back to the topic of the series, Japanese is unusual in (at least) one more way: it does not use case at all. Well, not technically, anyway. Instead, Japanese has about ten half-cases indicated by postpositional particles (postpositions being the opposite of prepositions).

The emphatic nominative case (my name for it) is indicated by the postposition "ga", as in "anata wa watashi ga doko nimo inai to omotteru" ("You think that I don't exist"; you can get additional meaning out of this sentence after reading the next paragraph). This is like the nominative case, but it carries an additional property - it serves to emphasize the subject as opposed to the predicate (this is not clearly illustrated in the example).

The referential case (also my name; sometimes called the absolute case) is indicated by "wa". The referential case can be used in several ways. It can be used for the subject of the sentence, where the predicate is more important than the subject; kind of a "non-emphatic nominative" case. But it can also be used to set the point of reference of the sentence, which might not be the same as the subject; this would be translated as "with regard to" or "as for". This distinction is beautifully illustrated by "anata wa kodoku ga koko kara kieru to omotteru" ("I think that the loneliness [that] you [feel] will disappear from now on"; more literally, "for you, [the] loneliness will disappear from now on, [I] think").

"no" indicates the genitive case. Fully understanding the Japanese genitive case requires a small shift in paradigm. The genitive case conveys the general thought of "of", in its various meanings. This can be in terms of possession (e.g. "hoshi no michibiki" - "star's guidance"), origin (e.g. "hontou no watashi" - "real me" or "me of reality"), distinguishing quality (e.g. "mizuiro no lute" - "light blue lute" or "lute of light blue"), or reason or benefactor (e.g. "oya no mo" - "mourning for parents" or "mourning of parents"). The first three are nothing we haven't seen before in other common languages, but the last two (known as the causative and benefactive cases, respectively, in other languages) are not like the genitive of previous languages.

The accusative case is indicated by "wo" (or "o"). This is trivial, and exactly like what we've seen previously (e.g. "tamashii no hanashi wo kikasete yo" - "Tell [me] the story of your soul"; "yo" indicates a request). However, when translating between English and Japanese, this can be a bit tricky, because we are used to some English verbs being intransitive and requiring a preposition, where the corresponding verbs are transitive in Japanese, and simply take a direct object marked with "wo" rather than some other postposition.

"ni" indicates the locative case. This case represents the idea of being at some general position in space or time - either at, in, on, or around. This has a couple of meanings. With verbs of existence, the meaning of position is used (e.g. "sora ni aru tobira" - "door [that is] in the sky").

This idea of position is applied in a similar but slightly different manner to yield the second meaning - that of recipient. In English we have the concept of having something on you - that is, possessing something; Japanese does, as well. In Japanese, "ni" is also used to indicate the target of a verb of transfer of possession. Think about it like this: in Japanese, you don't give something to someone - you put it on someone. In other words, the locative case is used where other languages would use the dative case (e.g. "kodomo ni yaru" - "give to [the] child").

(part 2 of 3)

& Linguistics - Case - Japanese 1 (UPDATED)

So, last post I discussed Finnish, a language unusual in that it has an overabundance of grammatical cases (far too many, as far as I'm concerned). This time I'm going to discuss an all-around unusual language (at least to us English/Romance-language speakers): Japanese.

Japanese does a great many things notably differently than the languages common on this hemisphere. For example, unlike English, where every clause (save for imperative sentences) must have a subject, in Japanese it is perfectly acceptable (if not likely) for a sentence to lack a subject. Nor is it like Romance languages, where the subject is implied in the verb, as the verb is inflected (that is, written/spoken differently in different circumstances; declension is one type of inflection) to match the pronoun the sentence refers to; no, Japanese sentences can totally and completely lack a subject, leaving listeners to deduce the subject from other sentences.

Japanese sentences are usually written/spoken in reverse-Polish (henceforth known as "Japanese") notation. If you picture a sentence as a syntax tree, child nodes are usually written preceding their parents in the tree, with the sentence ending with the verb and any sentence suffix, such as "ka" (indicating a question). For example, "mienai basho made hashiru nara" - "If you run to the unseen place" - literally "[you] not-seen place to run if".
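For the programmers in the audience, "children before parents" is just a post-order traversal of the syntax tree. Here's a minimal sketch using the example sentence; the `Node` class and the exact shape of the tree are my own simplification for illustration (one child per node, which real sentences obviously aren't limited to), not a proper linguistic parse.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)


def head_final(node):
    """Post-order traversal: emit every child subtree, then the parent."""
    out = []
    for child in node.children:
        out.extend(head_final(child))
    out.append(node.word)
    return out


# Hypothetical tree for "mienai basho made hashiru nara":
# if( run( to( place( not-seen ) ) ) )
tree = Node("nara", [                  # "if"
    Node("hashiru", [                  # "run"
        Node("made", [                 # "to"
            Node("basho", [            # "place"
                Node("mienai"),        # "not-seen"
            ]),
        ]),
    ]),
])

sentence = " ".join(head_final(tree))
print(sentence)
```

Reading the same tree parent-first (pre-order) gives "if run to place not-seen", which is a lot closer to the English word order.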

Japanese completely lacks any dedicated personal pronouns. There are no words like "he", "she", etc. Instead, it commandeers a number of common nouns for this purpose, some examples being "watashi" ("selfishness") and "boku" ("servant" - used specifically by males) for the first person, or "kimi" ("prince") and "anata" ("over there" - something of a literal translation of "Hey, you!" although not as rude) for the second person.

Japanese also lacks grammatical gender and number. That is, it does not inflect its nouns, adjectives, or verbs based on the gender or number of the noun. For comparison, Romance languages inflect nouns, adjectives, and verbs based on both gender and number; English, however, only inflects its nouns and verbs by number.

Japanese completely lacks articles, like Latin (but unlike English, Spanish, or Portuguese). But perhaps the oddest quality is that adjectives are half-verbs. It's possible to conjugate adjectives like verbs - complete with tense - and they are able to completely take the place of verbs in a sentence.