Search This Blog

Sunday, July 16, 2006

The Animation Control Incident

So, I'm going about my business at work, working on my project. What I'm working on, generally, is a configuration wizard for the product we make. It tries to detect as much information as possible, and then let's the user review/change the setting before configuring/activating the program. One of the neat features it has (which I may blog about more in depth) is a mechanism of asynchronous data prefetch. As some of the detection methods can take several seconds (or even as long as 15 seconds, if a computer it's trying to reach on the network is offline), this hides that time by allowing the user to use the wizard while info is gathered in the background, ideally preventing the user from ever knowing how much time was spent gathering information.

Anyway, I needed a "please wait" dialog for when the user flips to a page whose data isn't loaded yet (and also to use for times when it has to spend time validating the user's input). The company has a nice little AVI to use for such occasions, so I threw it on a dialog with an animation control (the standard Windows common control). However, testing this dialog revealed something odd - once you told it to play with Animate_Play/ACM_PLAY, it would show the first frame for several seconds, until it finally began to play the animation.

A ridiculous amount of fidgeting with the parameters (and the source to the company's Animation control wrapper class) later, I was still unable to find anything I was doing wrong to cause this (although I did manage to observe that it wasn't just the first time - any time the window was completely obscured, this lag occurred). I asked Skywing about it, and about another weird thing I'd seen in the animation control class the company has. He said he'd seen the delay before, too, and had looked into it a bit, but was unable to locate the cause (and he said I should try and figure it out). He said that it seemed to be completely hiding the first iteration of the AVI.

So I looked at the control. It was creating a decoding thread, which looked like this:

.text:5D0AEC6D ; DWORD __stdcall PlayThread(LPVOID)
.text:5D0AEC6D _PlayThread@4 proc near
.text:5D0AEC6D arg_4 = dword ptr 8
.text:5D0AEC6D mov edi, edi
.text:5D0AEC6F push ebp
.text:5D0AEC70 mov ebp, esp
.text:5D0AEC72 push esi
.text:5D0AEC73 mov esi, [ebp+arg_4]
.text:5D0AEC76 push 1
.text:5D0AEC78 push esi
.text:5D0AEC79 call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AEC7E push esi
.text:5D0AEC7F call _HandleTick@4 ; HandleTick(x)
.text:5D0AEC84 test eax, eax
.text:5D0AEC86 jz short loc_5D0AECB7
.text:5D0AEC88 push edi
.text:5D0AEC89 mov edi, 0FA0h
.text:5D0AEC8E mov ecx, [esi+5Ch]
.text:5D0AEC91 test ecx, ecx
.text:5D0AEC93 jz loc_5D0B8C48
.text:5D0AEC99 test eax, eax
.text:5D0AEC9B mov eax, [esi+44h]
.text:5D0AEC9E jl loc_5D0AF8BA
.text:5D0AECA4 push eax ; dwMilliseconds
.text:5D0AECA5 push ecx ; hHandle
.text:5D0AECA6 call ds:__imp__WaitForSingleObject@8 ; __declspec(dllimport) WaitForSingleObject(x,x)
.text:5D0AECAC push esi
.text:5D0AECAD call _HandleTick@4 ; HandleTick(x)
.text:5D0AECB2 test eax, eax
.text:5D0AECB4 jnz short loc_5D0AEC8E
.text:5D0AECB6 pop edi
.text:5D0AECB7 push 2
.text:5D0AECB9 push esi
.text:5D0AECBA call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AECBF xor eax, eax
.text:5D0AECC1 pop esi
.text:5D0AECC2 pop ebp
.text:5D0AECC3 retn 4
.text:5D0AECC3 _PlayThread@4 endp

.text:5D0AF8BA add eax, edi
.text:5D0AF8BC jmp loc_5D0AECA4

HandleTick is what draws each frame (we'll get back to this in a minute). It was trivial to determine that the problem was due to WaitForSingleObject spending several seconds waiting before timing out. But that didn't explain why, or what it was waiting on.

By stepping through the loop I observed that the 5D0AEC9E-5D0AF8BA-5D0AF8BC-5D0AECA4 route was being taken. The value loaded from the struct was 100 ms, and it was getting 4000 ms added to it. I got out my cell phone (with built-in stopwatch) and verified that the drawing delay was exactly 4.1 seconds - not the 2.8 seconds Skywing predicted (the length of the animation). However, as best I could tell, it was doing this every single iteration. So why was there only a single gap in the animation?

As you can see in the disassembly, the 4 s delay path is only taken when HandleTick returns a negative number (which it was, in this case, returning -1). Looking at this function revealed the following of interest (I'm not pasting the whole function, here):

.text:5D0AECE7 push eax ; lpCriticalSection
.text:5D0AECE8 mov [ebp+lpCriticalSection], eax
.text:5D0AECEB call ds:__imp__EnterCriticalSection@4 ; __declspec(dllimport) EnterCriticalSection(x)
.text:5D0AECF1 push dword ptr [esi] ; hWnd
.text:5D0AECF3 call ds:__imp__GetDC@4 ; __declspec(dllimport) GetDC(x)
.text:5D0AECF9 mov ebx, eax
.text:5D0AECFB lea eax, [ebp+var_10]
.text:5D0AECFE push eax ; LPRECT
.text:5D0AECFF push ebx ; HDC
.text:5D0AED00 call ds:__imp__GetClipBox@8 ; __declspec(dllimport) GetClipBox(x,x)
.text:5D0AED06 cmp eax, 1
.text:5D0AED09 jz loc_5D0AF8A5

.text:5D0AED43 push ebx ; hDC
.text:5D0AED44 push dword ptr [esi] ; hWnd
.text:5D0AED46 mov edi, eax
.text:5D0AED48 call ds:__imp__ReleaseDC@8 ; __declspec(dllimport) ReleaseDC(x,x)
.text:5D0AED4E push [ebp+lpCriticalSection] ; lpCriticalSection
.text:5D0AED51 call ds:__imp__LeaveCriticalSection@4 ; __declspec(dllimport) LeaveCriticalSection(x)
.text:5D0AED57 pop ebx

.text:5D0AF8A5 mov eax, [esi+50h]
.text:5D0AF8A8 mov [esi+48h], eax
.text:5D0AF8AB xor eax, eax
.text:5D0AF8AD cmp [esi+4Ch], edi
.text:5D0AF8B0 setnz al
.text:5D0AF8B3 neg eax
.text:5D0AF8B5 jmp loc_5D0AED43

The jump at 5D0AED09 was being taken, resulting in eax getting set to -1 ([esi+4Ch]/-1 != edi/0). That one REALLY threw me. At first I thought that GetClipBox was succeeding, and as a result HandleTick was failing. But in fact GetClipBox has the following return values:
#define ERROR 0
#define NULLREGION 1

And everything becomes clear. If the animation's window is completely covered by another window, GetClipBox returns NULLREGION. If GetClipBox returns NULLREGION, HandleTick returns -1. If HandleTick returns -1, the timeout duration gets 4 s added to it.

Actually, there are two more pieces of information we still require to completely crack this case. The first was provided by Skywing and WinDbg - the event being waited on is actually a termination signal. When it's time for the animation control to stop playing, this event is set (among other things), short-circuiting the loop. This means that the value of [esi+44h] at 5D0AEC9B determines the rate at which frames are drawn. This is a bit of deviation from the most common use of events, where the event being set - not the timeout expiring - is the expected result.

It seems likely that the 4s addition is a backoff case. If the animation control window becomes completely obscured, there's no need to draw the frame, and the rendering thread stalls itself (excessively, if you ask me; I probably wouldn't have given more than a 1 second timeout).

So, now we know why the delay. That just leaves the question of why this timeout executes every time at the very beginning. After thinking about it a moment, the answer came to me: it's a consequence of how (or where) we're playing the animation - in the WM_CREATE (for windows) or WM_INITDIALOG (for dialogs) message handler. This is exactly where initialization that requires the window to already be created is supposed to go. Now, here's the trick: at this point, the window has been created - but it is still not visible (obviously - you want to do initialization BEFORE the window appears on screen). Since the rendering is done in a separate thread, this thread can execute concurrently with the UI thread. If the rendering thread gets to execute before the WM_CREATE/WM_INITDIALOG handler returns, and the window is shown, the rendering thread will go into timeout.


Smokinn said...

Wow, pretty convoluted. Did you find a solution?

wow power leveling said...

welcome to the wow power leveling, cheap service site,WoW Gold buy cheap wow gold,wow gold,world of warcraft buy wow power leveling

Anonymous said...

Use the Animate_Play() on handling a WM_ACTIVATE message and the delay disappears. Presumably because it is reinitializing something.