Friday, October 21, 2005

Asynchronous I/O - APCs - Windows Implementation

So, I just wrote the Windows version of the APC system (the NT 4+ one). It was, as expected, trivial. The code is very straightforward, although I should mention one thing: I use WaitForSingleObjectEx rather than SleepEx. See, SleepEx(0, TRUE) has an undesirable behavior: if no APCs are queued, then even with a timeout of 0 ms, SleepEx WILL give up the processor; specifically, it surrenders the rest of the thread's time slice. It could conceivably be hundreds of milliseconds before the thread runs again, which is NOT what we want DispatchAsyncCalls to be doing.

To work around this, I used WaitForSingleObjectEx, waiting on an object that will never become signaled (at least not before the function returns): the current thread, whose handle is only signaled when the thread terminates. Unlike SleepEx, WaitForSingleObjectEx with a timeout of 0 ms will indeed return immediately.

inline bool QueueAsyncCall(PAPCFUNC lpCallProc, uword param)
{
    assert(lpCallProc);

    return (QueueUserAPC(lpCallProc, m_hThread, (ULONG_PTR)param) != 0);
}

// Returns true if APCs were dispatched before the timeout expired, otherwise false
static inline bool DispatchAsyncCalls(unsigned int nTimeoutMS)
{ return (WaitForSingleObjectEx(::GetCurrentThread(), nTimeoutMS, TRUE) == WAIT_IO_COMPLETION); }

// Returns true if APCs were dispatched, false if an error occurred
static inline bool DispatchAsyncCalls()
{ return (WaitForSingleObjectEx(::GetCurrentThread(), INFINITE, TRUE) == WAIT_IO_COMPLETION); }

However, there was a significant problem: by definition LibQ can't use any platform-specific definitions in the interface exposed to the user. PAPCFUNC, however, is a Win32 definition: the prototype for the APC function that Windows calls directly. So, we have what appears to be a paradox: we can't make the client use PAPCFUNC, yet we have no choice but to use PAPCFUNC. The solution: a bit of black magic; you know, the kind of thing that makes other programmers call you (or me, as is often the case) a pervert.

Three potential solutions occurred to me. After some time thinking about it, I decided one was significantly better than the alternatives. Specifically, this one (note that the typedef is platform-independent, while the two defines are the Windows versions of platform-independent macros):

// Prototype for asynchronous call functions
typedef void (*TAsyncCallPtr)(uword param);

// Windows macros for APC proxy generation and use. Must be used in the same module in which the APC is queued.
#define DECLARE_ASYNC_CALL_PROC(function) static VOID CALLBACK APCProxy_##function(ULONG_PTR lpParam) { function((uword)lpParam); }

// Parenthesized so the expansion behaves as a single expression wherever it is used
#define MAKE_ASYNC_CALL_PROC(function) ((PAPCFUNC)APCProxy_##function)

This method works by generating proxy functions that conform to the OS APC prototype and that, in turn, call the user's APC function through the platform-independent prototype. Of course, all this is handled by two easy-to-use macros.
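For anyone who wants to poke at the proxy trick without a Windows box, here is a minimal portable sketch of the same idea. It is not the LibQ code: NativeParam, NativeCallPtr, NativeQueueCall and NativeDispatchCalls are hypothetical stand-ins for ULONG_PTR, PAPCFUNC, QueueUserAPC and the alertable wait, and uword is assumed to be a pointer-sized unsigned integer. Only the two macros mirror the ones above.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>

// Platform-independent side, as in the post. ASSUMPTION: uword is a
// pointer-sized unsigned integer; uintptr_t stands in for it here.
typedef std::uintptr_t uword;
typedef void (*TAsyncCallPtr)(uword param);

// HYPOTHETICAL stand-ins for the OS side. On Windows these roles are played
// by ULONG_PTR, PAPCFUNC and QueueUserAPC; none of these names are real APIs.
typedef unsigned long long NativeParam;
typedef void (*NativeCallPtr)(NativeParam);

static std::queue<std::pair<NativeCallPtr, NativeParam> > g_pending;

// Queues a call using the "native" prototype only.
static bool NativeQueueCall(NativeCallPtr proc, NativeParam param)
{
    g_pending.push(std::make_pair(proc, param));
    return true;
}

// Runs every pending call and returns how many were dispatched.
static int NativeDispatchCalls()
{
    int dispatched = 0;
    while (!g_pending.empty()) {
        std::pair<NativeCallPtr, NativeParam> call = g_pending.front();
        g_pending.pop();
        call.first(call.second);
        ++dispatched;
    }
    return dispatched;
}

// Same shape as the post's macros: generate a proxy with the native
// prototype that forwards to the user's platform-independent function.
#define DECLARE_ASYNC_CALL_PROC(function) \
    static void NativeProxy_##function(NativeParam p) { function((uword)p); }
#define MAKE_ASYNC_CALL_PROC(function) (&NativeProxy_##function)

// User code: written against TAsyncCallPtr only, no native types in sight.
static uword g_sum = 0;
static void AddToSum(uword param) { g_sum += param; }
DECLARE_ASYNC_CALL_PROC(AddToSum)

// Queues two calls through the proxy, dispatches them, returns the total.
static uword RunProxyDemo()
{
    g_sum = 0;
    NativeQueueCall(MAKE_ASYNC_CALL_PROC(AddToSum), 40);
    NativeQueueCall(MAKE_ASYNC_CALL_PROC(AddToSum), 2);
    NativeDispatchCalls();
    return g_sum;
}
```

The point of the design is that AddToSum never sees a native type; only the macro bodies would change from one platform to the next.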

So, this was tested and confirmed to work. But for me, the ultimate acid test of success with anything LibQ-related is the efficiency of the generated code. So, into a release build we go, to look at the assembly generated for calls to these functions. Take a look:

CThread &thread = CThread::GetCurrentThread();
004018A0 mov eax,dword ptr [Q::CThread::s_curThread (40ECC8h)]
004018A5 push eax
004018A6 call dword ptr [__imp__TlsGetValue@4 (40B034h)]

thread.QueueAsyncCall(MAKE_ASYNC_CALL_PROC(AsyncFunc), 0);
004018AC mov ecx,dword ptr [eax+0Ch]
004018AF push 0
004018B1 push ecx
004018B2 push offset APCProxy_AsyncFunc (401860h)
004018B7 call dword ptr [__imp__QueueUserAPC@12 (40B038h)]

CThread::DispatchAsyncCalls(0);
004018BD push 1
004018BF push 0
004018C1 call dword ptr [__imp__GetCurrentThread@0 (40B040h)]
004018C7 push eax
004018C8 call dword ptr [__imp__WaitForSingleObjectEx@12 (40B03Ch)]


Isn't that pretty? The only way you can tell that this wasn't native Win32 API C code is that the program has to resort to thread-local storage to hold a pointer to the CThread, whereas a Win32 program would just call GetCurrentThread; but I'm quite pleased with the results, and this is a prime example of the LibQ philosophy of incurring the absolute minimum possible amount of overhead.
