So, supporting asynchronous I/O uniformly across a variety of platforms while making full use of OS-specific features presents us (or me, at least) with a challenge. However, with a bit of clever object-oriented magic, the challenge is significantly reduced.
Apart from the classes I've already mentioned, two other classes form the core of LibQ's asynchronous I/O system. While I could have kept their features internal to the asynchronous I/O system (and was originally planning to), I ultimately decided they would be useful enough to the public that I'd put some extra care into them and make them part of the public API.
The first of these important features is asynchronous procedure calls (APCs). APCs can be queued to any thread via CThread::QueueAsyncCall, and are held until that thread calls CThread::DispatchAsyncCalls; at that point, every APC function queued for the thread is called before DispatchAsyncCalls returns.
Win32 (both Windows NT and 9x) supports this mechanism natively. APCs are queued to the specified thread with the QueueUserAPC function, and dispatched at some indeterminate point while the thread is in an alertable wait state. A thread is in an alertable wait state when it is suspended (i.e. sleeping or waiting on an object) and the wait was flagged as alertable (the flag is a parameter of the extended wait functions, such as SleepEx, WaitForSingleObjectEx, and WaitForMultipleObjectsEx). All of those functions will sleep until one of three things happens: the object being waited on becomes signaled (not applicable to SleepEx), the timeout expires, or APCs are executed. CThread uses Win32 APCs on Windows.
Unfortunately, a uniform cross-platform library doesn't get to enjoy that luxury everywhere. POSIX does not natively support APCs (at least not in a form that resembles the Win32 method); the closest thing to Win32 APCs that POSIX supports is message queues, which I chose not to use because kernel-mode message queues offer no qualitative benefit (and carry a performance penalty) compared to a user-mode implementation.
The POSIX implementation consists simply of a linked list (a queue) protected by a mutex, plus a condition variable, for each thread. This allows us to approximate the Win32 APC by allowing waits, either timed or indefinite, for APCs. However, it still won't be possible to process APCs while waiting on a synchronization object (although you can simulate this by queuing APCs that do some particular task that would otherwise have been executed when a synchronization object became signaled).
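To make the idea concrete, here's a rough sketch of that kind of queue. It is not LibQ's actual code: the ApcQueue class and its method names are made up for illustration, and I'm using the standard C++ std::mutex and std::condition_variable as stand-ins for the raw pthread primitives.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical per-thread APC queue (illustrative only, not LibQ's CThread).
class ApcQueue {
public:
    using Call = std::function<void()>;

    // May be called from any thread: append a call, then wake the owner
    // in case it is blocked in WaitAndDispatch.
    void Queue(Call call) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            calls_.push_back(std::move(call));
        }
        ready_.notify_one();
    }

    // Called by the owning thread: run everything queued so far without
    // blocking. Returns the number of calls that were run.
    std::size_t Dispatch() {
        std::list<Call> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(calls_);  // take the whole list in one step
        }
        // Run the calls outside the lock, so a call can queue more calls
        // without deadlocking.
        for (auto& call : pending) {
            call();
        }
        return pending.size();
    }

    // Called by the owning thread: wait up to `timeout` for at least one
    // queued call, then dispatch whatever is there (possibly nothing).
    std::size_t WaitAndDispatch(std::chrono::milliseconds timeout) {
        {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait_for(lock, timeout, [this] { return !calls_.empty(); });
        }
        return Dispatch();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::list<Call> calls_;
};
```

The owning thread would call WaitAndDispatch wherever it would otherwise have gone into an alertable sleep, and Dispatch when it only wants to drain the queue. An indefinite wait would be the same thing with wait instead of wait_for.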
UPDATE: I've just heard some very grave (and unexpected) news. NT pre-4.0 does NOT support QueueUserAPC. This puts a rather sizeable hurdle in the way of this thing, as it leaves two options.
First, I could drop support for NT before 4.0. While I wouldn't hesitate to drop support for NT 3.1 (back from 1993), NT 3.5 was around until 1996 or 1997, making it not THAT old. Of course, it could be argued that new programs will require the Explorer interface that wasn't introduced until NT 4 (it was first released in Windows 95, which preceded NT 4). While it's safe to assume that no new GUI program would use the Windows 3.1 GUI (which NT 3.1 and 3.5 had), this isn't the case for programs (or libraries) that don't have a GUI.
The other alternative is to create a hybrid list/APC system. NT has always supported APCs for asynchronous I/O notification; however, it wasn't until 4.0 that you could send your own APCs. In order to pull this off, I'd have to implement a hybrid condition variable-type-thingy that waits on the condition in an alertable state (and perhaps even throw a timeout in there for good measure). This would be messy, to say the least, and it could take two kernel-mode transitions just to be sure all bases are covered (if WaitForSingleObjectEx returns WAIT_OBJECT_0, you can't be sure there aren't APCs still waiting to be executed, and if it returns WAIT_IO_COMPLETION, you can't be sure the object wasn't also signaled), making it slower.
I'm leaning towards requiring NT 4.