Threads that need to wait will do so on a slow semaphore, as we can't do without kernel support. This slow semaphore will always be at a count of 0 or less, as threads will only try to grab it when they have to wait; when threads need to be released, this semaphore will be posted, but never increased above 0.
As we did with the slow semaphore, in addition to supporting a method that waits indefinitely for the semaphore to become available, we'll also support a try-wait method, which fails immediately if the semaphore cannot be obtained (the reason the slow semaphore did this was that POSIX semaphores, unlike Windows semaphores, don't support timed waits).
Finally, our fast semaphore supports spinning, in addition to waiting. When you call the constructor, you tell it the number of times it should try to spin before waiting on the slow semaphore - this number should be 0 for single-CPU systems, and something above that for multi-CPU systems (the exact number for a particular application is probably best found by trial and error). When the Wait function is called, the fast semaphore will spin as many times as it can; if it can't get the semaphore while spinning, it will wait on the slow semaphore. The try-wait function does not spin, it simply returns if the semaphore cannot be grabbed.
I think that's all the important explanation, so here's the code:
CSemaphore m_sem; // The slow semaphore that threads will wait on
unsigned int m_nSpinCount; // Constant. The number of times to attempt to grab the semaphore before waiting
long m_nSemCount; // Atomically accessed. Current count semaphore is at. Positive numbers indicate the number of threads can can still claim the semaphore; negative numbers indicate the number of threads waiting on the semaphore.
inline CFastSemaphore(long nInitialCount, unsigned int nSpinCount = 0) : m_sem(0)
assert(nInitialCount >= 0);
m_nSpinCount = nSpinCount;
m_nSemCount = nInitialCount;
inline bool TryWait()
nSemCount = m_nSemCount;
// If there are no free slots available, fail
if (nSemCount <= 0)
// Try to claim a slot
} while (AtomicCompareExchange(&m_nSemCount, nSemCount - 1, nSemCount) != nSemCount);
// Got it
// Spin first, in case we can get it without waiting
unsigned int nSpinCount = m_nSpinCount;
while (nSpinCount-- > 0)
long nSemCount = AtomicDecrement(&m_nSemCount);
if (nSemCount >= 0)
return; // We got it; no need to wait
// We have to wait
inline Post(long nCount = 1)
assert(nCount > 0);
long nSemCount = AtomicExchangeAdd(&m_nSemCount, nCount);
// Were there any waiters?
if (nSemCount < 0)
m_sem.Post(min(nCount, -nSemCount)); // Release as many as possible