While the first two mechanisms of DLL injection I've shown used well-documented Windows API functions, the third and final method is quite a bit more exotic. It consists, quite literally, of hijacking a thread (the thread, to be exact) that already exists in the target process and making it execute code we injected using the methods discussed previously.
The trick here is that new processes can be created suspended. When CreateProcess is called with CREATE_SUSPENDED, Windows begins in the usual way: creating the process's address space, loading the module, preparing the kernel for the new process, and creating the initial thread. In reality, processes are nothing more than an environment for threads to run in; what's really suspended is the initial thread. When run, this initial thread does several things, most notably preparing the executable for execution (including loading all required DLLs), calling the executable's entry point function (which, for C and C++ programs, ends up calling main or WinMain), and then calling ExitThread with the return value of the entry point (if there are no other threads running in the process, ExitThread has the effect of destroying the process).
While this thread is suspended, we have access to the process, allowing us to do any number of evil things. There are a number of possible ways to go about hijacking the thread, but I'll only present the best one (the most robust and with the highest reliability): overwriting the entry point. Here, we overwrite the first few bytes of the entry point with a JMP instruction that jumps to our injected code, which will load our DLL, call a patching function, put the original entry point bytes back, and then jump back to the application.
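To make the JMP patch concrete, here is a minimal sketch of how the five bytes could be computed. It assumes 32-bit x86 and the near-jump encoding (opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump instruction). MakeJmp32 and the addresses in the note below are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode a "JMP rel32" that sits at address `from`
// and transfers control to address `to`.
std::array<std::uint8_t, 5> MakeJmp32(std::uint32_t from, std::uint32_t to)
{
    // The displacement is relative to the instruction after the jump,
    // i.e. from + 5. Unsigned wrap-around yields the correct
    // two's-complement value for backward jumps.
    const std::uint32_t rel = to - (from + 5u);

    std::array<std::uint8_t, 5> code{};
    code[0] = 0xE9;                       // near JMP opcode
    for (int i = 0; i < 4; ++i)           // displacement, little-endian
        code[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
    return code;
}
```

For instance, with an entry point at 0x00401000 and injected code at 0x00350040 (both invented), the displacement is 0xFFF4F03B, so the patch bytes are E9 3B F0 F4 FF.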
There are numerous advantages to this technique over the others. Unlike CreateRemoteThread, this method does not mandate Windows NT (I should note, in case you don't realize, that "NT" refers to the NT platform, which includes NT Workstation/Server, 2000, XP, and Server 2003). As well, it is the only method that not only allows synchronous operation, but also allows your code to be executed before the target executable begins running.
This sounds fairly simple, but it turns out to be a major hassle to get right (I seriously doubt I could have gotten the code for this post working on the first try had I not been doing this kind of thing for years). This is especially true when you intend to create a version which works on both Windows 9x and NT, which is a very nice feature.
The first complication of this method is rather severe: you must be sure that you get EVERYTHING you need in your injected loader code into the process, both code and data. Among other things, that implies that you must write your loader code in assembly, and you may not call imported API functions (because your loader code doesn't have an import table). If you wish to call any API functions (which you will, considering that you'll at least need LoadLibrary), you must pass the address of the functions to your loader from the parent process.
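The idea of handing the loader everything it needs can be sketched in portable C++, with no Windows calls at all. This only illustrates the pattern: the loader touches nothing except the block it is given, and the parent fills in the function address beforehand. LoaderParams, RunLoader, MakeParams and FakeLoadLibrary are hypothetical names; in the real thing the pointer would be the address of LoadLibraryA as seen from the parent process.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the signature of the API the loader needs to call.
using LoadFn = void* (*)(const char*);

// Everything the loader needs travels in one block: a function address
// resolved by the parent, plus the data (the DLL path) it operates on.
struct LoaderParams
{
    LoadFn loadLibrary;
    char   dllPath[260];
};

// The "loader": it has no imports of its own and only uses the block.
bool RunLoader(const LoaderParams* p)
{
    return p->loadLibrary(p->dllPath) != nullptr;
}

// Parent side: resolve the address and copy the path into the block.
LoaderParams MakeParams(LoadFn fn, const char* path)
{
    assert(std::strlen(path) < sizeof(LoaderParams::dllPath));
    LoaderParams p{};
    p.loadLibrary = fn;
    std::strcpy(p.dllPath, path);
    return p;
}

// Fake API used only so the sketch can run anywhere.
static int g_fakeModule;
void* FakeLoadLibrary(const char* path)
{
    return path[0] ? &g_fakeModule : nullptr;
}
```

Calling RunLoader on a block built with MakeParams(FakeLoadLibrary, "example.dll") succeeds, while an empty path makes the fake fail, which is where the real loader would take its failure branch.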
There are also numerous smaller complications. If you intend to support both 9x and NT, you must ensure that you can inject either via allocated memory (for NT) or a file mapping (for 9x). And in the case of 9x, you must ensure that the mapping does not get closed before the loader has finished executing (this is tricky because the mapping was created in the parent process, and if the parent process closes it, the mapping will disappear from the target process, as well).
I've been putting a LOT of effort into researching this method. As far as I've been able to tell, it has only one inherent limitation. As the loader code executes before main/WinMain, the executable will not have been initialized, and so you cannot call any functions in it. This may be worked around by hooking some function the executable imports, and then delaying your initialization until that function is called (this is what LMPQAPI does to create a server using MPQ editing functions in StarEdit.exe).
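The workaround of delaying initialization until a hooked import is called can be sketched roughly as follows. This is a portable illustration of the pattern only, not how LMPQAPI does it: g_import stands in for an import table slot, and RealImported, HookedImported, DeferredInit and InstallHook are hypothetical names.

```cpp
#include <cassert>

using ImportedFn = int (*)(int);

// Stand-in for a function the executable imports.
int RealImported(int x) { return x * 2; }

ImportedFn g_import      = RealImported; // pretend import table slot
ImportedFn g_original    = nullptr;      // saved so the hook can forward
bool       g_initialized = false;
int        g_initCount   = 0;

// The work that cannot be done before the executable has initialized.
void DeferredInit() { ++g_initCount; }

// The hook: run the delayed initialization once, then forward the call.
int HookedImported(int x)
{
    if (!g_initialized)
    {
        g_initialized = true;
        DeferredInit();
    }
    return g_original(x);
}

// Called from the early patching code: swap the slot, initialize nothing yet.
void InstallHook()
{
    assert(g_original == nullptr);
    g_original = g_import;
    g_import   = HookedImported;
}
```

After InstallHook, nothing has been initialized; the first call through g_import runs DeferredInit once and then behaves like the original function, and later calls only forward.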
Two more limitations are imposed by my implementation. First, the executable must load at its preferred address (not be relocated), as that's where the injector expects it to be. Second, because the patching process is architecture-specific, it is limited to what I wrote: a 32-bit process patching a 32-bit process. It is likely that these problems can both be fixed, but I'm too lazy to do it, at the moment.
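The first limitation comes down to one addition: the injector computes the entry point as the preferred image base plus the entry point RVA. As a hedged sketch of what a fix might involve, suppose the injector could somehow learn the base the executable actually loaded at (how it would learn that is outside this sketch); the address could then be rebased as below. RebaseEntryPoint and all the numbers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Entry point address for an image loaded at `imageBase`.
constexpr std::uint32_t EntryPointAt(std::uint32_t imageBase, std::uint32_t entryRva)
{
    return imageBase + entryRva;
}

// Hypothetical fix-up: move an address computed for the preferred base
// to the base the image really occupies.
constexpr std::uint32_t RebaseEntryPoint(std::uint32_t entryAtPreferred,
                                         std::uint32_t preferredBase,
                                         std::uint32_t actualBase)
{
    return actualBase + (entryAtPreferred - preferredBase);
}

// With a preferred base of 0x00400000 and an RVA of 0x1000, an image that
// really loaded at 0x00A00000 has its entry point at 0x00A01000.
static_assert(EntryPointAt(0x00400000u, 0x1000u) == 0x00401000u, "preferred base");
static_assert(RebaseEntryPoint(0x00401000u, 0x00400000u, 0x00A00000u) == 0x00A01000u,
              "relocated base");
```

If the image is not relocated, the rebased address equals the one the injector already computes, so the current behavior is the special case where both bases match.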
#define LOADER_MAX_SIZE 192
#define PATCHER_DATA_ALIGNMENT 16
#define ALIGN_PATCHER_DATA(x) (((UINT_PTR)(x) + PATCHER_DATA_ALIGNMENT - 1) & ~(PATCHER_DATA_ALIGNMENT - 1))
typedef LPVOID (WINAPI *VirtualAllocExPtr)
(
HANDLE hProcess,
LPVOID lpAddress,
SIZE_T dwSize,
DWORD flAllocationType,
DWORD flProtect
);
typedef BOOL (WINAPI *VirtualFreeExPtr)
(
HANDLE hProcess,
LPVOID lpAddress,
SIZE_T dwSize,
DWORD dwFreeType
);
#include <pshpack1.h>
struct JMP32
{
BYTE byOpcode;
DWORD nRelOffset;
inline JMP32()
{ byOpcode = 0xE9; }
};
#include <poppack.h>
struct LOADERFUNCTIONPARAMS
{
BOOL bCompleted;
DWORD nErrCode;
HANDLE hParamsSection;
FARPROC lpfnLoadLibraryA;
FARPROC lpfnMapViewOfFile;
FARPROC lpfnGetLastError;
FARPROC lpfnExitProcess;
UINT_PTR nReturnAddress;
JMP32 jmpOverwritten;
UINT_PTR nPatcherRVA;
size_t nPatcherDataLen;
char szDLLFilePath[MAX_PATH];
BYTE fnLoaderFunction[LOADER_MAX_SIZE];
BYTE byPatcherData[PATCHER_DATA_ALIGNMENT];
};
void __declspec(naked) __stdcall LoaderFunction86_32()
{
__asm {
; Use CALL to generate the return address we need to overwrite with the entry point's address
call Loader
Loader:
push ebp
mov ebp, esp
pushad
; int 3 ; Uncomment this for debugging the loader function
; Compute the address of the LOADERFUNCTIONPARAMS block. It will be at the page boundary beneath this code
mov ebx, [ebp+4]
and ebx, 0xFFFFF000
; If the parameter block is in a file mapping, lock it, first
mov edx, [ebx]LOADERFUNCTIONPARAMS.hParamsSection
test edx, edx
jz LoadDLL
push 0
push 0
push 0
push FILE_MAP_WRITE
push edx
call [ebx]LOADERFUNCTIONPARAMS.lpfnMapViewOfFile
test eax, eax
jz Failure
LoadDLL: ; Call LoadLibraryA to load DLL.
lea edx, [ebx]LOADERFUNCTIONPARAMS.szDLLFilePath
push edx
call [ebx]LOADERFUNCTIONPARAMS.lpfnLoadLibraryA
test eax, eax
jz Failure
LibraryLoaded: ; Now call the patcher entry point, if there is one
cmp [ebx]LOADERFUNCTIONPARAMS.nPatcherRVA, 0
je RewriteEntryPoint
lea ecx, [ebx]LOADERFUNCTIONPARAMS.byPatcherData
add ecx, (PATCHER_DATA_ALIGNMENT - 1)
and ecx, ~(PATCHER_DATA_ALIGNMENT - 1)
mov edx, [ebx]LOADERFUNCTIONPARAMS.nPatcherDataLen
add eax, [ebx]LOADERFUNCTIONPARAMS.nPatcherRVA
push edx
push ecx
call eax
test eax, eax
jz Failure
RewriteEntryPoint: ; Put the original bytes from the entry point back
mov edx, [ebx]LOADERFUNCTIONPARAMS.nReturnAddress
lea esi, [ebx]LOADERFUNCTIONPARAMS.jmpOverwritten
mov edi, edx
mov ecx, size JMP32
rep movsb
mov [ebp+4], edx ; Set the return address to the entry point
Done: ; Patching completed successfully. Acknowledge success and return to the entry point.
mov [ebx]LOADERFUNCTIONPARAMS.nErrCode, NO_ERROR
mov [ebx]LOADERFUNCTIONPARAMS.bCompleted, TRUE
popad
mov esp, ebp
pop ebp
ret
Failure: ; Save GetLastError value and call ExitProcess
call [ebx]LOADERFUNCTIONPARAMS.lpfnGetLastError
mov [ebx]LOADERFUNCTIONPARAMS.nErrCode, eax
push 0
;mov [ebx]LOADERFUNCTIONPARAMS.bCompleted, TRUE
call [ebx]LOADERFUNCTIONPARAMS.lpfnExitProcess
};
}
bool FindModuleEntryPoint(LPCSTR lpszFilePath, UINT_PTR &lpfnEntryPoint)
{
assert(lpszFilePath);
HMODULE hModule = LoadLibraryEx(lpszFilePath, NULL, LOAD_LIBRARY_AS_DATAFILE);
if (!hModule)
return false;
bool bSuccess = false;
__try
{
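// A module loaded with LOAD_LIBRARY_AS_DATAFILE may come back with flag bits set in the low part of the handle; mask them off to get the mapped base address
IMAGE_DOS_HEADER *lpDosHeader = (IMAGE_DOS_HEADER *)((UINT_PTR)hModule & ~(UINT_PTR)0xFFF);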
if (lpDosHeader->e_magic == IMAGE_DOS_SIGNATURE && lpDosHeader->e_lfanew)
{
DWORD *lpNTSignature = (DWORD *)((UINT_PTR)lpDosHeader + lpDosHeader->e_lfanew);
IMAGE_FILE_HEADER *lpNTHeader = (IMAGE_FILE_HEADER *)((UINT_PTR)lpNTSignature + sizeof(DWORD));
IMAGE_OPTIONAL_HEADER32 *lpOptHeader = (IMAGE_OPTIONAL_HEADER32 *)((UINT_PTR)lpNTHeader + IMAGE_SIZEOF_FILE_HEADER);
if (*lpNTSignature == IMAGE_NT_SIGNATURE)
{
lpfnEntryPoint = lpOptHeader->AddressOfEntryPoint + lpOptHeader->ImageBase;
bSuccess = true;
}
}
}
__except (EXCEPTION_EXECUTE_HANDLER)
{ }
FreeLibrary(hModule);
return bSuccess;
}
bool HookModuleEntryPoint32(LPCSTR lpszFilePath, HANDLE hProcess, LOADERFUNCTIONPARAMS *lpParamsBlock, UINT_PTR &lpfnEntryPoint, JMP32 &jmpOverwritten)
{
assert(lpParamsBlock);
if (!FindModuleEntryPoint(lpszFilePath, lpfnEntryPoint))
return false;
__try
{
DWORD nOldProtect;
if (!VirtualProtectEx(hProcess, (void *)lpfnEntryPoint, sizeof(JMP32), PAGE_EXECUTE_READWRITE, &nOldProtect))
return false;
SIZE_T nBytesRead;
if (!ReadProcessMemory(hProcess, (void *)lpfnEntryPoint, &jmpOverwritten, sizeof(JMP32), &nBytesRead) || nBytesRead != sizeof(JMP32))
return false;
SIZE_T nBytesWritten;
JMP32 jmp;
DWORD nLoaderAddress = (DWORD)&lpParamsBlock->fnLoaderFunction;
jmp.nRelOffset = nLoaderAddress - (lpfnEntryPoint + sizeof(jmp));
if (!WriteProcessMemory(hProcess, (void *)lpfnEntryPoint, &jmp, sizeof(jmp), &nBytesWritten) || nBytesWritten != sizeof(jmp))
return false;
return true;
}
__except (EXCEPTION_EXECUTE_HANDLER)
{ return false; }
}
bool GetLoaderErrorCode(HANDLE hProcess, LOADERFUNCTIONPARAMS *lpParamsMemory, DWORD &nErrCode)
{
SIZE_T nBytesRead;
while (WaitForSingleObject(hProcess, 10) != WAIT_OBJECT_0)
{
BOOL bCompleted;
if (!ReadProcessMemory(hProcess, &lpParamsMemory->bCompleted, &bCompleted, sizeof(bCompleted), &nBytesRead) || nBytesRead != sizeof(bCompleted))
return false;
if (bCompleted)
break;
}
if (!ReadProcessMemory(hProcess, &lpParamsMemory->nErrCode, &nErrCode, sizeof(nErrCode), &nBytesRead) || nBytesRead != sizeof(nErrCode))
return false;
return true;
}
bool InjectDLLAndResumeProcessNT(HANDLE hProcess, HANDLE hThread, LPCSTR lpszFilePath, LOADERFUNCTIONPARAMS &params, const void *lpPatcherData, size_t nPatcherDataLen, bool &bIsNT, DWORD &nErrCode)
{
if (nPatcherDataLen)
assert(lpPatcherData);
bIsNT = false;
HMODULE hKernel32 = GetModuleHandle("Kernel32");
VirtualAllocExPtr lpfnVirtualAllocEx = (VirtualAllocExPtr)GetProcAddress(hKernel32, "VirtualAllocEx");
VirtualFreeExPtr lpfnVirtualFreeEx = (VirtualFreeExPtr)GetProcAddress(hKernel32, "VirtualFreeEx");
if (!lpfnVirtualAllocEx || !lpfnVirtualFreeEx)
return false;
LOADERFUNCTIONPARAMS *lpParamsMemory = (LOADERFUNCTIONPARAMS *)lpfnVirtualAllocEx(hProcess, 0, sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (lpParamsMemory || GetLastError() != ERROR_CALL_NOT_IMPLEMENTED)
bIsNT = true;
if (!lpParamsMemory)
return false;
bool bSuccess = false;
if (HookModuleEntryPoint32(lpszFilePath, hProcess, lpParamsMemory, params.nReturnAddress, params.jmpOverwritten))
{
BYTE *lpPatcherDataMemory = (BYTE *)ALIGN_PATCHER_DATA(lpParamsMemory->byPatcherData);
SIZE_T nBytesWritten;
if (WriteProcessMemory(hProcess, lpParamsMemory, &params, sizeof(params), &nBytesWritten) && nBytesWritten == sizeof(params))
{
if (!nPatcherDataLen || (WriteProcessMemory(hProcess, lpPatcherDataMemory, lpPatcherData, nPatcherDataLen, &nBytesWritten) && nBytesWritten == nPatcherDataLen))
{
if (ResumeThread(hThread) != (DWORD)-1)
bSuccess = GetLoaderErrorCode(hProcess, lpParamsMemory, nErrCode);
}
}
}
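// Note: the loader sets bCompleted a few instructions before it returns to the entry point, so this free can, in principle, race with the loader's last instructions
lpfnVirtualFreeEx(hProcess, lpParamsMemory, 0, MEM_RELEASE);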
return bSuccess;
}
bool InjectDLLAndResumeProcess9x(HANDLE hProcess, HANDLE hThread, LPCSTR lpszFilePath, LOADERFUNCTIONPARAMS &params, const void *lpPatcherData, size_t nPatcherDataLen, DWORD &nErrCode)
{
if (nPatcherDataLen)
assert(lpPatcherData);
HANDLE hMapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen, NULL);
if (!hMapping)
return false;
bool bSuccess = false;
LOADERFUNCTIONPARAMS *lpParamsMemory = (LOADERFUNCTIONPARAMS *)MapViewOfFile(hMapping, FILE_MAP_WRITE, 0, 0, 0);
if (lpParamsMemory)
{
if (HookModuleEntryPoint32(lpszFilePath, hProcess, lpParamsMemory, params.nReturnAddress, params.jmpOverwritten))
{
if (DuplicateHandle(GetCurrentProcess(), hMapping, hProcess, &params.hParamsSection, 0, FALSE, DUPLICATE_SAME_ACCESS))
{
BYTE *lpPatcherDataMemory = (BYTE *)ALIGN_PATCHER_DATA(lpParamsMemory->byPatcherData);
memcpy(lpParamsMemory, &params, sizeof(params));
if (nPatcherDataLen) // lpPatcherData may be NULL when there is no patcher data
memcpy(lpPatcherDataMemory, lpPatcherData, nPatcherDataLen);
if (ResumeThread(hThread) != (DWORD)-1)
bSuccess = GetLoaderErrorCode(hProcess, lpParamsMemory, nErrCode);
}
}
UnmapViewOfFile(lpParamsMemory);
}
CloseHandle(hMapping);
return bSuccess;
}
bool InjectDLLAndResumeProcess(HANDLE hProcess, HANDLE hThread, LPCSTR lpszExecPath, LPCSTR lpszDLLFilePath, UINT_PTR nPatcherRVA, const void *lpPatcherData, size_t nPatcherDataLen, DWORD &nErrCode)
{
assert(hProcess);
assert(lpszExecPath);
assert(lpszDLLFilePath);
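assert(strlen(lpszDLLFilePath) < MAX_PATH);
if (strlen(lpszDLLFilePath) >= MAX_PATH) // asserts vanish in release builds; don't overflow szDLLFilePath
return false;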
HMODULE hKernel32 = GetModuleHandle("Kernel32");
LOADERFUNCTIONPARAMS params;
params.hParamsSection = NULL;
params.bCompleted = FALSE;
params.nErrCode = NO_ERROR;
params.lpfnLoadLibraryA = GetProcAddress(hKernel32, "LoadLibraryA");
params.lpfnMapViewOfFile = GetProcAddress(hKernel32, "MapViewOfFile");
params.lpfnGetLastError = GetProcAddress(hKernel32, "GetLastError");
params.lpfnExitProcess = GetProcAddress(hKernel32, "ExitProcess");
params.nPatcherRVA = nPatcherRVA;
params.nPatcherDataLen = nPatcherDataLen;
strcpy(params.szDLLFilePath, lpszDLLFilePath);
#ifdef _DEBUG
JMP32 *pJmpStub = (JMP32 *)LoaderFunction86_32;
LPBYTE lpbyLoaderFunction = (LPBYTE)(pJmpStub->nRelOffset + (DWORD)LoaderFunction86_32 + sizeof(JMP32));
memcpy(&params.fnLoaderFunction, lpbyLoaderFunction, LOADER_MAX_SIZE);
#else
memcpy(&params.fnLoaderFunction, LoaderFunction86_32, LOADER_MAX_SIZE);
#endif
bool bIsNT = false;
if (InjectDLLAndResumeProcessNT(hProcess, hThread, lpszExecPath, params, lpPatcherData, nPatcherDataLen, bIsNT, nErrCode))
return true;
else if (!bIsNT && InjectDLLAndResumeProcess9x(hProcess, hThread, lpszExecPath, params, lpPatcherData, nPatcherDataLen, nErrCode))
return true;
return false;
}
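Two pieces of address arithmetic in the listing above are easy to miss. The loader finds its parameter block by masking its own return address down to a page boundary (and ebx, 0xFFFFF000), and the patcher data is rounded up to PATCHER_DATA_ALIGNMENT. Here is a minimal sketch of both, assuming 4 KB pages and a parameter block that starts on a page boundary with the loader code inside that first page. PageBase and AlignUp are illustrative names, not part of the code above, and the addresses are invented.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize             = 0x1000; // 4 KB x86 page (assumed)
constexpr std::uint32_t kPatcherDataAlignment = 16;     // mirrors PATCHER_DATA_ALIGNMENT

// Round an address down to the start of its page. For an address inside
// the loader code this gives the start of the parameter block, provided
// the block is page-aligned and the code lies within its first page.
constexpr std::uint32_t PageBase(std::uint32_t addr)
{
    return addr & ~(kPageSize - 1u);
}

// Round an address up to the next multiple of `alignment`
// (which must be a power of two).
constexpr std::uint32_t AlignUp(std::uint32_t addr, std::uint32_t alignment)
{
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

static_assert(PageBase(0x00350123u) == 0x00350000u, "mask to page start");
static_assert(AlignUp(0x00350211u, kPatcherDataAlignment) == 0x00350220u, "round up");
static_assert(AlignUp(0x00350220u, kPatcherDataAlignment) == 0x00350220u, "already aligned");
```

The masking only works while everything the loader needs to reach this way sits in that first page, which is presumably one reason the loader is capped at LOADER_MAX_SIZE bytes.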