Q & Stuff: The Art of Breaking and Entering

While the first two DLL injection mechanisms I've shown used well-documented Windows API functions, the third and final method is quite a bit more exotic. It consists, quite literally, of hijacking a thread (the thread, to be exact) that already exists in the target process and making it execute code we injected using the methods discussed previously.

The trick here is that new processes can be created suspended. When CreateProcess is called with CREATE_SUSPENDED, Windows begins as usual: it creates the process's address space, maps the executable, sets up the kernel's bookkeeping for the new process, and creates the initial thread. Processes are really nothing more than an environment for threads to run in; what is actually suspended is the initial thread. When it runs, this initial thread does several things, most notably preparing the executable for execution (including loading all required DLLs), calling the executable's entry point (for a typical C/C++ program, the C runtime's startup code, which in turn calls main or WinMain), and then calling ExitThread with the entry point's return value (if no other threads are running in the process, ExitThread has the effect of destroying the process).

While this thread is suspended, we have access to the process, which lets us do any number of evil things. There are several possible ways to hijack the thread, but I'll present only the best one (the most robust and reliable): overwriting the entry point. We overwrite the first few bytes of the entry point with a JMP instruction to our injected code, which loads your DLL, calls a patching function, restores the original entry point bytes, and then returns to the entry point so the application starts normally.
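The hook itself can be modeled without touching a real process. The following is a minimal, portable sketch, assuming a plain byte array (the hypothetical SimulatedImage) stands in for the target's memory in place of ReadProcessMemory/WriteProcessMemory; HookEntryPoint and UnhookEntryPoint are illustrative names, not part of the code below. It saves the five bytes at the entry point, writes the same E9 rel32 JMP that the JMP32 struct describes, and later puts the saved bytes back, which is the job the loader's rep movsb does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the target process's memory: bytes "loaded" at a 32-bit base address
struct SimulatedImage
{
    uint32_t base;
    std::vector<uint8_t> bytes;
};

const size_t kJmpSize = 5;  // Opcode 0xE9 followed by a 32-bit displacement

// Writes "JMP rel32" at virtual address `from` so that it transfers control to `to`.
// The displacement is relative to the end of the instruction (from + 5); unsigned
// 32-bit wraparound produces the right two's-complement value for backward jumps.
void WriteJmp32(SimulatedImage &img, uint32_t from, uint32_t to)
{
    uint8_t *p = &img.bytes[from - img.base];
    const uint32_t rel = to - (from + (uint32_t)kJmpSize);
    p[0] = 0xE9;
    for (int i = 0; i < 4; ++i)
        p[1 + i] = (uint8_t)(rel >> (8 * i));  // x86 is little-endian
}

// Decodes the JMP rel32 at `from` and returns its destination
uint32_t ReadJmp32Target(const SimulatedImage &img, uint32_t from)
{
    const uint8_t *p = &img.bytes[from - img.base];
    uint32_t rel = 0;
    for (int i = 0; i < 4; ++i)
        rel |= (uint32_t)p[1 + i] << (8 * i);
    return from + (uint32_t)kJmpSize + rel;
}

// Saves the first kJmpSize bytes at the entry point, then overwrites them with a JMP to the loader
void HookEntryPoint(SimulatedImage &img, uint32_t entry, uint32_t loader, uint8_t saved[kJmpSize])
{
    std::memcpy(saved, &img.bytes[entry - img.base], kJmpSize);
    WriteJmp32(img, entry, loader);
}

// What the loader does before returning to the entry point: put the original bytes back
void UnhookEntryPoint(SimulatedImage &img, uint32_t entry, const uint8_t saved[kJmpSize])
{
    std::memcpy(&img.bytes[entry - img.base], saved, kJmpSize);
}
```

For example, hooking an entry point at 0x00401000 with a loader at 0x00390000 produces the bytes E9 FB EF F8 FF, since the displacement is 0x00390000 - 0x00401005 taken modulo 2^32. Restoring the saved bytes (rather than disassembling the entry point) works because nothing executes at the entry point until the loader has put them back.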

This technique has several advantages over the others. Unlike CreateRemoteThread, it does not require Windows NT (in case it isn't obvious, "NT" here refers to the NT platform, which includes NT Workstation/Server, 2000, XP, and Server 2003). It is also the only one of the three methods that both allows synchronous operation and lets your code run before the target executable's own code begins running.

This sounds fairly simple, but it turns out to be a major hassle to get right (I seriously doubt I could have gotten the code for this post working on the first try had I not been doing this kind of thing for years). This is especially true if you want a single version that works on both Windows 9x and NT, which is a very nice feature to have.

The first complication of this method is rather severe: you must be sure that EVERYTHING your injected loader code needs, both code and data, gets into the process. Among other things, this means you must write the loader code in assembly, and it cannot call imported API functions, because the loader code has no import table of its own. If you want to call any API functions (which you will, since you'll at least need LoadLibrary), you must pass their addresses to the loader from the parent process. This works because, in practice, Kernel32.dll is loaded at the same base address in every process, so addresses obtained with GetProcAddress in the parent are also valid in the target.
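To illustrate the constraint, here is a portable sketch assuming a simplified, hypothetical parameter block (SketchParams) in which plain function-pointer types stand in for LoadLibraryA and GetLastError. The "loader" (SketchLoader, also hypothetical) calls nothing directly: it finds the block by rounding an address inside its own code down to a 4 KB page boundary, the same trick LoaderFunction86_32 uses, and then calls only through the pointers the parent stored there.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-ins for the addresses of LoadLibraryA and GetLastError that the parent
// would store in the block (hypothetical types, not the Win32 declarations)
typedef void *(*LoadLibraryFn)(const char *path);
typedef unsigned long (*GetLastErrorFn)();

const uintptr_t kPageSize = 0x1000;  // Assumes 4 KB pages, as LoaderFunction86_32 does

// Simplified parameter block: function pointers, results, the DLL path, and the
// loader's own code all live in one page-aligned allocation
struct SketchParams
{
    LoadLibraryFn  pfnLoadLibrary;
    GetLastErrorFn pfnGetLastError;
    unsigned long  nErrCode;
    int            bCompleted;
    char           szDLLPath[260];
    unsigned char  code[64];  // Where the injected loader code would sit
};

// The "loader": given an address somewhere in its own code, it rounds down to the
// page boundary to find the block, then calls only through the block's pointers
bool SketchLoader(uintptr_t addressInsideCode)
{
    SketchParams *p = (SketchParams *)(addressInsideCode & ~(kPageSize - 1));
    void *hModule = p->pfnLoadLibrary(p->szDLLPath);
    p->nErrCode = hModule ? 0 : p->pfnGetLastError();
    p->bCompleted = 1;
    return hModule != 0;
}
```

In the real injector the parent fills those slots with GetProcAddress results from its own copy of Kernel32.dll, and the rounding only works because the block starts on a page boundary and the loader code sits within its first page.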

There are also numerous smaller complications. If you intend to support both 9x and NT, you must be able to inject either via memory allocated in the target (VirtualAllocEx, on NT) or via a file mapping (on 9x, where VirtualAllocEx isn't implemented but views of a file mapping live in a shared address region visible to every process). And on 9x, you must ensure that the mapping does not get closed before the loader has finished executing. This is tricky because the mapping was created in the parent process, and if the parent closes it, the mapping will disappear from the target process as well; the loader below avoids this by mapping its own view of it first.

I've put a LOT of effort into researching this method, and as far as I can tell, it has only one inherent limitation. Because the loader code executes before the executable's entry point, the executable (including its C runtime) has not been initialized yet, so you cannot safely call any functions in it. This can be worked around by hooking some function the executable imports and delaying your initialization until that function is called (this is what LMPQAPI does to create a server using the MPQ editing functions in StarEdit.exe).
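One way to defer initialization is sketched below, assuming the executable calls the hooked function through an import address table slot, simulated here by a global function pointer (g_iatSlot). All names are hypothetical and this is not necessarily how LMPQAPI does it: the patcher swaps the slot for a hook, and the first call through it runs the deferred initialization once, restores the original pointer, and forwards the call.

```cpp
#include <cassert>

typedef int (*ImportedFn)(int);

// Stand-in for the real imported API
static int RealImportedFunction(int x) { return x * 2; }

// Simulated IAT slot: the executable always calls through this pointer
ImportedFn g_iatSlot = RealImportedFunction;

static ImportedFn g_pfnOriginal = 0;  // The pointer the slot held before hooking
static int g_initCount = 0;           // How many times deferred initialization ran

// Work that has to wait until the executable (and its C runtime) is initialized
static void DeferredInit() { ++g_initCount; }

// The hook: on the first call, unhook and run the deferred initialization, then forward the call
static int HookedImportedFunction(int x)
{
    if (g_iatSlot == HookedImportedFunction)
    {
        g_iatSlot = g_pfnOriginal;  // Later calls go straight to the original
        DeferredInit();
    }
    return g_pfnOriginal(x);
}

// Would be called from the patcher, i.e. before the entry point runs
void InstallDeferredInitHook()
{
    g_pfnOriginal = g_iatSlot;
    g_iatSlot = HookedImportedFunction;
}
```

Unhooking inside the hook keeps the cost to a single extra call; a real version would also need to think about several threads making that first call at once.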

My implementation imposes two more limitations. First, the executable must load at its preferred address (not be relocated), since that's where the injector expects it to be. Second, because the patching process is architecture-specific, it is limited to what I wrote: a 32-bit process patching a 32-bit process. Both problems can likely be fixed, but I'm too lazy to do it at the moment.
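The first limitation comes from how the entry point is located: FindModuleEntryPoint in the code below adds AddressOfEntryPoint to the preferred ImageBase from the PE optional header. The following portable sketch reads the same two fields from a raw PE32 file image in a byte buffer. ParsePE32EntryPoint and EntryPointVA are hypothetical helpers; the offsets follow the PE32 layout (e_lfanew at 0x3C, the optional header 24 bytes after the PE\0\0 signature, AddressOfEntryPoint at +16 and ImageBase at +28). If the image were relocated, the correct address would be the actual load base plus the same RVA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian readers; the caller guarantees the offsets are in bounds
static uint16_t Read16(const std::vector<uint8_t> &b, size_t off)
{
    return (uint16_t)(b[off] | (b[off + 1] << 8));
}

static uint32_t Read32(const std::vector<uint8_t> &b, size_t off)
{
    return (uint32_t)b[off] | ((uint32_t)b[off + 1] << 8) |
           ((uint32_t)b[off + 2] << 16) | ((uint32_t)b[off + 3] << 24);
}

// Reads ImageBase and AddressOfEntryPoint from a PE32 file image.
// Returns false if the buffer is not a well-formed PE32 header.
bool ParsePE32EntryPoint(const std::vector<uint8_t> &image, uint32_t &imageBase, uint32_t &entryRVA)
{
    if (image.size() < 0x40 || Read16(image, 0) != 0x5A4D)  // "MZ" (IMAGE_DOS_SIGNATURE)
        return false;

    const uint32_t peOff = Read32(image, 0x3C);  // e_lfanew
    if (peOff == 0 || peOff > image.size() || image.size() - peOff < 4 + 20 + 32)
        return false;  // Need the signature, the file header, and the fields we read

    if (Read32(image, peOff) != 0x00004550)  // "PE\0\0" (IMAGE_NT_SIGNATURE)
        return false;

    const size_t optOff = (size_t)peOff + 4 + 20;  // Skip the signature and IMAGE_FILE_HEADER
    if (Read16(image, optOff) != 0x10B)  // IMAGE_NT_OPTIONAL_HDR32_MAGIC
        return false;

    entryRVA = Read32(image, optOff + 16);   // AddressOfEntryPoint
    imageBase = Read32(image, optOff + 28);  // ImageBase
    return true;
}

// Virtual address of the entry point for an image actually loaded at loadBase
uint32_t EntryPointVA(uint32_t loadBase, uint32_t entryRVA)
{
    return loadBase + entryRVA;
}
```

Supporting relocated executables would mean substituting the real base address (however it is obtained) for ImageBase when computing where to write the JMP.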

#include <windows.h>
#include <assert.h>
#include <string.h>

// Amount of space to reserve for the loader function that gets injected
#define LOADER_MAX_SIZE 192
#define PATCHER_DATA_ALIGNMENT 16 // Alignment to use for the patcher data

// Rounds an address up to the nearest PATCHER_DATA_ALIGNMENT boundary
#define ALIGN_PATCHER_DATA(x) (((UINT_PTR)(x) + PATCHER_DATA_ALIGNMENT - 1) & ~(UINT_PTR)(PATCHER_DATA_ALIGNMENT - 1))

typedef LPVOID (WINAPI *VirtualAllocExPtr)
(
    HANDLE hProcess,
    LPVOID lpAddress,
    SIZE_T dwSize,
    DWORD flAllocationType,
    DWORD flProtect
);

typedef BOOL (WINAPI *VirtualFreeExPtr)
(
    HANDLE hProcess,
    LPVOID lpAddress,
    SIZE_T dwSize,
    DWORD dwFreeType
);

// The JMP rel32 instruction
#include <pshpack1.h>
struct JMP32
{
    BYTE byOpcode;    // 0xE9
    DWORD nRelOffset; // Offset relative to the instruction AFTER this JMP

    inline JMP32()
    { byOpcode = 0xE9; }
};
#include <poppack.h>

// The parameters that will get injected into the target process.
// The block must start on a page boundary (VirtualAllocEx and MapViewOfFile guarantee this): the loader finds it by rounding its own address down.
struct LOADERFUNCTIONPARAMS
{
    BOOL bCompleted; // Whether the loader has finished
    DWORD nErrCode;  // NO_ERROR on success; the GetLastError value on failure

    HANDLE hParamsSection; // If the parameter block is in a file mapping (9x), HANDLE of the mapping in the target; NULL otherwise

    FARPROC lpfnLoadLibraryA; // Functions that the loader will call
    FARPROC lpfnMapViewOfFile;
    FARPROC lpfnGetLastError;
    FARPROC lpfnExitProcess;

    UINT_PTR nReturnAddress; // The address that our loader function will return to (the entry point)

    JMP32 jmpOverwritten; // The original entry point bytes that the JMP to the loader overwrote

    UINT_PTR nPatcherRVA;   // RVA of patcher entry point in DLL
    size_t nPatcherDataLen; // Length of data to be passed to patcher

    char szDLLFilePath[MAX_PATH]; // Path of patcher DLL

    BYTE fnLoaderFunction[LOADER_MAX_SIZE]; // Loader function code

    BYTE byPatcherData[PATCHER_DATA_ALIGNMENT]; // Start of the variable-length patcher data (must be the last member)
};

// The loader function for x86-32. On success, it "returns" into the original entry point (whose bytes it has restored),
// with the stack exactly as it was when the initial thread called the entry point.
void __declspec(naked) __stdcall LoaderFunction86_32()
{
    __asm {
        ; Use CALL to generate the return address we need to overwrite with the entry point's address
        call Loader

    Loader:
        push ebp
        mov ebp, esp
        pushad
        ; int 3 ; Uncomment this for debugging the loader function

        ; Compute the address of the LOADERFUNCTIONPARAMS block. It will be at the page boundary beneath this code
        mov ebx, [ebp+4]
        and ebx, 0xFFFFF000

        ; If the parameter block is in a file mapping (9x), map our own view of it first, so it stays alive after the parent closes its handle
        mov edx, [ebx]LOADERFUNCTIONPARAMS.hParamsSection

        test edx, edx
        jz LoadDLL

        push 0
        push 0
        push 0
        push FILE_MAP_WRITE
        push edx
        call [ebx]LOADERFUNCTIONPARAMS.lpfnMapViewOfFile

        test eax, eax
        jz Failure

    LoadDLL: ; Call LoadLibraryA to load DLL.
        lea edx, [ebx]LOADERFUNCTIONPARAMS.szDLLFilePath
        push edx
        call [ebx]LOADERFUNCTIONPARAMS.lpfnLoadLibraryA

        test eax, eax
        jz Failure

    LibraryLoaded: ; Now call the patcher entry point, if there is one
        cmp [ebx]LOADERFUNCTIONPARAMS.nPatcherRVA, 0
        je RewriteEntryPoint

        lea ecx, [ebx]LOADERFUNCTIONPARAMS.byPatcherData
        add ecx, (PATCHER_DATA_ALIGNMENT - 1) ; Align the data on a 16 byte boundary
        and ecx, ~(PATCHER_DATA_ALIGNMENT - 1)
        mov edx, [ebx]LOADERFUNCTIONPARAMS.nPatcherDataLen
        add eax, [ebx]LOADERFUNCTIONPARAMS.nPatcherRVA ; Patcher address = module base + RVA
        push edx
        push ecx
        call eax

        test eax, eax
        jz Failure

    RewriteEntryPoint: ; Put the original bytes from the entry point back
        mov edx, [ebx]LOADERFUNCTIONPARAMS.nReturnAddress
        lea esi, [ebx]LOADERFUNCTIONPARAMS.jmpOverwritten
        mov edi, edx
        mov ecx, size JMP32
        rep movsb
        mov [ebp+4], edx ; Set the return address to the entry point

    Done: ; Patching completed successfully. Acknowledge success and return to the entry point.
        mov [ebx]LOADERFUNCTIONPARAMS.nErrCode, NO_ERROR
        mov [ebx]LOADERFUNCTIONPARAMS.bCompleted, TRUE

        popad
        mov esp, ebp
        pop ebp
        ret

    Failure: ; Save GetLastError value and call ExitProcess
        call [ebx]LOADERFUNCTIONPARAMS.lpfnGetLastError
        mov [ebx]LOADERFUNCTIONPARAMS.nErrCode, eax
        push 0
        ;mov [ebx]LOADERFUNCTIONPARAMS.bCompleted, TRUE
        call [ebx]LOADERFUNCTIONPARAMS.lpfnExitProcess
    };
}

// Get the entry point for a module from its file path. The address assumes the module loads at its preferred base (ImageBase).
bool FindModuleEntryPoint(LPCSTR lpszFilePath, UINT_PTR &lpfnEntryPoint)
{
    assert(lpszFilePath);

    // Map the module as a data file (essentially as a memory mapped file)
    HMODULE hModule = LoadLibraryExA(lpszFilePath, NULL, LOAD_LIBRARY_AS_DATAFILE);
    if (!hModule)
        return false;

    bool bSuccess = false;

    // Wrap code in a try-except block, since we're going to be working with unverified pointers
    __try
    {
        // Find the DOS header. An HMODULE is a pointer to the module in memory, but for data-file mappings LoadLibraryEx stores flags in the lower bits of the HMODULE.
        IMAGE_DOS_HEADER *lpDosHeader = (IMAGE_DOS_HEADER *)((UINT_PTR)hModule & ~(UINT_PTR)0xFFF);

        if (lpDosHeader->e_magic == IMAGE_DOS_SIGNATURE && lpDosHeader->e_lfanew)
        {
            // Locate the NT headers
            DWORD *lpNTSignature = (DWORD *)((UINT_PTR)lpDosHeader + lpDosHeader->e_lfanew);
            IMAGE_FILE_HEADER *lpNTHeader = (IMAGE_FILE_HEADER *)((UINT_PTR)lpNTSignature + sizeof(DWORD));
            IMAGE_OPTIONAL_HEADER32 *lpOptHeader = (IMAGE_OPTIONAL_HEADER32 *)((UINT_PTR)lpNTHeader + IMAGE_SIZEOF_FILE_HEADER);

            // Only 32-bit (PE32) images are supported
            if (*lpNTSignature == IMAGE_NT_SIGNATURE && lpOptHeader->Magic == IMAGE_NT_OPTIONAL_HDR32_MAGIC)
            {
                lpfnEntryPoint = lpOptHeader->AddressOfEntryPoint + lpOptHeader->ImageBase;

                bSuccess = true;
            }
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    { }

    FreeLibrary(hModule);

    return bSuccess;
}

// Finds the entry point of the target executable, saves the original entry point bytes, and overwrites them with a JMP to the loader
bool HookModuleEntryPoint32(LPCSTR lpszFilePath, HANDLE hProcess, LOADERFUNCTIONPARAMS *lpParamsBlock, UINT_PTR &lpfnEntryPoint, JMP32 &jmpOverwritten)
{
    assert(lpParamsBlock);

    // Find the entry point for the module
    if (!FindModuleEntryPoint(lpszFilePath, lpfnEntryPoint))
        return false;

    // Protect against access violations
    __try
    {
        // Make the entry point writable. It is left writable, because the loader must write the original bytes back.
        DWORD nOldProtect;
        if (!VirtualProtectEx(hProcess, (void *)lpfnEntryPoint, sizeof(JMP32), PAGE_EXECUTE_READWRITE, &nOldProtect))
            return false;

        // Save the original entry point bytes
        SIZE_T nBytesRead;

        if (!ReadProcessMemory(hProcess, (void *)lpfnEntryPoint, &jmpOverwritten, sizeof(JMP32), &nBytesRead) || nBytesRead != sizeof(JMP32))
            return false;

        // Write the JMP to the entry point
        SIZE_T nBytesWritten;
        JMP32 jmp;

        // Compute the relative offset of the loader function. lpParamsBlock is an address in the target process (NT)
        // or in shared memory (9x); it is only used for address arithmetic here, never dereferenced.
        DWORD nLoaderAddress = (DWORD)&lpParamsBlock->fnLoaderFunction;

        jmp.nRelOffset = nLoaderAddress - (lpfnEntryPoint + sizeof(jmp));

        if (!WriteProcessMemory(hProcess, (void *)lpfnEntryPoint, &jmp, sizeof(jmp), &nBytesWritten) || nBytesWritten != sizeof(jmp))
            return false;

        // We just modified code in another process; make sure stale instructions aren't executed
        FlushInstructionCache(hProcess, (void *)lpfnEntryPoint, sizeof(jmp));

        return true;
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    { return false; }
}

// Waits until the loader function has finished, for better or worse. On success, nErrCode receives the loader's error code (NO_ERROR if it succeeded).
// If the process exits first (the loader calls ExitProcess on failure), the final read generally fails and the function returns false.
bool GetLoaderErrorCode(HANDLE hProcess, LOADERFUNCTIONPARAMS *lpParamsMemory, DWORD &nErrCode)
{
    // The plan is very simple: poll the parameter block every 10 ms to check for completion. Also watch the process HANDLE for termination.
    SIZE_T nBytesRead;

    while (WaitForSingleObject(hProcess, 10) != WAIT_OBJECT_0)
    {
        // Read the completion indicator flag
        BOOL bCompleted;

        if (!ReadProcessMemory(hProcess, &lpParamsMemory->bCompleted, &bCompleted, sizeof(bCompleted), &nBytesRead) || nBytesRead != sizeof(bCompleted))
            return false;

        if (bCompleted)
            break;
    }

    // Read the error code and return
    if (!ReadProcessMemory(hProcess, &lpParamsMemory->nErrCode, &nErrCode, sizeof(nErrCode), &nBytesRead) || nBytesRead != sizeof(nErrCode))
        return false;

    return true;
}

// Returns false on failure. If it failed because this is a Windows 9x machine (VirtualAllocEx isn't implemented), bIsNT will be false.
bool InjectDLLAndResumeProcessNT(HANDLE hProcess, HANDLE hThread, LPCSTR lpszFilePath, LOADERFUNCTIONPARAMS &params, const void *lpPatcherData, size_t nPatcherDataLen, bool &bIsNT, DWORD &nErrCode)
{
    if (nPatcherDataLen)
        assert(lpPatcherData);

    // We don't know if we're on NT or 9x, and the version APIs can be easily fooled. Do it by trial and error: try to use VirtualAllocEx, and fall back to file mappings if VirtualAllocEx isn't available.
    bIsNT = false;

    HMODULE hKernel32 = GetModuleHandleA("kernel32.dll");

    VirtualAllocExPtr lpfnVirtualAllocEx = (VirtualAllocExPtr)GetProcAddress(hKernel32, "VirtualAllocEx");
    VirtualFreeExPtr lpfnVirtualFreeEx = (VirtualFreeExPtr)GetProcAddress(hKernel32, "VirtualFreeEx");

    if (!lpfnVirtualAllocEx || !lpfnVirtualFreeEx)
        return false;

    // Windows 9x usually has stubs for VirtualAllocEx and VirtualFreeEx, so we still don't know if they're really there. Try to allocate the memory.
    LOADERFUNCTIONPARAMS *lpParamsMemory = (LOADERFUNCTIONPARAMS *)lpfnVirtualAllocEx(hProcess, NULL, sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

    // The moment of truth: NT or 9x?
    if (lpParamsMemory || GetLastError() != ERROR_CALL_NOT_IMPLEMENTED)
        bIsNT = true;

    if (!lpParamsMemory)
        return false;

    bool bSuccess = false;
    bool bResumed = false;

    // This is Windows NT
    // Hook the entry point
    if (HookModuleEntryPoint32(lpszFilePath, hProcess, lpParamsMemory, params.nReturnAddress, params.jmpOverwritten))
    {
        // Compute the address (in the target process) to write the patcher data at
        BYTE *lpPatcherDataMemory = (BYTE *)ALIGN_PATCHER_DATA(lpParamsMemory->byPatcherData);

        // Write the parameters and patcher data
        SIZE_T nBytesWritten;

        if (WriteProcessMemory(hProcess, lpParamsMemory, &params, sizeof(params), &nBytesWritten) && nBytesWritten == sizeof(params))
        {
            if (!nPatcherDataLen || (WriteProcessMemory(hProcess, lpPatcherDataMemory, lpPatcherData, nPatcherDataLen, &nBytesWritten) && nBytesWritten == nPatcherDataLen))
            {
                // It's all set. Let it run until the loader function finishes.
                if (ResumeThread(hThread) != (DWORD)-1)
                {
                    bResumed = true;
                    bSuccess = GetLoaderErrorCode(hProcess, lpParamsMemory, nErrCode);
                }
            }
        }
    }

    // Free the memory, but only if the loader never ran. Once the thread is resumed, the loader executes from this block and may
    // still be running its last few instructions (or calling ExitProcess) after it sets bCompleted, so freeing it then would be a race.
    // Leaving one small allocation behind in the target is the safer trade-off.
    if (!bResumed)
        lpfnVirtualFreeEx(hProcess, lpParamsMemory, 0, MEM_RELEASE);

    return bSuccess;
}

bool InjectDLLAndResumeProcess9x(HANDLE hProcess, HANDLE hThread, LPCSTR lpszFilePath, LOADERFUNCTIONPARAMS &params, const void *lpPatcherData, size_t nPatcherDataLen, DWORD &nErrCode)
{
    if (nPatcherDataLen)
        assert(lpPatcherData);

    // We're on 9x. Use a file mapping backed by the paging file.
    HANDLE hMapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, (DWORD)(sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen), NULL);
    if (!hMapping)
        return false;

    bool bSuccess = false;

    // Map the file mapping so we can write to it. On 9x, views of a mapping live in the shared address region, so this address is also valid in the target process.
    LOADERFUNCTIONPARAMS *lpParamsMemory = (LOADERFUNCTIONPARAMS *)MapViewOfFile(hMapping, FILE_MAP_WRITE, 0, 0, 0);
    if (lpParamsMemory)
    {
        // Overwrite the entry point and get the old one
        if (HookModuleEntryPoint32(lpszFilePath, hProcess, lpParamsMemory, params.nReturnAddress, params.jmpOverwritten))
        {
            // Duplicate the file mapping HANDLE into the target process, so the loader can map its own view and keep the mapping alive
            if (DuplicateHandle(GetCurrentProcess(), hMapping, hProcess, &params.hParamsSection, 0, FALSE, DUPLICATE_SAME_ACCESS))
            {
                BYTE *lpPatcherDataMemory = (BYTE *)ALIGN_PATCHER_DATA(lpParamsMemory->byPatcherData);

                // Copy the parameter block and the patcher data into the mapping
                memcpy(lpParamsMemory, &params, sizeof(params));
                if (nPatcherDataLen)
                    memcpy(lpPatcherDataMemory, lpPatcherData, nPatcherDataLen);

                // Let the loader run
                if (ResumeThread(hThread) != (DWORD)-1)
                    bSuccess = GetLoaderErrorCode(hProcess, lpParamsMemory, nErrCode);
            }
        }

        // Unmap our view. If the loader ran, it holds its own view, so the mapping stays alive in the target.
        UnmapViewOfFile(lpParamsMemory);
    }

    // Close our HANDLE to the file mapping
    CloseHandle(hMapping);

    return bSuccess;
}

// Builds the parameter block, injects it into the suspended process (NT method first, then 9x), resumes the initial thread, and waits for the loader to finish
bool InjectDLLAndResumeProcess(HANDLE hProcess, HANDLE hThread, LPCSTR lpszExecPath, LPCSTR lpszDLLFilePath, UINT_PTR nPatcherRVA, const void *lpPatcherData, size_t nPatcherDataLen, DWORD &nErrCode)
{
    assert(hProcess);
    assert(lpszExecPath);
    assert(lpszDLLFilePath);
    assert(strlen(lpszDLLFilePath) < MAX_PATH);

    // Kernel32 is loaded at the same address in every process, so these function addresses are also valid in the target
    HMODULE hKernel32 = GetModuleHandleA("kernel32.dll");

    // Construct a local copy of the param block and initialize it
    LOADERFUNCTIONPARAMS params;

    params.hParamsSection = NULL;

    params.bCompleted = FALSE;
    params.nErrCode = NO_ERROR;

    params.lpfnLoadLibraryA = GetProcAddress(hKernel32, "LoadLibraryA");
    params.lpfnMapViewOfFile = GetProcAddress(hKernel32, "MapViewOfFile");
    params.lpfnGetLastError = GetProcAddress(hKernel32, "GetLastError");
    params.lpfnExitProcess = GetProcAddress(hKernel32, "ExitProcess");

    params.nPatcherRVA = nPatcherRVA;
    params.nPatcherDataLen = nPatcherDataLen;

    strcpy(params.szDLLFilePath, lpszDLLFilePath);

    // With incremental linking (the VC++ debug default), "LoaderFunction86_32" is actually a JMP stub. The real function
    // begins with a CALL (0xE8), so a leading 0xE9 means we're looking at the stub; follow it to the real code.
    JMP32 *pJmpStub = (JMP32 *)LoaderFunction86_32;
    LPBYTE lpbyLoaderFunction = (LPBYTE)LoaderFunction86_32;

    if (pJmpStub->byOpcode == 0xE9)
        lpbyLoaderFunction = (LPBYTE)(pJmpStub->nRelOffset + (DWORD)LoaderFunction86_32 + sizeof(JMP32));

    memcpy(&params.fnLoaderFunction, lpbyLoaderFunction, LOADER_MAX_SIZE);

    // The patcher data will be written directly into the process, because it occupies extra data after the struct

    // Try to patch using the NT method first. If it's not NT, use the 9x method.
    bool bIsNT = false;

    if (InjectDLLAndResumeProcessNT(hProcess, hThread, lpszExecPath, params, lpPatcherData, nPatcherDataLen, bIsNT, nErrCode))
        return true; // Successfully patched with the NT method
    else if (!bIsNT && InjectDLLAndResumeProcess9x(hProcess, hThread, lpszExecPath, params, lpPatcherData, nPatcherDataLen, nErrCode))
        return true;

    return false; // Patching failed
}
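A note on the sizing in InjectDLLAndResumeProcessNT and InjectDLLAndResumeProcess9x: the patcher data is written at an aligned address inside byPatcherData, so it can start up to 15 bytes past the beginning of that array. Because byPatcherData is the struct's last member and is itself PATCHER_DATA_ALIGNMENT bytes long, allocating sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen still covers the data. This small portable check makes that argument concrete; AlignUp and PatcherDataFits are hypothetical helpers that repeat ALIGN_PATCHER_DATA's arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const uintptr_t kPatcherDataAlignment = 16;  // Mirrors PATCHER_DATA_ALIGNMENT

// Same arithmetic as ALIGN_PATCHER_DATA: round x up to a power-of-two boundary
uintptr_t AlignUp(uintptr_t x, uintptr_t alignment)
{
    return (x + alignment - 1) & ~(alignment - 1);
}

// arrayStart is the address of byPatcherData. The allocation is assumed to end no earlier than
// arrayStart + kPatcherDataAlignment + dataLen, which holds when byPatcherData is the last member
// and the allocation is sizeof(LOADERFUNCTIONPARAMS) + nPatcherDataLen bytes.
bool PatcherDataFits(uintptr_t arrayStart, size_t dataLen)
{
    const uintptr_t dataStart = AlignUp(arrayStart, kPatcherDataAlignment);
    const uintptr_t allocationEnd = arrayStart + kPatcherDataAlignment + dataLen;
    return dataStart + dataLen <= allocationEnd;
}
```

Since AlignUp never moves an address forward by more than 15 bytes, the check holds for every starting address, which is why no extra padding is added to the allocation size.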

2 comments:

Anonymous said... Have you seen Unsanity's Application Enhancer (APE) for Mac OS X? It exploits some of the same ideas that you've covered here, using derivatives of Jonathan Rentzsch's mach_inject and mach_override.

I think something like APE for Windows would be pretty cool, and you've pretty much done all the work. Have you considered it?

On an unrelated note, you seem to use inline code and assembly frequently. Due to the proportional font in use and the narrow Blogger template, it can be a little awkward to read. Have you considered using a monospaced font for code, and possibly putting it in a non-wrapping, scrollable text form element? (9:49 PM)
Anonymous said... Interesting read... However, some things can be done much more efficiently. I have written my own version, which has the following differences:
- you CAN get the base address of the executable in memory, even if it is relocated (you just need to get the module handle; use EnumProcessModules)
- I don't use inline assembly, and my assembly code is only about 6 lines

But thank you a lot for the idea of simply writing back the code you overwrite; at first I used a disassembler to figure out the correct opcode sizes (which is not needed).
If you are interested in my version, you can drop me an e-mail at mrbrdo at email dot si (3:36 AM)


Thursday, July 21, 2005

The Art of Breaking and Entering - Thread Hijacking
