Wednesday, 12 March 2014

Memory editing on Mac OS X

Since Apple decided to partially disable ptrace on OS X, I searched for a replacement using the Mach API. This is the result, useful for debugging or modifying the memory of an external process.

Wednesday, 27 November 2013

Hooking explained: detouring library calls and vtable patching in Windows/Linux/MAC-OSX

It is often useful to modify the behavior of an application without making extensive changes to its source code. Specifically, one might want to intercept calls to certain functions in order to execute custom code before or after the original code, or to inspect or modify the parameters passed to a function. For example, it might be necessary to instrument the application for performance analysis or to add features to a program. Even when the source code of the program is not available, it is still possible to modify its behavior this way. Here I will present the techniques I use on the different operating systems.
Please note that I don't claim that these techniques are the best solutions for all cases.

Appendix A: Windows DLL Injection
Appendix B: Import Address Table (IAT) Hooking
Appendix C: MS-Detours 1.5 (Direct3d)
Appendix D: Virtual Table (Vtable) Patching
Appendix E: Example: Hiding process(es) under Windows

§1 Shared Libraries & Injection/Loading

Shared libraries are code objects that may be loaded during execution into the memory space associated with a process. Library code may be shared in memory by multiple processes as well as on disk. If virtual memory is used, processes execute the same physical page of RAM, mapped into the different address spaces of each process. This has advantages. For instance on some systems, applications were often only a few hundred kilobytes in size and loaded quickly; the majority of their code was located in libraries that had already been loaded for other purposes by the operating system.

To change code in another process we must load our own shared library into the address space of that process. On UNIX platforms this can be achieved with an environment variable that instructs the dynamic loader to load the specified shared libraries first: LD_PRELOAD on Linux and DYLD_INSERT_LIBRARIES on MAC-OSX. Function and other symbol definitions in the specified libraries are then used instead of the original ones.
Windows has no equivalent of LD_PRELOAD; to achieve the same result we must use a technique called DLL injection (shared libraries are .dll files on Windows, .so files on Linux and .dylib files on MAC-OSX). See Appendix A below for more information.

§2 Hooking/Detouring function calls

§2.1 UNIX/Linux

UNIX offers a simple way to override functions in a shared library with the LD_PRELOAD environment variable. When you write a replacement with the same name and signature as a function defined in an existing shared library, put it in your own shared library, and list that library in LD_PRELOAD, your function is used instead of the original one. The procedure is the same as on MAC-OSX (see below), except that Linux uses LD_PRELOAD where MAC-OSX uses DYLD_INSERT_LIBRARIES.

§2.2 MAC-OSX

Since MAC-OSX is also UNIX based, it works almost exactly as on Linux; only LD_PRELOAD is called DYLD_INSERT_LIBRARIES and shared libraries end in .dylib instead of .so. In this example I've detoured fopen in a test program. In 2003 Jonathan Rentzsch showed ways of detouring on MAC-OSX and released mach_star, but this method is much easier.

The dummy program:
Just a simple program that calls fopen.

 #include <stdio.h>
 int main(int argc, char** argv) {
   printf("original program start\n");
   FILE* fileptr = fopen("hey.txt", "w"); // create a new file
   if (fileptr) fclose(fileptr);          // fopen may fail, so check before closing
   printf("original program quit\n");
   return 0;
 }

Our detour library:
This function will get called instead of the original one (see the intro), but we still need to call the original afterwards; that is what we use dlsym for.
 #include <stdio.h>
 #include <dlfcn.h>
 FILE* fopen(const char *path, const char *mode) {
   printf("Detoured fopen\n");
   // look up the next definition of fopen after ours, i.e. the real one
   FILE* (*real_fopen)(const char*, const char*) =
     (FILE* (*)(const char*, const char*)) dlsym(RTLD_NEXT, "fopen");
   return real_fopen(path, "r"); // note "r" instead of "w": this prevents the program from creating files
 }

Compiling the library:
 gcc -fno-common -c fopenwrap.c  
 gcc -dynamiclib -o libhook.dylib fopenwrap.o  

Running the program with DYLD_INSERT_LIBRARIES:
You also need to define DYLD_FORCE_FLAT_NAMESPACE (it doesn't matter what value it has). For example, assuming the dummy program was built as ./dummy (a placeholder name):
 DYLD_INSERT_LIBRARIES=libhook.dylib DYLD_FORCE_FLAT_NAMESPACE=1 ./dummy

You can use the same technique to override a method in a class. Say there's a method named "fff" in a class AAA.

 class AAA {
 public:
   int m;
   AAA() { m = 1234; }
   void fff(int a);
 };
To override it, you first need to know the mangled symbol name of the method.

 $ nm somelibrary.dylib | grep "T "   
 00000ed6 T __ZN3AAA3fffEi    

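The name you need to define is _ZN3AAA3fffEi. MAC-OSX stores every symbol with one extra leading '_', and the compiler adds it again to the name you write in source, so don't forget to remove the first '_'. If you see multiple symbols in the shared library and are not sure which one to override, you can check by de-mangling a symbol.
 $ c++filt __ZN3AAA3fffEi
 AAA::fff(int)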
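
As a rough illustration of where a name like _ZN3AAA3fffEi comes from, the sketch below assembles the Itanium C++ ABI mangled name for the simplest case only: a non-const, non-template member function of a top-level class that takes a single int. The helper name mangle_member_int is hypothetical, and in practice nm and c++filt remain the reliable way to find and check symbol names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Builds "_ZN" <len>class <len>method "E" "i", the Itanium C++ ABI
 * encoding of `Class::method(int)` for a plain, non-template class.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int mangle_member_int(const char *cls, const char *method,
                             char *out, size_t out_size)
{
    int n;
    assert(cls != NULL && method != NULL && out != NULL);
    n = snprintf(out, out_size, "_ZN%zu%s%zu%sEi",
                 strlen(cls), cls, strlen(method), method);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Calling mangle_member_int("AAA", "fff", buf, sizeof buf) fills buf with _ZN3AAA3fffEi, the name without the extra leading underscore that nm shows on MAC-OSX.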

Now we can detour it like this.
 #include <stdio.h>
 #include <dlfcn.h>
 // AAA must be declared here exactly as in the library's header
 typedef void (*AAAfffType)(AAA*, int);
 static AAAfffType real_AAAfff = NULL;
 extern "C" {
 void _ZN3AAA3fffEi(AAA* a, int b) {
   printf("%d, %d\n", b, a->m);
   if (!real_AAAfff) {                                   // look up the original only once
     void* handle = dlopen("sharedlib.dylib", RTLD_NOW);
     if (handle)
       real_AAAfff = (AAAfffType)dlsym(handle, "_ZN3AAA3fffEi");
   }
   if (real_AAAfff) {
     printf("OK\n");
     real_AAAfff(a, b);                                  // call the original method
   }
 }
 }

§2.3 Microsoft Windows

This is the framework of a standard API hook. All of this resides in a DLL that will be injected into a process. For this example, I chose to hook the MessageBoxW function. Once this DLL is injected, it gets the address of the MessageBoxW function from user32.dll, and then the hooking begins. In the BeginRedirect function, an unconditional relative jump (JMP, opcode 0xE9) is written over the start of the original function; its 4-byte operand holds the distance to jump. The source is fully commented.

 #include <windows.h>
 #include <string.h>
 #define SIZE 6
 // NOTE: 32-bit x86 only; addresses are cast to DWORD and the patch is a rel32 JMP
 typedef int (WINAPI *pMessageBoxW)(HWND, LPCWSTR, LPCWSTR, UINT);  // MessageBoxW prototype
 int WINAPI MyMessageBoxW(HWND, LPCWSTR, LPCWSTR, UINT);            // our detour
 void BeginRedirect(LPVOID);
 pMessageBoxW pOrigMBAddress = NULL;                                // address of original
 BYTE oldBytes[SIZE] = {0};                                         // backup
 BYTE JMP[SIZE] = {0};                                              // 6 byte JMP instruction
 DWORD oldProtect, tmpProtect, myProtect = PAGE_EXECUTE_READWRITE;  // tmpProtect: VirtualProtect needs a valid out pointer
 BOOL WINAPI DllMain(HINSTANCE hInst, DWORD reason, LPVOID reserved) {
   switch (reason) {
   case DLL_PROCESS_ATTACH:                                         // if attached
     pOrigMBAddress = (pMessageBoxW)
       GetProcAddress(GetModuleHandleA("user32.dll"),               // get address of original
                      "MessageBoxW");
     if (pOrigMBAddress != NULL)
       BeginRedirect((LPVOID)MyMessageBoxW);                        // start detouring
     break;
   case DLL_PROCESS_DETACH:                                         // if detached
     if (pOrigMBAddress != NULL) {
       VirtualProtect((LPVOID)pOrigMBAddress, SIZE, myProtect, &tmpProtect);
       memcpy((LPVOID)pOrigMBAddress, oldBytes, SIZE);              // restore backup
       VirtualProtect((LPVOID)pOrigMBAddress, SIZE, oldProtect, &tmpProtect);
     }
     break;
   }
   return TRUE;
 }
 void BeginRedirect(LPVOID newFunction) {
   BYTE tempJMP[SIZE] = {0xE9, 0x90, 0x90, 0x90, 0x90, 0xC3};         // 0xE9 = JMP, 0x90 = NOP, 0xC3 = RET
   memcpy(JMP, tempJMP, SIZE);                                        // store jmp instruction to JMP
   DWORD JMPSize = ((DWORD)newFunction - (DWORD)pOrigMBAddress - 5);  // calculate jump distance
   VirtualProtect((LPVOID)pOrigMBAddress, SIZE,                       // assign read write protection
           PAGE_EXECUTE_READWRITE, &oldProtect);
   memcpy(oldBytes, (LPVOID)pOrigMBAddress, SIZE);                    // make backup
   memcpy(&JMP[1], &JMPSize, 4);                              // fill the nop's with the jump distance (JMP,distance(4bytes),RET)
   memcpy((LPVOID)pOrigMBAddress, JMP, SIZE);                         // set jump instruction at the beginning of the original function
   VirtualProtect((LPVOID)pOrigMBAddress, SIZE, oldProtect, &tmpProtect);  // reset protection
 }
 int WINAPI MyMessageBoxW(HWND hWnd, LPCWSTR lpText, LPCWSTR lpCaption, UINT uiType) {
   VirtualProtect((LPVOID)pOrigMBAddress, SIZE, myProtect, &tmpProtect);   // assign read write protection
   memcpy((LPVOID)pOrigMBAddress, oldBytes, SIZE);                    // restore backup
   int retValue = MessageBoxW(hWnd, lpText, lpCaption, uiType);       // get return value of original function
   memcpy((LPVOID)pOrigMBAddress, JMP, SIZE);                         // set the jump instruction again
   VirtualProtect((LPVOID)pOrigMBAddress, SIZE, oldProtect, &tmpProtect);  // reset protection
   return retValue;                                                   // return original return value
 }

We restore the backup before calling the original function because otherwise we would get an infinite loop: we would call a function that jumps to our function, which calls the function again, and so on. If you change the parameters of the call to MessageBoxW inside MyMessageBoxW, every message box shown by the process the DLL is injected into will have those parameters. See appendix C for the MS-Detours method, which is much easier and recommended.
(The diagram that accompanied this section is not included here.)

Appendix A: Windows DLL injection

NOTE: the easy way is at the end of this appendix; I will start with the hardcore method first.
Welcome to appendix A. Here I will explain how to make another process load our DLL. We allocate a chunk of memory in the target process for a small assembly function that calls LoadLibrary, and a second chunk for the path of our DLL. Next we suspend the main thread of the target and read the register that holds the address of the next instruction to be executed. Then we patch our function with the right addresses (where to return to, where the path is, and where LoadLibrary is), point the thread at the function, and resume the main thread.

 #define PROC_NAME "lorem.exe"                        // block A-1
 #define DLL_NAME  "ipsum.dll"

  // main()
  void *dllString, *vfunc;
  unsigned long ulproc_id, threadID, funcLen, oldIP, oldprot, loadLibAddy;
  HANDLE hProcess, hThread;
  CONTEXT ctx;
  funcLen = (unsigned long)loadDll_end - (unsigned long)loadDll;   // size of the function we copy
  loadLibAddy = (unsigned long)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");

  // This code is pretty straightforward
  ulproc_id  = GetProcIdFromName(PROC_NAME);  //  see A-4
  hProcess   = OpenProcess((PROCESS_VM_WRITE | PROCESS_VM_OPERATION), FALSE, ulproc_id);
  dllString  = VirtualAllocEx(hProcess, NULL, (strlen(DLL_NAME) + 1), MEM_COMMIT, PAGE_READWRITE);
  vfunc      = VirtualAllocEx(hProcess, NULL, funcLen, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  WriteProcessMemory(hProcess, dllString, DLL_NAME, strlen(DLL_NAME) + 1, NULL);  // include the terminating NUL

To continue we'll need the ID of a thread in the process; the function shown in block A-2 retrieves it.

 unsigned long GetThreadIdFromProc(char *procName) {  // block A-2
   PROCESSENTRY32 pe;
   HANDLE thSnapshot, hProcess;
   BOOL retval, ProcFound = FALSE;
   unsigned long pTID, threadID = 0;
   thSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
   if (thSnapshot == INVALID_HANDLE_VALUE) {
     MessageBox(NULL, "Error: unable to create toolhelp snapshot", "Loader", MB_OK);
     return 0;
   }
   pe.dwSize = sizeof(PROCESSENTRY32);
   retval = Process32First(thSnapshot, &pe);
   while (retval) {
     if (StrStrI(pe.szExeFile, procName)) {   // case-insensitive substring match (shlwapi)
       ProcFound = TRUE;
       break;
     }
     retval = Process32Next(thSnapshot, &pe);
     pe.dwSize = sizeof(PROCESSENTRY32);
   }
   CloseHandle(thSnapshot);
   if (!ProcFound) return 0;
   // address of the thread ID inside our own TEB (fs:[0x18] = TEB, +36 = ClientId.UniqueThread);
   // this assumes the target's main-thread TEB lives at the same address, which is not guaranteed
   _asm {
     mov eax, fs:[0x18]
     add eax, 36
     mov [pTID], eax
   }
   hProcess = OpenProcess(PROCESS_VM_READ, FALSE, pe.th32ProcessID);
   ReadProcessMemory(hProcess, (const void *)pTID, &threadID, 4, NULL);
   CloseHandle(hProcess);
   return threadID;
 }

This is a prototype for the function we are going to copy into the target process, which will call LoadLibrary. The addresses are placeholders because we patch them later on, when we have the right values. The placeholder must not fit in a single sign-extended byte, so that the assembler emits a full 4-byte immediate.

 __declspec(naked) void loadDll(void) {  // prototype function
   _asm {
     push 0xDEADBEEF       //  Placeholder for the return address
     pushfd                //  Save the flags and registers
     pushad
     push 0xDEADBEEF       //  Placeholder for the string address
     mov eax, 0xDEADBEEF   //  Placeholder for LoadLibrary
     call eax              //  Call LoadLibrary with the string parameter
     popad                 //  Restore the registers and flags
     popfd
     ret                   //  Return to the patched-in return address
   }
 }
 __declspec(naked) void loadDll_end(void) {}  // marks the end of loadDll so its size can be computed
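
To see why the later patching step writes at offsets 1, 8 and 13, the stub can be laid out as raw bytes and patched in an ordinary buffer. This is a sketch under assumptions: it assumes the stub saves flags and registers with pushfd/pushad and restores them with popad/popfd before a final ret, it uses 0xDEADBEEF as an arbitrary placeholder, and the names stub_template, put_le32 and patch_stub are hypothetical. Nothing is injected anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 machine code of the stub, with 0xDEADBEEF as a placeholder
 * in every 4-byte immediate that gets patched later. */
static const uint8_t stub_template[22] = {
    0x68, 0xEF, 0xBE, 0xAD, 0xDE,   /*  0: push imm32     (return address) */
    0x9C,                           /*  5: pushfd                           */
    0x60,                           /*  6: pushad                           */
    0x68, 0xEF, 0xBE, 0xAD, 0xDE,   /*  7: push imm32     (DLL path)        */
    0xB8, 0xEF, 0xBE, 0xAD, 0xDE,   /* 12: mov eax, imm32 (LoadLibrary)     */
    0xFF, 0xD0,                     /* 17: call eax                         */
    0x61,                           /* 19: popad                            */
    0x9D,                           /* 20: popfd                            */
    0xC3                            /* 21: ret                              */
};

/* Each immediate starts one byte after its opcode. */
enum { OFF_RETURN_ADDR = 1, OFF_DLL_PATH = 8, OFF_LOADLIBRARY = 13 };

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFF);
    dst[1] = (uint8_t)((v >> 8) & 0xFF);
    dst[2] = (uint8_t)((v >> 16) & 0xFF);
    dst[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Replace the three placeholders in a writable copy of the stub. */
static void patch_stub(uint8_t *stub, uint32_t return_addr,
                       uint32_t dll_path_addr, uint32_t loadlibrary_addr)
{
    assert(stub != NULL);
    put_le32(stub + OFF_RETURN_ADDR, return_addr);
    put_le32(stub + OFF_DLL_PATH, dll_path_addr);
    put_le32(stub + OFF_LOADLIBRARY, loadlibrary_addr);
}
```

The offsets follow directly from the instruction lengths: push imm32 and mov eax, imm32 are five bytes each, pushfd and pushad one byte each.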

Now we need to pause the thread in order to get its "context". The context of a thread is the current state of all of its registers, as well as other peripheral information. We're mostly concerned with the EIP register, which points to the next instruction to be executed. If we don't suspend the thread before retrieving its context, it will continue executing, and by the time we get the information it will be invalid. Once we've paused the thread, we retrieve its context using the GetThreadContext() function. We save the address of the next instruction to be executed, so that we know where our function should return to. Then it's just a matter of patching up the function to have all of the proper pointers, and forcing the thread to execute it. (A-3)

 unsigned long threadID;   // block A-3
 HANDLE hThread;
 threadID = GetThreadIdFromProc(PROC_NAME);
 hThread  = OpenThread((THREAD_GET_CONTEXT | THREAD_SET_CONTEXT | THREAD_SUSPEND_RESUME), FALSE, threadID);
 SuspendThread(hThread);   // freeze the thread before reading its registers

 ctx.ContextFlags = CONTEXT_CONTROL;
 GetThreadContext(hThread, &ctx);
 oldIP  = ctx.Eip;
 //Set the EIP of the context to the address of our function in the target
 ctx.Eip = (DWORD)vfunc;
 ctx.ContextFlags = CONTEXT_CONTROL;
 // patch the prototype
 VirtualProtect(loadDll, funcLen, PAGE_EXECUTE_READWRITE, &oldprot);
 //Patch the first push instruction (return address)
 memcpy((void *)((unsigned long)loadDll + 1), &oldIP, 4);
 //Patch the 2nd push instruction (address of the DLL path)
 memcpy((void *)((unsigned long)loadDll + 8), &dllString, 4);
 //Patch the mov eax, 0xDEADBEEF to mov eax, LoadLibrary
 memcpy((void *)((unsigned long)loadDll + 13), &loadLibAddy, 4);

 WriteProcessMemory(hProcess, vfunc, loadDll, funcLen, NULL);
 //Set the new context of the target's thread
 SetThreadContext(hThread, &ctx);
 //Let the target thread continue execution, starting at our function
 ResumeThread(hThread);
 // clean up; the wait is a crude way to let the function finish before its memory is freed
 Sleep(8000);
 VirtualFreeEx(hProcess, dllString, strlen(DLL_NAME) + 1, MEM_DECOMMIT);
 VirtualFreeEx(hProcess, vfunc, funcLen, MEM_DECOMMIT);
 CloseHandle(hThread);
 CloseHandle(hProcess);

 unsigned long GetProcIdFromName(char *procName) {  // block A-4
   PROCESSENTRY32 pe;
   HANDLE thSnapshot;
   BOOL retval, ProcFound = FALSE;
   thSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
   if (thSnapshot == INVALID_HANDLE_VALUE) {
     MessageBox(NULL, "Error: unable to create toolhelp snapshot", "Loader", MB_OK);
     return 0;
   }
   pe.dwSize = sizeof(PROCESSENTRY32);
   retval = Process32First(thSnapshot, &pe);
   while (retval) {
     if (StrStrI(pe.szExeFile, procName)) {   // case-insensitive substring match (shlwapi)
       ProcFound = TRUE;
       break;
     }
     retval = Process32Next(thSnapshot, &pe);
     pe.dwSize = sizeof(PROCESSENTRY32);
   }
   CloseHandle(thSnapshot);
   return ProcFound ? pe.th32ProcessID : 0;   // 0 means "not found"
 }

There is another way, using the CreateRemoteThread call. It is much easier and works well in practice. Before starting, though, we have to find the process to inject into. The Windows API provides a function for this: CreateToolhelp32Snapshot.

 #undef UNICODE
 #include <vector>
 #include <string>
 #include <windows.h>
 #include <Tlhelp32.h>
 using std::vector;
 using std::string;
 int main(void)  {
   vector<string> processNames;
   PROCESSENTRY32 pe32;
   pe32.dwSize = sizeof(PROCESSENTRY32);
   HANDLE hTool32 = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
   BOOL bProcess = Process32First(hTool32, &pe32);
   if (bProcess == TRUE)  {
     while ((Process32Next(hTool32, &pe32)) == TRUE)
       processNames.push_back(pe32.szExeFile);                   // store every process name
   }
   CloseHandle(hTool32);
   return 0;
 }

I didn't bother storing the value returned by Process32First because that entry is always "[System Process]". Process32Next returns TRUE on success, so calling it in a loop and pushing each process name into a vector is all that is needed. Once the loop is finished, every other process name is stored in processNames. So where does the DLL injection come in? The PROCESSENTRY32 structure also has a member that holds the process ID. Inside that loop, while we're pushing the process names into our vector, we also inject the DLL.

 while ((Process32Next(hTool32, &pe32)) == TRUE)  {
   if (strcmp(pe32.szExeFile, "notepad.exe") == 0)  {                    // if we found our target process
     char* DirPath = new char[MAX_PATH];
     char* FullPath = new char[MAX_PATH];
     GetCurrentDirectory(MAX_PATH, DirPath);                             // get current directory
     sprintf_s(FullPath, MAX_PATH, "%s\\testdll.dll", DirPath);          // append our dll to the current directory
     HANDLE hProcess = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
       PROCESS_VM_OPERATION | PROCESS_VM_READ |
       PROCESS_VM_WRITE, FALSE, pe32.th32ProcessID);                     // open process with extended access
     LPVOID LoadLibraryAddr = (LPVOID)GetProcAddress(GetModuleHandle("kernel32.dll"),
       "LoadLibraryA");                                                  // get address of loadlibrary
     LPVOID LLParam = (LPVOID)VirtualAllocEx(hProcess, NULL, strlen(FullPath) + 1,
       MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);                        // allocate space for the dll path, including the terminating NUL
     WriteProcessMemory(hProcess, LLParam, FullPath, strlen(FullPath) + 1, NULL);   // write path to process
     CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibraryAddr,  // create new thread and call loadlibrary with our dll path as parameter
       LLParam, 0, NULL);
     CloseHandle(hProcess);
     delete [] DirPath;   // clean up
     delete [] FullPath;
   }
 }

The code above is pretty straightforward: we first get the current directory and append our DLL name to it, then copy the resulting path into memory allocated in the target process. Finally we create a new thread in the target which calls LoadLibrary with our DLL path as its parameter.

Appendix B: Import Address Table (IAT) Hooking

Before we jump into the Import Address Table you need a bit of background information, starting with the PE format. The Portable Executable (PE) format is a file format for executables, object code, DLLs, FON font files, and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data.

One section of note is the import address table (IAT), which is used as a lookup table when the application is calling a function in a different module. It can be in the form of both import by ordinal and import by name. Because a compiled program cannot know the memory location of the libraries it depends upon, an indirect jump is required whenever an API call is made. As the dynamic linker loads modules and joins them together, it writes actual addresses into the IAT slots, so that they point to the memory locations of the corresponding library functions. Though this adds an extra jump over the cost of an intra-module call resulting in a performance penalty, it provides a key benefit: The number of memory pages that need to be copy-on-write changed by the loader is minimized, saving memory and disk I/O time. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can produce more optimized code that simply results in an indirect call opcode.
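
The indirection described above can be modelled without any Windows structures. The sketch below is a toy model, not the PE format: a hypothetical import_table holds name/address slots, callers always call through the slot, and hooking amounts to overwriting the slot. All names in it (import_slot, hook_import, call_double and so on) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(int);

/* One slot of a toy import table: a symbol name plus the resolved address. */
struct import_slot {
    const char *name;
    api_fn      addr;
};

static int real_double(int x) { return 2 * x; }      /* the "library" function */
static int hook_double(int x) { return 2 * x + 1; }  /* stands in for a detour */

/* Filled in by the "loader"; the table ends with a NULL entry. */
static struct import_slot import_table[] = {
    { "double_it", real_double },
    { NULL, NULL }
};

/* Application code never calls real_double directly, only through the slot. */
static int call_double(int x) { return import_table[0].addr(x); }

/* Replace the address stored for `name`; returns the previous address,
 * or NULL if no slot with that name exists. */
static api_fn hook_import(const char *name, api_fn replacement)
{
    struct import_slot *s;
    assert(name != NULL && replacement != NULL);
    for (s = import_table; s->name != NULL; s++) {
        if (strcmp(s->name, name) == 0) {
            api_fn old = s->addr;
            s->addr = replacement;
            return old;
        }
    }
    return NULL;
}
```

Because every call site reads the slot at call time, a single pointer write redirects all callers in the module, and the original code is left untouched.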

IAT hooking has pros and cons:
- The function you are hooking must be imported from another module; you can't just hook a certain address in memory. This is not optimal for DirectX hooks, since you will only find the device creation call (you can use that to get the device, though), but for OpenGL and such this is handy.
- It is less detectable. You can even make this a fully external hook, which should be harder for antivirus or anti-cheat software to notice because it does not use any suspicious calls.
This will be the procedure for internal hooking (the DLL must be injected into the target process):
- retrieve the DOS/NT headers
- loop through the import descriptors
- loop through the functions of each descriptor and compare their names with our target
- patch the address in the matching slot
So first we get a handle to our main module:
 int ip = 0;  
 if (module == 0)  
   module = GetModuleHandle(0);    

Then we retrieve the headers (warning: whoever wrote the header file for the PE format is certainly a believer in long, descriptive names, along with deeply nested structures and macros. When coding with WINNT.H, it's not uncommon to have mind-blowing expressions):
 // get the DOS header
 PIMAGE_DOS_HEADER pImgDosHeaders = (PIMAGE_DOS_HEADER)module;
 // check if the DOS header is a valid DOS header
 if (pImgDosHeaders->e_magic != IMAGE_DOS_SIGNATURE)
   printf("e_magic is no valid DOS signature\n");
 // get the NT header from the DOS header
 PIMAGE_NT_HEADERS pImgNTHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)pImgDosHeaders + pImgDosHeaders->e_lfanew);
 // get the import descriptor from the NT header (it's all relative, so we keep adding (LPBYTE)pImgDosHeaders)
 PIMAGE_IMPORT_DESCRIPTOR pImgImportDesc = (PIMAGE_IMPORT_DESCRIPTOR)((LPBYTE)pImgDosHeaders + pImgNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
 // the size of the import directory, also from the NT header (a byte count, not an address)
 DWORD size = pImgNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].Size;
Now we basically have enough information to start looping towards the function pointer. Note that every imported DLL has its own IMAGE_IMPORT_DESCRIPTOR; that's why we loop through all of them:
 for (IMAGE_IMPORT_DESCRIPTOR* iid = pImgImportDesc; iid->Name != 0; iid++) {}

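Inside this loop we loop through the imported functions. FirstThunk points to an array of pointer-sized slots, so increasing the index by one moves to the next thunk; the array ends with a NULL slot.

 for (int funcIdx = 0; *(funcIdx + (LPVOID*)(iid->FirstThunk + (SIZE_T)module)) != NULL; funcIdx++) {}

The names are found through the parallel OriginalFirstThunk array: each entry is an RVA to an IMAGE_IMPORT_BY_NAME structure, in which the name follows a 2-byte hint, so the name starts at offset +2:

 char* modFuncName = (char*)(*(funcIdx + (SIZE_T*)(iid->OriginalFirstThunk + (SIZE_T)module)) + (SIZE_T)module + 2);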

When we have the name we can compare it with our target and return the address of its slot so it can be patched. The function will look like this:
 void** ninehook::IATfind(const char* function, HMODULE module) {
   int ip = 0;
   if (module == 0)
     module = GetModuleHandle(0);
   PIMAGE_DOS_HEADER pImgDosHeaders = (PIMAGE_DOS_HEADER)module;
   if (pImgDosHeaders->e_magic != IMAGE_DOS_SIGNATURE) {
     printf("e_magic is no valid DOS signature\n");
     return 0;
   }
   PIMAGE_NT_HEADERS pImgNTHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)pImgDosHeaders + pImgDosHeaders->e_lfanew);
   PIMAGE_IMPORT_DESCRIPTOR pImgImportDesc = (PIMAGE_IMPORT_DESCRIPTOR)((LPBYTE)pImgDosHeaders + pImgNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
   DWORD size = pImgNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].Size;
   // note: functions imported by ordinal have no name and are not handled here
   for (IMAGE_IMPORT_DESCRIPTOR* iid = pImgImportDesc; iid->Name != 0; iid++) {
     for (int funcIdx = 0; *(funcIdx + (LPVOID*)(iid->FirstThunk + (SIZE_T)module)) != NULL; funcIdx++) {
       char* modFuncName = (char*)(*(funcIdx + (SIZE_T*)(iid->OriginalFirstThunk + (SIZE_T)module)) + (SIZE_T)module + 2);
       if (!_stricmp(function, modFuncName))
         return funcIdx + (LPVOID*)(iid->FirstThunk + (SIZE_T)module);   // address of the IAT slot
     }
   }
   return 0;   // not found
 }

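And that's it! Now we can just patch it (funcptr is the slot returned by IATfind, newfunction is our replacement):
 DWORD oldrights, newrights = PAGE_READWRITE;
 VirtualProtect(funcptr, sizeof(LPVOID), newrights, &oldrights);   // make the slot writable
 oldfunctionptr = *funcptr;                                        // keep the original so the hook can still call it
 *funcptr = newfunction;
 VirtualProtect(funcptr, sizeof(LPVOID), oldrights, &newrights);   // restore the protection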

Appendix C: MS-Detours 1.5 (Direct3d)
First of all, make sure you have downloaded MS-Detours 1.5 and added the corresponding files to your project. I am using version 1.5 because it's the simplest to use, and it does the job nicely.
There is one important function we are going to use, called DetourFunction. First we need a typedef of the function we are going to hook (EndScene in this case; it is called after the application's own drawing, so code added in the hook runs right before the frame is finished).

#include <d3d9.h>
#include "detours.h"
#pragma comment(lib, "d3d9.lib")
#pragma comment(lib, "d3dx9.lib")
// note: the device is a parameter; you can check this by reversing the calls of a real d3d program
typedef HRESULT(WINAPI* tEndScene)(LPDIRECT3DDEVICE9 pDevice);
tEndScene oEndScene = NULL;

Now to actually hook EndScene we need the address of the original function. This can be done in two ways: the first is to reverse a sample Direct3D program to find the address of the EndScene call and add its offset to the module base of d3d9.dll; the second is to use the GetProcAddress function. The problem with the first way is that it is platform dependent: the address on 64-bit Windows differs from the one on 32-bit Windows. Note also that EndScene is a COM method of IDirect3DDevice9 and not a named export of d3d9.dll, so GetProcAddress can return NULL for it; in that case the address has to be read from the device's virtual table (see Appendix D).

 // our detour function would look something like this
 HRESULT WINAPI hkEndScene(LPDIRECT3DDEVICE9 pDevice) {
   // do evil
   return oEndScene(pDevice);
 }
 // in the initialisation code of the injected DLL:
 HMODULE hd3d9 = GetModuleHandleA("d3d9.dll");
 // DetourFunction from MS-Detours: the first parameter is the original address and the second is our detour function
 oEndScene = (tEndScene)DetourFunction((LPBYTE)GetProcAddress(hd3d9, "EndScene"), (LPBYTE)&hkEndScene);

What we did here is retrieve the address with GetProcAddress and pass it as the first parameter; the second parameter is a pointer to our own detour function (hkEndScene). Now you can add drawing functions to the original program; benchmarking programs make good use of this.

Appendix D: Virtual Table (Vtable) Patching

Whenever a class defines a virtual function (or method), most compilers add a hidden member variable to the class which points to a so-called virtual method table (VMT or vtable). This VMT is basically an array of pointers to (virtual) functions. Calls go through these pointers at runtime, because at compile time it is not yet known whether the base function is to be called or a derived one implemented by a class that inherits from the base class. The code below shows an example of a VMT hook. If you want to implement this in Direct3D, you need to create a new device and use that to replace the original function in the original device.
 #include <windows.h>
 #include <tchar.h>
 #include <iostream>
 using namespace std;
 class VirtualTable {  // example class
 public:
   virtual void VirtualFunction01( void );
 };
 void VirtualTable::VirtualFunction01( void )  {  // just a function as example
   cout << "VirtualFunction01 called" << endl;
 }
 // pointer to original function
 typedef void ( __thiscall* VirtualFunction01_t )( void* thisptr );
 VirtualFunction01_t g_org_VirtualFunction01;
 //our detour function (__fastcall with a dummy edx parameter mimics __thiscall)
 void __fastcall hk_VirtualFunction01( void* thisptr, int edx )  {
   cout << "Custom function called" << endl;
   //call the original function
   g_org_VirtualFunction01(thisptr);
 }
 int _tmain(int argc, _TCHAR* argv[])  {
   VirtualTable* myTable = new VirtualTable();
   //get the pointer to the actual virtual method table from our pointer to our class instance
   void** base = *(void***)myTable;
   DWORD oldProtection;
   // make the first vtable slot writable
   VirtualProtect( &base[0], sizeof(void*), PAGE_EXECUTE_READWRITE, &oldProtection );
   //save the original function
   g_org_VirtualFunction01 = (VirtualFunction01_t)base[0];
   //overwrite the slot with our detour
   base[0] = (void*)&hk_VirtualFunction01;
   //restore page protection (the last parameter must be a valid pointer)
   VirtualProtect( &base[0], sizeof(void*), oldProtection, &oldProtection );
   //call the virtual function (now hooked) from our class instance
   myTable->VirtualFunction01();
   delete myTable;
   return 0;
 }
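
The same idea can be tried on any platform by modelling the vtable by hand in C. This is only a sketch of the mechanism, assuming a hypothetical shape type whose objects carry an explicit pointer to a shared table of function pointers; here the table is ordinary writable data, whereas a compiler-generated vtable normally sits in read-only memory, which is why the example above needs VirtualProtect.

```c
#include <assert.h>
#include <stddef.h>

struct shape;
typedef int (*area_fn)(const struct shape *self);

/* Hand-written "vtable": one slot per virtual function. */
struct shape_vtable {
    area_fn area;
};

/* Every object starts with a pointer to the table shared by its class. */
struct shape {
    struct shape_vtable *vptr;
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }

/* Writable in this sketch so that no page protection change is needed. */
static struct shape_vtable rect_vtable = { rect_area };

static area_fn original_area;  /* saved so the hook can forward the call */
static int hook_calls;         /* counts how often the hook was entered  */

static int hooked_area(const struct shape *s)
{
    hook_calls++;
    return original_area(s);   /* call the original function */
}

/* Overwrite the slot and hand back the previous pointer. */
static area_fn patch_area(struct shape_vtable *vt, area_fn replacement)
{
    area_fn old;
    assert(vt != NULL && replacement != NULL);
    old = vt->area;
    vt->area = replacement;
    return old;
}
```

After original_area = patch_area(obj.vptr, hooked_area), every object that shares rect_vtable goes through hooked_area, not only the object whose pointer was used to find the table.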

Appendix E: Example : Hiding process under Windows

In this example I will show how one can hook the system call that retrieves the list of processes and modify its result so that it skips our process. For this I will use the mhook library, but you can also use any other hooking method described in this article. The system call that the task manager uses to retrieve the list of processes is called NtQuerySystemInformation (documented on MSDN). On MSDN we can also find the structures needed for this call.

 typedef enum _SYSTEM_INFORMATION_CLASS {
   SystemProcessInformation = 5
 } SYSTEM_INFORMATION_CLASS;
 typedef struct _SYS_PROCESS_INFO {   // fields after ImageName are not needed here
   ULONG          NextEntryOffset; // next entry
   ULONG          NumberOfThreads;
   LARGE_INTEGER      Reserved[3];
   LARGE_INTEGER      CreateTime;
   LARGE_INTEGER      UserTime;
   LARGE_INTEGER      KernelTime;
   UNICODE_STRING     ImageName; // Process name
 } SYS_PROCESS_INFO;
 NTSTATUS (__stdcall *origNtQuerySystemInformation)(SYSTEM_INFORMATION_CLASS, PVOID, ULONG, PULONG); // original function pointer

Now all that is left is to define our detour function and use mhook to hook it.
First I will show our detour function.

 NTSTATUS WINAPI myNtQuerySystemInformation(SYSTEM_INFORMATION_CLASS SysInfoClass, PVOID SysInfo, ULONG SysInfoLength, PULONG RetLength) {
   NTSTATUS Return = origNtQuerySystemInformation(SysInfoClass, SysInfo, SysInfoLength, RetLength);
   if ((SysInfoClass == SystemProcessInformation) && (Return == STATUS_SUCCESS)) {
     SYS_PROCESS_INFO* CurrentStructure = (SYS_PROCESS_INFO*)SysInfo;
     SYS_PROCESS_INFO* NextStructure  = (SYS_PROCESS_INFO*)((PBYTE)CurrentStructure + CurrentStructure->NextEntryOffset);
     while (CurrentStructure->NextEntryOffset != 0) {
       size_t len = NextStructure->ImageName.Length / sizeof(WCHAR);   // Length is in bytes, wcsncmp counts characters
       if (NextStructure->ImageName.Buffer != NULL &&
           ((wcsncmp(NextStructure->ImageName.Buffer, L"explorer.exe", len) == 0) ||
            (wcsncmp(NextStructure->ImageName.Buffer, L"notepad.exe", len) == 0))) {
         if (NextStructure->NextEntryOffset == 0) {
           CurrentStructure->NextEntryOffset = 0;                       // hidden entry was the last one
         } else {
           CurrentStructure->NextEntryOffset = CurrentStructure->NextEntryOffset + NextStructure->NextEntryOffset;  // jump over it
         }
         NextStructure = CurrentStructure;                              // re-check from the same entry
       }
       CurrentStructure = NextStructure;
       NextStructure  = (SYS_PROCESS_INFO*)((PBYTE)CurrentStructure + CurrentStructure->NextEntryOffset);
     }
   }
   return Return;
 }

What we basically do here is loop over every process entry; once we find an entry with our process name, we unlink it from the list, so the caller receives the original result without our process. Now we hook it using mhook.

       HMODULE hNTDLL = NULL;
       while(hNTDLL == NULL)  {                       // wait until ntdll.dll is loaded
            hNTDLL = GetModuleHandleA("ntdll.dll");
       }
       origNtQuerySystemInformation = (NTSTATUS (__stdcall*)(SYSTEM_INFORMATION_CLASS, PVOID, ULONG, PULONG))GetProcAddress(hNTDLL, "NtQuerySystemInformation");
       Mhook_SetHook((PVOID*)&origNtQuerySystemInformation, myNtQuerySystemInformation);  // mhook redirects the call and updates the pointer to the original
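
The unlinking step inside the detour can be tried outside Windows on a simplified record. The sketch below is an assumption-laden model, not the real structure: proc_entry keeps only an offset to the next record and a plain char name, and hide_entries is a hypothetical helper that performs the same offset arithmetic as the detour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the process record: only the two fields the
 * hiding logic needs. next_entry_offset is the distance in bytes to the
 * next record, or 0 for the last one. */
struct proc_entry {
    uint32_t next_entry_offset;
    char     name[28];
};

/* Follow the offset to the next record; NULL at the end of the list. */
static struct proc_entry *next_entry(struct proc_entry *e)
{
    if (e->next_entry_offset == 0)
        return NULL;
    return (struct proc_entry *)((unsigned char *)e + e->next_entry_offset);
}

/* Unlink every record after the first whose name equals `victim`.
 * The first record is never removed, because only successors are
 * inspected. Returns the number of records that were hidden. */
static int hide_entries(struct proc_entry *first, const char *victim)
{
    int hidden = 0;
    struct proc_entry *cur = first;
    assert(first != NULL && victim != NULL);
    while (cur->next_entry_offset != 0) {
        struct proc_entry *nxt = next_entry(cur);
        if (strcmp(nxt->name, victim) == 0) {
            if (nxt->next_entry_offset == 0)
                cur->next_entry_offset = 0;                        /* victim was last */
            else
                cur->next_entry_offset += nxt->next_entry_offset;  /* jump over it    */
            hidden++;            /* stay on cur so the new successor is checked too */
        } else {
            cur = nxt;
        }
    }
    return hidden;
}
```

The hidden records stay in the buffer; they simply become unreachable for any caller that walks the list by following the offsets.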