Grooming the iOS Kernel Heap

Part 2: Heap Overflows and the iOS Kernel Heap

In my previous posts, I talked about the general strategy used in an iOS exploit to turn a heap overflow vulnerability into a use-after-free vulnerability. The exploit developer did this because the attacker had little control over the heap overflow itself; the data that spilled past the end of the allocation would corrupt a neighboring object by overwriting the beginning of it with a kernel heap pointer that the exploit developer couldn’t control with any precision. The exploit developer therefore needed to find a good victim object that could be corrupted with this kernel pointer. They could then free the underlying data belonging to that pointer, and any further use of the victim object’s pointer would immediately be a use-after-free.

While it is a great strategy to use vulnerability conversion to turn hard-to-exploit vulnerabilities into easier-to-exploit vulnerabilities, exploiting this bug still requires a bit of work to get the victim object to sit just after the overflow object. Only then will the overflow definitely corrupt the victim object and not some other random object on the heap. The theory may be great, but getting it to actually happen in practice requires a “heap groom”.

The need for heap grooms is one of the big differences between heap exploitation and stack exploitation, and a reason why heap overflows tend to be much more complex to exploit. Let’s take a look in more detail at how it works.

Heap overflows vs stack overflows

In traditional stack-based buffer overflows, data spills past the end of a stack allocation, corrupting data at higher addresses on the stack. The data that gets overwritten consists of the local variables and temporary data of the current function and its parent functions, as well as control-flow metadata such as return addresses. With stack overflows, the layout of this data is very predictable. The exact layout of these stack frames is chosen at compile-time and burned into the binary, and the order of the stack frames above the vulnerable function is directly related to the call-stack at the point where the vulnerability triggers, which is usually fixed for any given exploit.

Heap buffer overflows are a bit less straightforward to exploit than equivalent stack overflows. The data that lies after a heap allocation is not a compiled-in stack frame, but some other heap object. Exactly which heap object that is depends on runtime factors rather than compile-time factors. These runtime factors are hard for an exploit writer to predict: they not only vary from device to device, but even consecutive runs against the same target will end up with different initial heap states.

To combat this non-determinism and build a reliable exploit, the exploit developer must first get the target process’ heap into a reliable state. In this case, we need a victim object allocation to sit with very high probability immediately after an overflowing object’s allocation. Only then will it be safe for the overflow to occur, corrupting our victim object into a useful primitive. In exploit terminology, this heap normalizing step is called a “heap groom”.

For this exploit, most of the heap allocations we care about come from the iOS kernel’s kalloc memory allocator, which is based on zalloc. Zalloc groups similarly-sized allocations into “zones” which, for the most part, operate independently of each other. In this particular exploit, the allocations we care about occur in a zone called kalloc.4096. It is in this zone that our overflow will occur.
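To make the idea of size-based zones concrete, here is a minimal Python sketch that routes a requested size to a named zone. The power-of-two size classes are an assumption I picked for illustration; the real kalloc zone list varies between iOS versions and is not reproduced here, and `zone_for` is a hypothetical helper, not a kernel function.

```python
# Toy model of size-class routing. ASSUMPTION: power-of-two zones from
# 16 to 8192 bytes. The real kalloc zone list is different and changes
# between iOS versions.
HYPOTHETICAL_ZONE_SIZES = [16 << i for i in range(10)]  # 16, 32, ..., 8192


def zone_for(size: int) -> str:
    """Return the name of the smallest hypothetical zone that fits `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for zone_size in HYPOTHETICAL_ZONE_SIZES:
        if size <= zone_size:
            return f"kalloc.{zone_size}"
    raise ValueError("too large for any zone in this toy model")
```

In this model, any request between 2049 and 4096 bytes shares the same zone, which is why an exploit that can choose an object's length can choose which zone the object lands in.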

Taking out the trash: filling the heap gaps

Like most heap allocators, zalloc prioritizes recycling memory when free chunks are available, and only tries to allocate “new” memory as a last resort when recycled memory is not available. To do this, every zalloc zone maintains three singly-linked lists, one of which consists of “free” chunks to be recycled. This list operates as a last-in-first-out allocator: the most recently freed chunk is the first one to be handed out again (this applies specifically to iOS 10, which this exploit targeted!).
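The recycling behavior can be sketched with a tiny Python model. This is a simplification under my own assumptions: `ToyZone` is a hypothetical class, it tracks only a single free list, and "new" memory is modeled as a simple bump pointer rather than anything the kernel actually does.

```python
class ToyZone:
    """Toy zone: recycle freed chunks as a stack, else hand out fresh memory."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.free_list = []   # end of the list is the top of the stack
        self.next_fresh = 0   # bump pointer standing in for "new" memory

    def free(self, addr: int) -> None:
        # The most recently freed chunk goes on top.
        self.free_list.append(addr)

    def alloc(self) -> int:
        if self.free_list:
            # Recycle first: the last chunk freed is the first one reused.
            return self.free_list.pop()
        addr = self.next_fresh
        self.next_fresh += self.chunk_size
        return addr
```

Freeing chunk A and then chunk B means the next allocation gets B and the one after gets A. Fresh memory is only touched once the list is empty.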

If our victim and overflow object allocations come directly from this initially-disordered free list, there is a very good chance that our overflow and victim objects will end up using recycled allocations far away from each other. If we triggered our overflow at that point we would end up destroying a random kernel object, very likely triggering a device crash.

Thankfully, we can “clean out” this list fairly easily. We just need to perform lots of allocations, and quickly enough the free list will be emptied, leaving the kalloc.4096 heap with no free “gaps” left.
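As a rough illustration of why overallocating works, the following Python sketch drains a disordered free list and then falls through to fresh memory. The function name `spray`, the addresses, and the bump-pointer model of fresh memory are all assumptions made for the example.

```python
def spray(free_list, spray_count, chunk_size=4096, fresh_base=0x100000):
    """Service `spray_count` allocations, recycling free chunks first.

    Returns (recycled, fresh): addresses taken off the free list and
    addresses taken from newly added memory (modeled as a bump pointer).
    """
    recycled, fresh = [], []
    next_fresh = fresh_base
    for _ in range(spray_count):
        if free_list:
            recycled.append(free_list.pop())  # scattered, unpredictable holes
        else:
            fresh.append(next_fresh)          # contiguous new memory
            next_fresh += chunk_size
    return recycled, fresh
```

With three scattered holes and five allocations, the first three results are the old holes and the last two are adjacent fresh chunks. We don't need to know the hole count in advance; spraying more than enough simply produces more fresh allocations.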

Once the kalloc.4096 free list is empty, zalloc has no choice but to service new memory allocation requests by bringing new memory into the zone itself. In iOS’ zalloc, this is done first via the operating system’s low-level kernel_memory_allocate (and ultimately vm_map_find_space) page allocator. This expands the kalloc.4096 zone by one “kernel page”. On older devices this is 4KB at a time, and on the iPhone 6S and above, this is 16KB at a time.
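The page sizes above translate directly into how many kalloc.4096 chunks each zone expansion yields. A small worked calculation in Python (the helper name `chunks_per_page` is mine):

```python
def chunks_per_page(page_size: int, chunk_size: int = 4096) -> int:
    """Number of whole chunks one kernel page contributes to the zone."""
    return page_size // chunk_size


OLD_DEVICE_PAGE = 4 * 1024    # 4KB kernel page
NEW_DEVICE_PAGE = 16 * 1024   # 16KB kernel page (iPhone 6S and above)

old_chunks = chunks_per_page(OLD_DEVICE_PAGE)   # one chunk per expansion
new_chunks = chunks_per_page(NEW_DEVICE_PAGE)   # four chunks per expansion
```

So on a 4KB device every expansion adds a single 4096-byte chunk, while on a 16KB device each expansion adds four chunks at once.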

Once a brand new kernel page has been obtained from the low-level allocator, this “new memory” is broken up into 4096-byte “free” chunks, which are then added to the free list. In iOS versions before 9.2 this was done in a predictable order; in iOS 9.2 and above, however, these new chunks are inserted into the free list in a randomized order as a specific defense against heap grooming during exploits.
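The difference between the two behaviors can be sketched in Python as follows. This is only a stand-in: I use `random.shuffle` to represent "some randomized insertion order" and make no claim about the kernel's actual randomization scheme. `carve_page` is a hypothetical helper.

```python
import random


def carve_page(base, page_size, chunk_size=4096, randomize=False, rng=None):
    """Split one new page into chunks and return their insertion order.

    randomize=False models the predictable older behavior.
    randomize=True uses a shuffle as a STAND-IN for the newer behavior;
    the real kernel's randomization strategy is not reproduced here.
    """
    chunks = [base + off for off in range(0, page_size, chunk_size)]
    if randomize:
        (rng or random).shuffle(chunks)
    return chunks
```

Either way the same set of chunks enters the free list; what changes is whether the attacker can predict which chunk the next allocation will receive. With a 4KB page there is only one chunk, so there is nothing to shuffle.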

The Heap Groom Strategy

Regardless of the exact device and OS version, the top-level strategy of the heap groom can be seen as follows: First, we fill up the gaps to remove any unpredictable initially-free allocations in the kalloc.4096 zone. We don’t know how many holes there are, but overallocating here does no real harm. Eventually, all of the gaps in the kalloc.4096 zone are filled, and the allocations that follow then come from new pages brought in and carved up from kernel_memory_allocate, which are much more predictably clustered.

Once the gaps are full, the exploit developer sets about creating several of what I’ll call “exploit zones”. Each “exploit zone” is a collection of victim candidate objects and a single “placeholder object”. The victim candidate is the recv_msg_elem array, with the length chosen so that it gets allocated in the kalloc.4096 zone. The basic goal with these “exploit zones” is to create a group of allocations where the victim object follows a placeholder object with a good probability. Just before the exploit is triggered, the exploit developer releases all of the placeholder objects back to zalloc. This releases all of the blocking objects to the kalloc.4096 free list so that when the overflowing object is allocated, it finds itself immediately before a victim object with high likelihood.

The exact object used for the placeholder allocation is not important; it just matters that we can control the allocation’s size and have fine-grained control over when the object is released back to zalloc. It turns out that an out-of-line allocation associated with an iOS mach port is an ideal candidate for this, and this is what is used in this exploit. By crafting the size of the message, we can exactly specify the size of the kalloc allocation, and when the corresponding mach port is destroyed, the corresponding allocation is immediately released via kfree.

For old devices, the exploit creates 17 “exploit zones”, each with 7 victim candidates and one placeholder. For newer devices, the exploit creates 20 “exploit zones”, each with 15 victim candidates and one placeholder. Having created these exploit zones, the exploit developer “activates” them by freeing all of the corresponding placeholder objects very quickly. The exploit developer activates the exploit zones in allocation order, which means that the kalloc.4096 zone’s free list fills up with the “holes” where the placeholder objects used to be. Since the kalloc.4096 free list releases elements in a last-in-first-out order, the next 4096-byte allocation will come from the last exploit zone, the one after will come from the second-to-last exploit zone, and so on. This has the effect of creating a sudden series of kalloc.4096 free entries, each of which is likely followed by a victim object candidate.
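Here is a simplified Python simulation of that activation step, using the newer-device numbers (20 exploit zones, 15 victim candidates each). It rests on assumptions that are mine and not the exploit's: allocations are laid out contiguously (ignoring free list randomization), the placeholder is allocated first within each exploit zone, and the helper names are hypothetical.

```python
CHUNK = 4096


def build_exploit_zones(num_zones, victims_per_zone, chunk=CHUNK):
    """Return a list of (placeholder_addr, [victim_addrs]) per exploit zone.

    ASSUMPTIONS: contiguous layout, placeholder allocated before victims.
    """
    zones, addr = [], 0
    for _ in range(num_zones):
        placeholder = addr
        addr += chunk
        victims = [addr + i * chunk for i in range(victims_per_zone)]
        addr += victims_per_zone * chunk
        zones.append((placeholder, victims))
    return zones


def activate(zones):
    """Free every placeholder in allocation order; return the free list."""
    free_list = []
    for placeholder, _victims in zones:
        free_list.append(placeholder)  # last freed ends up on top
    return free_list


def next_hole(free_list):
    """Last-in-first-out: the next allocation reuses the newest hole."""
    return free_list.pop()
```

Calling `next_hole(activate(build_exploit_zones(20, 15)))` returns the placeholder address of the last exploit zone, and in this model the chunk immediately after it is one of that zone's victim candidates.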

Now all the exploit developer needs to do is arrange for the vulnerable (overflowing) object to be allocated as quickly as possible while the kalloc.4096 heap is in this state. If all goes according to plan, the overflowing object will be allocated from one of these free entries and will therefore land inside an “exploit zone” immediately before one of these recv_msg_elem array victim candidate objects. Now when the overflow occurs, it will overflow onto one of the adjacent victim objects on the heap.
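To get a feel for why packing many victim candidates around each placeholder helps, here is a back-of-the-envelope calculation in Python. It is a toy model and not a measurement: I assume each exploit zone occupies one contiguous run of chunks, that the placeholder is equally likely to sit in any slot of that run, and that whatever follows the run is not a victim. `followed_by_victim_odds` is a hypothetical helper.

```python
from fractions import Fraction


def followed_by_victim_odds(victims_per_zone: int) -> Fraction:
    """Odds that the chunk right after the placeholder is a victim.

    TOY MODEL: the placeholder sits in a uniformly random slot of a
    contiguous run; only the final slot has no victim after it.
    """
    slots = victims_per_zone + 1
    good = sum(1 for slot in range(slots) if slot + 1 < slots)
    return Fraction(good, slots)


old_device_odds = followed_by_victim_odds(7)    # 7 victims + 1 placeholder
new_device_odds = followed_by_victim_odds(15)   # 15 victims + 1 placeholder
```

Under these assumptions the odds are 7/8 for the older-device layout and 15/16 for the newer one. Real-world odds depend on page boundaries and on other kernel activity in the zone, which this model ignores.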

The victim object and converting the overflow to a UAF

In the previous post we saw that the exploit developer’s strategy was to turn this heap overflow into a use-after-free. Each element in the recv_msg_elem array begins with a pointer to a UIO structure, which is kalloc’d when the array is created and kfree’d when the array is destroyed.

The final step taken by the exploit developer is simply to arrange for all the recv_msg_elem objects in all of the exploit zones to be released, while keeping a reference to the overflowing object. Each of the non-corrupted entries in each of the recv_msg_elem arrays is released in the traditional boring way, with its corresponding UIO released back to the system via kfree. But for the corrupted victim object, something more interesting happens: when it tries to release its UIO buffer, the pointer passed to kfree is the pointer that overflowed the array! This causes the IOAccelResource object to be freed, even though a reference to it is still accessible.
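The conversion can be modeled in a few lines of Python. This is a deliberately abstract sketch: `ToyHeap` and `convert_overflow_to_uaf` are hypothetical names, the "heap" is just a set of live addresses, and `held_object` stands in for the object whose pointer overflowed into the array.

```python
class ToyHeap:
    """Tracks which toy addresses are currently allocated."""

    def __init__(self):
        self.live = set()
        self.next_addr = 0x1000

    def kalloc(self) -> int:
        addr = self.next_addr
        self.next_addr += 0x1000
        self.live.add(addr)
        return addr

    def kfree(self, addr: int) -> None:
        self.live.remove(addr)


def convert_overflow_to_uaf(num_elems: int = 4):
    heap = ToyHeap()
    # Victim array: each element begins with a pointer to its own UIO.
    uio_ptrs = [heap.kalloc() for _ in range(num_elems)]
    # Object the attacker keeps a reference to (stand-in for the object
    # whose pointer spills into the victim array).
    held_object = heap.kalloc()
    original_uio = uio_ptrs[0]
    # The overflow replaces the first element's UIO pointer.
    uio_ptrs[0] = held_object
    # Destroying the array frees whatever each element points to.
    for ptr in uio_ptrs:
        heap.kfree(ptr)
    return heap, held_object, original_uio
```

After the array is destroyed, `held_object` is no longer in `heap.live` even though the attacker still holds its address: that dangling reference is the use-after-free. As a side effect in this model, the UIO that the corrupted element originally owned is never freed.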

In the next post we’ll look at use-after-frees in more detail, and see how the exploit developer can do some more exploit gymnastics to turn this use-after-free into an information leak exploit primitive to de-ASLR the kernel heap and code. In the post after that, we’ll look at how the exploit developer puts all of these primitives together to take full control of the iOS kernel.
