Some folks lost their touch

In an earlier post, I introduced CHAOTICMARCH - a simple tool for simulating a user’s interaction with an App for blackbox testing. The tool worked well and has helped me a lot with testing. However, all was not well. Every now and then, too often for my comfort, the tool’s touch requests were getting ignored. For example, CHAOTICMARCH would find a button and try to click it. The event would get logged and the little circle would show up on the screen. However, the App would ignore the request as if nothing had happened. This became very frustrating and I was determined to find the root cause. Investigating this behavior took me down a deep rabbit hole. To find my way out, I built LLDB scripts, learned about iOS IPC and read lots of code. With this post, I would like to share my insights, lessons and scripts.

Not so TL;DR;

iPhones are relatively small devices and, to provide a smooth user experience, Apple has to be really careful with task scheduling. Prioritized task queues are used for this. The queuing system is nicely explained in the Run, RunLoop, Run blog post by Nicolas Bouilleaud.

I had this missing touch problem to solve. After doing a lot of debugging and scripting, I eventually realized that the events were being ignored because the device was busy animating the fading circles. These circles are used by CHAOTICMARCH to show where the clicks have occurred. This theory was validated by reordering drawing and clicking events.

In the process of analyzing and debugging, I built a sniffer for mach ports - machshark. Along the way I found a mild bug in the simulate touch library that can be used to crash backboardd, which in turn causes SpringBoard to restart.

The rest of this write up is about how I collected the mach messages and analyzed them to confirm that the IPC mechanism was working as expected.

Where is the touch?

It’s important to note that the quick solution mentioned in TL;DR; was not obvious initially. So, the first thing I did was to start reversing the libsimulatetouch library. My hope was that the more I learned about the library internals, the better I would understand its behavior. The source is in the iolate/SimulateTouch repository.

There is a “client” side, which is the application that wants to trigger an event - such as /sbin/stouch, and there is the “server” side. The server side is the DYLIB injected into the backboardd process that actually generates the HID events on behalf of the client. The client library is libsimulatetouch.dylib, implemented by STLibrary.mm, and the server is the SimulateTouch.dylib mobile substrate library, implemented by SimulateTouch.mm.

Communication between the client and the server happens over mach ports, the go-to IPC mechanism of iOS and OS X. In concept, mach ports are very basic. A message is sent on a port object (something like a socket) and a receiver on the other side responds on the same port. It is very similar to UDP, with the added benefit of being synchronous: when the mach_msg function returns, the response will be in the client supplied response buffer. The basic mach_msg function is quite primitive and requires quite a lot of infrastructure to use properly. So, it’s natural that there are several higher level IPC abstractions built on top of this mechanism. Ian Beer does a great job summarizing them in his Auditing and Exploiting Apple IPC talk.

One last thing about these mach messages. Services like the touch server start listening on ports which clients can look up via bootstrap_look_up function calls. The look up works similarly to DNS: the client specifies a name and receives a numeric port value. The touch library specifically uses the CFMessagePort abstraction for IPC, which is explained very nicely in the Interprocess communication on iOS with Mach messages blog post by Damien DeVille. The libsimulatetouch client library uses the CFMessagePortSendRequest function to send messages to the server side.

Sniffing the IPC

The problem we are trying to solve is the mystery of why touch events have been disappearing. My first intuition was that perhaps these port messages were not getting to the server for some reason. Probably not because the kernel was messing up, but likely because either the client wasn’t sending the messages or the server wasn’t processing them. So, I decided to sniff the messages in the same way that I would with network traffic. After much googling, I found almost nothing for sniffing mach messages except for an old blog post about mach_shark which unfortunately was not released (and, on the last check the blog site was down – here’s a web archive link).

What are we looking for?

What I’m looking for are the messages that are sent to a port named kr.iolate.simulatetouch. These messages have the following structure:

typedef enum {  // sent as part of the 'type' field below
    STTouchMove = 0,
    STTouchDown,
    STTouchUp,

    // For these types, (int)point_x denotes button type
    STButtonUp,
    STButtonDown
} STTouchType;

typedef struct {
    int type;       // STTouchType values (Up, down, move, etc)
    int index;      // pathIndex holder in message
    float point_x;  // X coordinate
    float point_y;  // Y coordinate
} STEvent;

Super simple messages! Just 16 bytes long. As we mentioned earlier, each call to send a message returns with a response. The response to each of the client’s requests is an integer which gives the path index. The path index is used to identify one continuous touch sequence. For example, if I request a touch DOWN, I will get an ID. Then I will use this ID to issue a touch UP, which could be at a different location. The size of the response message is four bytes. The path index is necessary to support multi-finger capability, e.g. a pinch zoom.
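To make the wire format concrete, here is a minimal Python sketch of packing a request body and unpacking the reply. It assumes little-endian byte order (as on the ARM64 devices discussed here) and the STEvent field layout shown above. The helper names pack_event and unpack_reply are my own and are not part of libsimulatetouch.

```python
import struct

# STTouchType values, in the order of the enum above
ST_TOUCH_MOVE, ST_TOUCH_DOWN, ST_TOUCH_UP, ST_BUTTON_UP, ST_BUTTON_DOWN = range(5)

# int type, int index, float point_x, float point_y (little-endian, no padding)
ST_EVENT = struct.Struct('<iiff')
# the reply is a single int: the path index
ST_REPLY = struct.Struct('<i')

def pack_event(touch_type, index, x, y):
    """Build the 16 byte request body (hypothetical helper)."""
    return ST_EVENT.pack(touch_type, index, x, y)

def unpack_reply(data):
    """Extract the path index from the 4 byte reply (hypothetical helper)."""
    return ST_REPLY.unpack(data)[0]

# index 0 asks the server to allocate a path index for us
down = pack_event(ST_TOUCH_DOWN, 0, 175.5, 30.0)
print(len(down), down.hex())
```

A touch UP for the same finger would then reuse the index returned by unpack_reply in place of the 0.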

The message processing pattern is very simple, SimulateTouch.mm:

static CFDataRef messageCallBack(CFMessagePortRef local, 
                                 SInt32 msgid, 
                                 CFDataRef cfData, 
                                 void *info)
{
   ...
   int pathIndex = touch->index;
   
   if (pathIndex == 0) {
        pathIndex = getExtraIndexNumber();
   }
   
   SimulateTouchEvent(port, pathIndex, touch->type, POINT(touch));
   
   ...             
   
   return (CFDataRef)[[NSData alloc] initWithBytes:&pathIndex 
                                     length:sizeof(pathIndex)];
   
   ...
}

...

CFMessagePortRef local = CFMessagePortCreateLocal(NULL, 
                            CFSTR(MACH_PORT_NAME), 
                            messageCallBack, NULL, NULL);

...

CFRunLoopSourceRef source = CFMessagePortCreateRunLoopSource(
                                  NULL, local, 0);
CFRunLoopAddSource(CFRunLoopGetCurrent(), 
                   source, kCFRunLoopDefaultMode);

...

The server, which is a library injected into backboardd, creates a local port and registers a name for it. Then it uses the CF abstraction to specify a callback function for the messages it receives. Once a message is received, the server allocates a path index if the client did not supply one, triggers the event and returns the path index to the client. The client is blocked until the response is returned. Quite a simple and common pattern for processing messages.

Tangent: The bug

Let’s go on a little tangent. While analyzing this code, I noticed that there is a bug in the path index allocation procedure. The getExtraIndexNumber function works in a funny way.

static int getExtraIndexNumber()
{
    int r = arc4random()%14;
    r += 1; //except 0
    
    NSString* pin = Int2String(r);
    
    if ([[STTouches allKeys] containsObject:pin]) {
        return getExtraIndexNumber();
    }else{
        return r;
    }
}

The function will get a random number between one and fourteen, inclusive (arc4random()%14 yields zero to thirteen, and one is added so that zero is never returned). If that path was already allocated, the function will attempt to get another number, randomly (!), by calling itself recursively. Who does that?! Maybe this is just a remnant of old code.

Basically, this means that if I send a whole bunch of touch down events, I can allocate all fourteen paths and getExtraIndexNumber will be forced to run out of stack space as it looks for an unallocated path index. The impact is that backboardd will crash, forcing SpringBoard to restart. I suppose you can call it a DoS attack, but its significance is mild. In order to trigger this you’d have to be running within a process on a jailbroken device with the simulate touch library installed – if that code is malicious, you’ve got bigger problems to deal with than some crashed GUI service.
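The failure mode is easy to reproduce with a small Python model of the allocator. This is only a sketch: the allocated set stands in for the keys of STTouches, and Python raises RecursionError where the native code would exhaust its stack. The get_free_index function is one possible alternative of my own, not something taken from the library.

```python
import random

MAX_PATHS = 14

def get_extra_index_number(allocated):
    """Python model of getExtraIndexNumber: retry at random via recursion."""
    r = random.randrange(MAX_PATHS) + 1      # 1..14, never 0
    if r in allocated:
        return get_extra_index_number(allocated)
    return r

def get_free_index(allocated):
    """A possible alternative (my suggestion): choose only among free indices."""
    free = [i for i in range(1, MAX_PATHS + 1) if i not in allocated]
    return random.choice(free) if free else None

allocated = set()
for _ in range(MAX_PATHS):
    # fourteen touch downs, each taking a fresh path index
    allocated.add(get_extra_index_number(allocated))

try:
    get_extra_index_number(allocated)        # the fifteenth touch down
    exhausted = False
except RecursionError:
    exhausted = True                         # no free index, recursion never ends

print(sorted(allocated), exhausted, get_free_index(allocated))
```

Choosing from the free indices terminates in one step and makes the "no path left" case explicit, so the caller can reject the request instead of recursing forever.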

Finding the port

Moving on! The first thing we need to do is find the port number. Why do we need this number? Apps will usually use many ports. Particularly, GUI libraries are heavy users. So, knowing the port number isolates your collection to the messages you’re interested in. Also, every time the App runs, port numbers will be different. Even though the name remains the same, when the ports are created, the numbers are allocated dynamically. So, we need to know the mapping at runtime.

I prefer minimally intrusive methods of introspection. For that reason I’ve chosen to use LLDB. Setting up a debugging session on a jailbroken iPhone is not trivial. However, I will leave it as an exercise for the reader to follow the setup instructions found on the iPhoneWiki.

LLDB is a really great debugger. One of my favorite features is its Python API interface. Using this interface we are able to script the debugger to automatically process memory in the context of a breakpoint. Essentially, LLDB conveniently provides a method for automating the manual work of analyzing function inputs and outputs.

To find out the name to port number mapping, we’ll set a breakpoint on the look up functions. There are three functions: bootstrap_look_up, which is a wrapper for bootstrap_look_up2, and bootstrap_look_up3, which looks to be a private function but is used by several libraries. So, we will try to break on the latter two.

# break on bootstrap_look_up2 start
bs_look2 = target.BreakpointCreateByName('bootstrap_look_up2',
                        'libxpc.dylib')
bs_look2.SetScriptCallbackFunction(
                        'mach_sniff.rocketbootstrap_look_up')

# find the end of the function, once per resolved breakpoint location
for loc in bs_look2:
    insts = target.ReadInstructions(loc.GetAddress(), 100)
    first_ret = [i.GetAddress().GetLoadAddress(target)
                  for i in insts if i.GetMnemonic(target) == 'ret']

    # Just look for the first RET instruction
    if first_ret:
        bs_look2_end = target.BreakpointCreateByAddress(
                                                   first_ret[0])
        bs_look2_end.SetScriptCallbackFunction(
                        'mach_sniff.rocketbootstrap_look_up_end')

        print(bs_look2_end)

We don’t need to break on bootstrap_look_up because bootstrap_look_up2 is enough; the former is a wrapper for the latter. You can see the source code for those functions on Apple Open Source.

# break on the RocketBootstrap version if available; breaking on the
# regular bootstrap_look_up3 crashes when RocketBootstrap has hooked it.
bs_look3 = target.BreakpointCreateByName('rocketbootstrap_look_up',
                        'librocketbootstrap.dylib')
# a breakpoint can be valid yet unresolved, so also check its locations
if(not bs_look3.IsValid() or bs_look3.GetNumLocations() == 0):
    bs_look3 = target.BreakpointCreateByName('bootstrap_look_up3',
                         'libxpc.dylib')

bs_look3.SetScriptCallbackFunction(
                        'mach_sniff.rocketbootstrap_look_up')

# look for the end of function
for loc in bs_look3:
    insts = target.ReadInstructions(loc.GetAddress(), 200)
    first_ret = [i.GetAddress().GetLoadAddress(target)
                  for i in insts if i.GetMnemonic(target) == 'ret']

    if first_ret:
        bs_look3_end = target.BreakpointCreateByAddress(
                                                  first_ret[0])
        bs_look3_end.SetScriptCallbackFunction(
                         'mach_sniff.rocketbootstrap_look_up_end')

        print(bs_look3_end)

We also want to break on bootstrap_look_up3. However, something about how breakpoints work and how librocketbootstrap hooks the function clashes, with catastrophic results. So, to handle this use case we just break on the RocketBootstrap version, rocketbootstrap_look_up, when it is available.

In both cases we set a handler function that will analyze the function parameters to extract the name and look for the user specified port name. mach_sniff.rocketbootstrap_look_up is used for the start of the function and mach_sniff.rocketbootstrap_look_up_end for the end. The first will analyze the parameters and initiate the state. Then the second function will close the state and report the mapping to the user and follow on functions (i.e. sniffing on the messages).

Once the breakpoints at the start and end of the look up functions are set, it becomes pretty easy to track port numbers and names. When the look up is first called, registers X1 and X2 point to the name and the return buffer, respectively. So, all we have to do is save off those values. We create a state at the start of the function and look it up at the end of the function to create the mapping.

import lldb

# pending look ups, keyed by thread ID
look_up_states = {}

def rocketbootstrap_look_up(frame, bp_loc, dict):
    thread = frame.GetThread()
    process = thread.GetProcess()
    # registers[0] is the general purpose register set
    registers = frame.GetRegisters()
    tid = thread.GetThreadID()

    # X1: name of port to be looked up
    x1_name = int(registers[0].GetChildAtIndex(1).GetValue(), 16)

    # X2: destination of the port number
    x2_ret_addr = int(registers[0].GetChildAtIndex(2).GetValue(), 16)

    error = lldb.SBError()
    port = process.ReadCStringFromMemory(x1_name, 256, error)
    if error.Success():
        # port_name is the user specified name, set elsewhere in the script
        if(port == port_name):
            # create state if it's the port we are looking for
            look_up_states.setdefault(tid, []).append({
                'port': port,
                'ret_addr': x2_ret_addr
            })
    else:
        print('port name error: %s' % error)

At the end of the function we look up the state information by thread ID and match up the name with the port number.

def rocketbootstrap_look_up_end(frame, bp_loc, dict):
    thread = frame.GetThread()
    process = thread.GetProcess()
    debugger = process.GetTarget().GetDebugger()
    tid = thread.GetThreadID()

    # a pending state means the name matched the port we want to sniff
    if(look_up_states.get(tid)):
        state = look_up_states[tid].pop()

        error = lldb.SBError()

        # read port number from the return buffer
        port_id = process.ReadUnsignedFromMemory(state['ret_addr'], 4, error)

        if error.Success():
            print("FOUND PORT: %s=%x" % (state['port'], port_id))

            # start sniffing for messages on this port.
            if(len(look_up_states[tid]) == 0):
                start_sniff_port(debugger, port_id)
        else:
            print('port id error: %s' % error)
    else:
        print("end with no state")

Once we find the port name and number we are interested in, we initiate the sniffing mechanisms. Keeping the port number finding and sniffing of the messages separate is nice because it allows the user to potentially sniff on just a port number rather than by name - especially in the case where we miss the look up calls before the debugger is attached.

Sniffing the mach messages

Finding the messages of interest follows the same basic process as finding ports - we just need the port number. Initially, I wanted to get all the messages and then do post processing to filter out only the ones I’m interested in. However, breakpoints are expensive and the App would run beyond slow! So, it became necessary to only sniff on the ports of interest.

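To be selective on the port we have to specify a breakpoint condition. When mach_msg is called, register X0 points to the message buffer, and the condition checks that the destination port in the message header (msgh_remote_port, at offset 8) is our port number. Doing this is still expensive but it sped things up to a reasonable threshold.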

def start_sniff_port(debugger, port_number):
    target = debugger.GetSelectedTarget()

    msg_bp = target.BreakpointCreateByName('mach_msg', 
                                 'libsystem_kernel.dylib')

    msg_bp.SetScriptCallbackFunction('mach_sniff.print_mach_msg')
    msg_bp.SetCondition("*(uint32_t*)($x0 + 8) == %d" % port_number)

Our breakpoint is set on the mach_msg function in the libsystem_kernel.dylib library. This function is a wrapper for the actual system call, although it does a little more than just pass through the parameters.
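The $x0 + 8 in the breakpoint condition comes from the layout of the standard mach message header, mach_msg_header_t: the first argument of mach_msg points at the message, and the destination port (msgh_remote_port) is the third 32-bit field of its header. A small ctypes mirror of the header shows where that offset comes from. The port_condition helper is hypothetical and only rebuilds the condition string used above.

```python
import ctypes

class MachMsgHeader(ctypes.Structure):
    """ctypes mirror of mach_msg_header_t (every field is 32 bits wide)."""
    _fields_ = [
        ('msgh_bits', ctypes.c_uint32),
        ('msgh_size', ctypes.c_uint32),
        ('msgh_remote_port', ctypes.c_uint32),   # destination port
        ('msgh_local_port', ctypes.c_uint32),    # reply port
        ('msgh_reserved', ctypes.c_uint32),
        ('msgh_id', ctypes.c_int32),
    ]

def port_condition(port_number):
    """Build the breakpoint condition from the field offset (hypothetical helper)."""
    offset = MachMsgHeader.msgh_remote_port.offset
    return "*(uint32_t*)($x0 + %d) == %d" % (offset, port_number)

print(ctypes.sizeof(MachMsgHeader), port_condition(0x6c1b))
```

Deriving the offset from the structure rather than hard coding 8 makes it obvious which header field the condition is reading.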

import binascii
import json
import time

import lldb

def print_mach_msg(frame, bp_loc, dict):
    thread = frame.GetThread()
    process = thread.GetProcess()
    # registers[0] is the general purpose register set
    registers = frame.GetRegisters()
    tid = thread.GetThreadID()

    # mach_msg(msg, option, send_size, rcv_size, rcv_name, timeout, notify)
    x0_data = int(registers[0].GetChildAtIndex(0).GetValue(), 16)
    x1_opt = registers[0].GetChildAtIndex(1).GetValue()
    x2_len = int(registers[0].GetChildAtIndex(2).GetValue(), 16)
    x3_recv_len = int(registers[0].GetChildAtIndex(3).GetValue(), 16)
    x4_recv_name = int(registers[0].GetChildAtIndex(4).GetValue(), 16)
    x5_timeout = int(registers[0].GetChildAtIndex(5).GetValue(), 16)
    x6_notify = int(registers[0].GetChildAtIndex(6).GetValue(), 16)

    output = {
        'type': 'msg_send_start',
        'time': int(time.time()*1000),
        'frame': str(frame),
        'tid': tid,
        'send_msg_size': x2_len,
        'recv_msg_size': x3_recv_len,
        'msg_options': x1_opt,
        'rcv_name': x4_recv_name,
        'timeout': x5_timeout,
        'notify': x6_notify
    }

    if(x2_len > 0):
        err = lldb.SBError()
        data = process.ReadMemory(x0_data, x2_len, err)

        # only record the message if the read succeeded
        if err.Success():
            output['msg'] = binascii.hexlify(data).decode('ascii')

    # output_file is opened elsewhere in the script; one JSON record per line
    output = json.dumps(output)
    output_file.write(output)
    output_file.write('\n')

    print(output)

On each message send call for our port of interest we collect information from the registers and record the data into a file for later processing. In this case we just take the buffer at X0 and read the number of bytes specified in X2, which is the length of the message being sent. Other data is recorded as well, though its usefulness will only show up later.

Each entry of the output file will look like this:

{"frame": "frame #0: 0x0000000197054c40 
           libsystem_kernel.dylib`mach_msg", 
 "tid": 135127, 
 "notify": 0, 
 "msg_options": "0x0000000000000011", 
 "rcv_name": 0, 
 "recv_msg_size": 0, 
 "send_msg_size": 76, 
 "timeout": 1000, 
 "time": 1471562568339, 
 "msg": "131500004c0000001b6c00000bc20000000
         00000010000000000000000000000000000
         000000000000000000f8f4f2f0010000000
         10000001000000001000000000000000080
         2f430000f041", 
 "type": "msg_send_start"}

In essence each line is a JSON record of the arguments to the mach_msg function. Kind of like strace.
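As a sketch of the later processing, the following Python reads one such record and decodes the option bits. The four constants are the send and receive bits from <mach/message.h>; other option bits are simply ignored here. The summarize helper is hypothetical, and the sample line is a shortened version of the record above with the message truncated to its first eight bytes.

```python
import json

# A few option bits from <mach/message.h>; not an exhaustive list
MACH_OPTIONS = {
    0x001: 'MACH_SEND_MSG',
    0x002: 'MACH_RCV_MSG',
    0x010: 'MACH_SEND_TIMEOUT',
    0x100: 'MACH_RCV_TIMEOUT',
}

def decode_options(value):
    """Return the names of the known option bits that are set in value."""
    return [name for bit, name in sorted(MACH_OPTIONS.items()) if value & bit]

def summarize(line):
    """Summarize one line of the capture file (hypothetical helper)."""
    rec = json.loads(line)
    return {
        'tid': rec['tid'],
        'options': decode_options(int(rec['msg_options'], 16)),
        'timeout_ms': rec['timeout'],
        'payload': bytes.fromhex(rec.get('msg', '')),
    }

# shortened sample record; the real 'msg' field is 76 bytes long
line = ('{"tid": 135127, "msg_options": "0x0000000000000011", '
        '"timeout": 1000, "send_msg_size": 76, "msg": "131500004c000000"}')
print(summarize(line))
```

For the record shown above, the options decode to a send with a timeout, which matches the zero receive size and the 1000 millisecond timeout in the same record.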

Making sense of it all

Recording the messages is just the first step. We also need to be able to understand them. Not surprisingly, they are layered in a similar way to network protocols. This is due to the various abstractions available to provide IPC mechanisms. As I mentioned earlier, libsimulatetouch uses CF for its abstraction.

The first part is easy: it’s just a standard message header for all mach messages:

typedef struct
{
       mach_msg_bits_t          msgh_bits;
       mach_msg_size_t          msgh_size;
       mach_port_t       msgh_remote_port;
       mach_port_t        msgh_local_port;
       mach_msg_size_t      msgh_reserved;
       mach_msg_id_t              msgh_id;
} mach_msg_header_t;

This takes up 24 bytes of the message. Next is the header for __CFMessagePortMachMessage which comes with a nice magic four byte marker - 0xF0F2F4F8 - to help us identify the message.

struct __CFMessagePortMachMessage {
    mach_msg_base_t           base;
    mach_msg_ool_descriptor_t  ool;

    struct innards {
        int32_t    magic;
        int32_t    msgid;
        int32_t   convid;
        int32_t byteslen;
        uint8_t bytes[0];
    } innards;
};

After that, the bytes are just the message generated by the touch library. That message is 16 bytes:

typedef struct {
    int type;       // STTouchType values (Up, down, move, etc)
    int index;      // pathIndex holder in message
    float point_x;  // X coordinate
    float point_y;  // Y coordinate
} STEvent;

All said and done, the entire message sent via mach_msg is 76 bytes long. Once parsed the message looks something like this:

{ '_payload': [ { 'ool_address': '0x0',
                  'ool_bytes': '0000000000000000',
                  'ool_copy': 0,
                  'ool_deallocate': 0,
                  'ool_pad1': 0,
                  'ool_size': '0x0',
                  'ool_type': 0},
                { 'inards_byteslen': '0x10',
                  'inards_convid': '0x12',
                  'inards_magic': '0xf0f2f4f8',
                  'inards_msgid': '0x1'},
                { 'index': 6, 
                  'point_x': 312.5, 
                  'point_y': 551.5, 
                  'type': 2}],
  'msgh_bits': 5395,
  'msgh_id': 1,
  'msgh_local_port': '0xc20b',
  'msgh_remote_port': '0x6c1b',
  'msgh_reserved': 0,
  'msgh_size': 76}

As you can see the coordinates and the type of touch are clearly visible and traceable. This is what I used to confirm that my touch events were sent as expected to the server side - backboardd.
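For reference, the layering can be unpicked with nothing more than the struct module. The sketch below is not the parser that produced the output above; it is a minimal version that assumes the 64-bit layout described in this section (24 byte header, 4 byte body, 16 byte OOL descriptor, 16 byte innards, 16 byte STEvent) and little-endian byte order. The sample bytes are the msg field of the capture record shown earlier, which is a different message from the parsed example and decodes to a touch down.

```python
import struct

CF_MAGIC = 0xF0F2F4F8

def parse_touch_message(raw):
    """Split a captured CFMessagePort touch request into its layers."""
    # mach_msg_header_t occupies bytes 0..23
    bits, size, remote, local, reserved, msgh_id = struct.unpack_from('<IIIIIi', raw, 0)
    # innards start after the 4 byte body and the 16 byte OOL descriptor
    magic, msgid, convid, byteslen = struct.unpack_from('<Iiii', raw, 44)
    if magic != CF_MAGIC:
        raise ValueError('not a CFMessagePort message')
    # STEvent follows the innards
    touch_type, index, x, y = struct.unpack_from('<iiff', raw, 60)
    return {
        'msgh_size': size,
        'msgh_remote_port': hex(remote),
        'msgh_local_port': hex(local),
        'msgh_id': msgh_id,
        'byteslen': byteslen,
        'type': touch_type, 'index': index, 'point_x': x, 'point_y': y,
    }

raw = bytes.fromhex(
    '131500004c0000001b6c00000bc200000000000001000000'   # mach_msg_header_t
    '00000000'                                           # mach_msg_body_t
    '00000000000000000000000000000000'                   # OOL descriptor
    'f8f4f2f0010000000100000010000000'                   # innards
    '010000000000000000802f430000f041')                  # STEvent
print(parse_touch_message(raw))
```

Checking the magic value first is a cheap way to skip any mach message on the port that was not produced by CFMessagePort.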

LLDB is quite a heavy tool. I chose it because I wanted minimal intrusion into the client process. However, as we saw earlier with the breakpoint on bootstrap_look_up3, that is not entirely the case. There are other tools available to the reader. One can build a MobileSubstrate injectable library and hook all those functions using the method I previously described in the Tracing Objective-C method calls post.

It’s hard to talk about function tracing without mentioning Frida. It is an excellent tool for dynamic analysis. It can enable the user to hook all the required functions. I decided not to use it because, like the first alternative method I mentioned, Frida will modify the binary in memory to implement the hooks. These hooks could potentially clash with the RocketBootstrap library and anything else that modifies the binary dynamically. Also, for some reason, I could not get it to reliably hook the mach_msg function for me - maybe a post for the future.

Conclusion

Debugging is a bit of an art form which can send you down very deep and sometimes interesting rabbit holes. In this case I followed the rabbit hole to the bottom because I thought the outcome would be a good learning experience and the output would be useful in the future.

Debugging is an art form because one cannot see everything they want to see and the tools used for inspection are themselves faulty. So, it takes experience and good intuition to know what to look for and how to interpret the view. Much of the process is just validating the data flow, reassuring yourself that parts are working correctly. It is the hope that in that process one can learn more about the target which will help to narrow down on the problem. Here, I show my process for a specific use case and provide the tools to build upon or learn from.