Thursday, May 21, 2009

Experiment: An alternative way to extract hidden symbols

A library usually has symbols defined for its functions and variables, so that external code can access them and so that debugging is easier. There are external (public) symbols and hidden symbols. dyld picks up the external symbols to publish where the functions are. The hidden ones are ignored, either for performance reasons or just to hide some interesting calls.

And the hidden ones are often interesting. For example, WinterBoard needed 7 hidden symbols to work.

Fortunately, the address each symbol points to, hidden or not, is recorded in the library file itself, so one can read the file to get all the symbols.

In fact, Apple provides the nlist function for exactly this purpose. But nlist is sloooooow. So slow that saurik has replaced nlist's backend in the next version of MobileSubstrate, which delivers a more than 10x speed improvement.

So how did I run into this? Originally I was trying to create a transition filter for QuartzCore. That requires access to QuartzCore's C++ backend, whose symbols are all hidden. So I used nlist as usual, and wrote a wrapper function using va_arg to make it simpler to use. Then it crashed. Why? There's one caveat: on x86, QuartzCore is loaded with a slide. The address returned by nlist, which is read statically from the file, is invalid because a run-time offset (the slide) must be added to it.
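A minimal sketch of such a va_arg wrapper, assuming a hypothetical static_lookup() that plays the role of nlist (it returns the unslid address recorded in the file) and made-up symbol names. The point is the final addition: without the slide, the returned address is only valid if the image happens to load at its linked address.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for what nlist() reports: symbol names paired with
 * the addresses recorded in the file, before any slide is applied.
 * The names are made up for illustration. */
struct static_symbol {
    const char *name;
    uintptr_t   file_address;
};

static const struct static_symbol demo_table[] = {
    { "_CAHiddenFunctionA", 0x1000 },
    { "_CAHiddenFunctionB", 0x2400 },
};

/* Returns the unslid address of `name`, or 0 if it is not in the table. */
static uintptr_t static_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; ++i)
        if (strcmp(demo_table[i].name, name) == 0)
            return demo_table[i].file_address;
    return 0;
}

/* Variadic wrapper taking (char *name, uintptr_t *out) pairs followed by a
 * terminating (char *)NULL. Every resolved address gets `slide` added;
 * skipping that step is what makes a statically read address crash when
 * the image was loaded somewhere else. Unresolved names store 0.
 * Returns the number of names that could not be resolved. */
static int lookup_symbols(intptr_t slide, ...)
{
    va_list ap;
    int missing = 0;

    va_start(ap, slide);
    for (;;) {
        const char *name = va_arg(ap, char *);
        if (name == NULL)
            break;
        uintptr_t *out = va_arg(ap, uintptr_t *);
        assert(out != NULL);

        uintptr_t addr = static_lookup(name);
        if (addr == 0) {
            *out = 0;
            ++missing;
        } else {
            *out = addr + (uintptr_t)slide; /* static address + run-time slide */
        }
    }
    va_end(ap);
    return missing;
}
```

A call looks like `lookup_symbols(slide, "_CAHiddenFunctionA", &a, "_CAHiddenFunctionB", &b, (char *)NULL)`. The terminator is cast explicitly because a bare `NULL` may be passed as an integer through `...`. Turning the resulting integer into a callable function pointer is platform-specific and is left out here.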

This slide can be obtained with the dyld function _dyld_get_image_vmaddr_slide. But it takes an integer! Which integer? It is an index into the list of all images loaded by dyld, and to check which image an index refers to, one needs to call _dyld_get_image_name. That's troublesome, and it adds extra slowness to the nlist method.
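The index-based search described above might look roughly like this. The real calls are _dyld_image_count(), _dyld_get_image_name() and _dyld_get_image_vmaddr_slide() from <mach-o/dyld.h>. Here they are replaced by hypothetical mock_* functions over a made-up image list, so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image list. On a real system the count, names and slides
 * come from _dyld_image_count(), _dyld_get_image_name(i) and
 * _dyld_get_image_vmaddr_slide(i); the values here are made up. */
static const char *const mock_names[] = {
    "/usr/lib/libSystem.B.dylib",
    "/System/Library/Frameworks/QuartzCore.framework/QuartzCore",
    "/System/Library/Frameworks/UIKit.framework/UIKit",
};
static const intptr_t mock_slides[] = { 0, 0x3000, 0 };

static uint32_t mock_image_count(void)
{
    return (uint32_t)(sizeof mock_names / sizeof mock_names[0]);
}
static const char *mock_image_name(uint32_t i) { return mock_names[i]; }
static intptr_t mock_image_slide(uint32_t i) { return mock_slides[i]; }

/* True if `path` ends with "/" followed by `leaf`, e.g. ".../QuartzCore". */
static int path_has_leaf(const char *path, const char *leaf)
{
    size_t plen = strlen(path), llen = strlen(leaf);
    return plen > llen && path[plen - llen - 1] == '/' &&
           strcmp(path + plen - llen, leaf) == 0;
}

/* Scans every loaded image and compares names: this per-lookup string
 * comparison is the extra cost of pairing nlist() with an index-based
 * slide API. Stores the slide in *slide and returns 1 if found, else 0. */
static int find_slide(const char *leaf, intptr_t *slide)
{
    assert(leaf != NULL && slide != NULL);
    uint32_t n = mock_image_count();
    for (uint32_t i = 0; i < n; ++i) {
        if (path_has_leaf(mock_image_name(i), leaf)) {
            *slide = mock_image_slide(i);
            return 1;
        }
    }
    return 0;
}
```

Matching on the last path component rather than the full path is a design choice of this sketch; a real tool might prefer the full install name.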

But there's also a _dyld_get_image_header function. It returns the Mach-O header of the loaded image, which lets you do the same thing nlist does, but on the image in memory instead of the file. Because the image is already loaded, every read happens in RAM, which is much faster than file access. That's good! Except it crashed again: the loaded image is not a direct copy of the file. Some parts (the __DATA, __OBJC and __IMPORT segments, to be precise) do not map one-to-one from the object file, so the trick that worked for QuartzCore does not work for UIKit.
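One way to see why reading the loaded image naively goes wrong, assuming the usual Mach-O layout: offsets in the load commands, such as symtab_command's symoff and stroff, are file offsets, and in memory the tables they point to live in the __LINKEDIT segment. Earlier segments can take up more space in memory than in the file, so "header + symoff" can land in the wrong segment. The sketch below uses a simplified, hypothetical seg_info struct in place of the real segment_command from <mach-o/loader.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical mirror of the two segment_command fields this
 * calculation needs. */
struct seg_info {
    uint64_t vmaddr;  /* address the segment is linked at */
    uint64_t fileoff; /* where the segment's bytes start in the file */
};

/* The naive calculation: treat the loaded image as a byte-for-byte copy
 * of the file, starting at the Mach-O header. */
static uint64_t naive_addr(uint64_t header_addr, uint32_t file_offset)
{
    return header_addr + file_offset;
}

/* Converts a file offset that lies inside __LINKEDIT into its run-time
 * address: slide + linked address of __LINKEDIT + distance into it. */
static uint64_t linkedit_addr(const struct seg_info *linkedit,
                              uint32_t file_offset, int64_t slide)
{
    assert(file_offset >= linkedit->fileoff);
    return (uint64_t)slide + linkedit->vmaddr +
           (file_offset - linkedit->fileoff);
}
```

With a made-up layout where __TEXT is linked at 0x1000 (file offset 0), __DATA grows from 0x1000 bytes in the file to 0x3000 bytes in memory, and __LINKEDIT is linked at 0x8000 with file offset 0x5000, a symbol table at file offset 0x5100 lives at 0x8100 in memory, whereas header + symoff gives 0x6100, an address inside __DATA.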

So I started digging through dyld's source code, and found that these three APIs are just C wrappers around a private (i.e. hidden) C++ function. That function, dyld::getIndexedImage, returns a pointer to an object that has everything I need. Everything. It has the symbol table, the string table, and the Mach-O header, and that's enough for exploring hidden symbols.
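Once the symbol table, string table and slide are in hand, looking up one name is a linear scan. The sketch below uses a simplified, hypothetical sym_entry in place of struct nlist and an image_view struct standing in for what dyld's image object provides; dyld's real class layout is not reproduced here. The mask values match N_STAB, N_TYPE and N_UNDF in <mach-o/nlist.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirror of a Mach-O symbol table entry. */
struct sym_entry {
    uint32_t n_strx;  /* offset of the name in the string table */
    uint8_t  n_type;  /* type bits, same meaning as in struct nlist */
    uint64_t n_value; /* unslid address for defined symbols */
};

#define SYM_STAB_MASK 0xe0 /* N_STAB: debugger entries */
#define SYM_TYPE_MASK 0x0e /* N_TYPE */
#define SYM_UNDEF     0x00 /* N_UNDF: defined in some other image */

/* Hypothetical view of one loaded image; field names are not dyld's. */
struct image_view {
    const struct sym_entry *symbols;
    uint32_t                nsyms;
    const char             *strings;
    intptr_t                slide;
};

/* Returns the run-time address of `name`, or 0 if the image does not
 * define it. The N_EXT bit is never checked, so hidden (non-external)
 * symbols are found just like public ones. */
static uintptr_t find_symbol(const struct image_view *img, const char *name)
{
    assert(img != NULL && name != NULL);
    for (uint32_t i = 0; i < img->nsyms; ++i) {
        const struct sym_entry *s = &img->symbols[i];
        if ((s->n_type & SYM_STAB_MASK) != 0 ||
            (s->n_type & SYM_TYPE_MASK) == SYM_UNDEF)
            continue;
        if (strcmp(img->strings + s->n_strx, name) == 0)
            return (uintptr_t)(s->n_value + (uint64_t)img->slide);
    }
    return 0;
}

/* Demo data: "_foo" is external (N_SECT | N_EXT), "_hidden_bar" is a
 * non-external N_SECT symbol, and "_undef" is an undefined import. */
static const char demo_strings[] = "\0_foo\0_hidden_bar\0_undef";
static const struct sym_entry demo_symbols[] = {
    { 1, 0x0f, 0x1000 },
    { 6, 0x0e, 0x2000 },
    { 18, 0x01, 0 },
};
static const struct image_view demo_image = {
    demo_symbols, 3, demo_strings, 0x4000
};
```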

Of course, since dyld::getIndexedImage is hidden, calling it means we need to nlist it too... Or not. saurik has already shown that replacing nlist's file-reading backend improves the speed by 10x, so why not do the same here? The result is this monster:

http://pastie.org/485348.
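The pastie's code is not reproduced here. As an illustration of why an in-memory approach pays off when several symbols are needed, the hypothetical resolve_all() below fills an nlist-style request array in a single pass over an already-mapped symbol table, stopping as soon as every name is found. The types and demo data are made up, and this is not necessarily how the pastie does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirror of a Mach-O symbol table entry (hypothetical). */
struct sym_entry { uint32_t n_strx; uint8_t n_type; uint64_t n_value; };

/* One request, in the spirit of nlist()'s in/out array: the caller fills
 * `name`, the resolver fills `value` and sets `found`. */
struct sym_request { const char *name; uintptr_t value; int found; };

/* Resolves all requests in one pass over the symbol table rather than one
 * full scan per name. Returns the number of requests left unresolved. */
static size_t resolve_all(const struct sym_entry *syms, uint32_t nsyms,
                          const char *strings, intptr_t slide,
                          struct sym_request *req, size_t nreq)
{
    size_t pending = nreq;
    for (size_t j = 0; j < nreq; ++j) {
        req[j].value = 0;
        req[j].found = 0;
    }
    for (uint32_t i = 0; i < nsyms && pending > 0; ++i) {
        const struct sym_entry *s = &syms[i];
        /* Skip debugger entries (N_STAB bits) and undefined symbols. */
        if ((s->n_type & 0xe0) != 0 || (s->n_type & 0x0e) == 0)
            continue;
        const char *sym_name = strings + s->n_strx;
        for (size_t j = 0; j < nreq; ++j) {
            if (!req[j].found && strcmp(req[j].name, sym_name) == 0) {
                req[j].value = (uintptr_t)(s->n_value + (uint64_t)slide);
                req[j].found = 1;
                --pending;
                break;
            }
        }
    }
    return pending;
}

/* Demo data: two defined symbols and one undefined import. */
static const char demo_strings[] = "\0_foo\0_hidden_bar\0_undef";
static const struct sym_entry demo_syms[] = {
    { 1, 0x0f, 0x1000 }, { 6, 0x0e, 0x2000 }, { 18, 0x01, 0 },
};
```

The early exit matters most when the wanted symbols appear early in a large table; in the worst case the cost is one scan of the table times the number of requests in the inner loop, which stays small for a handful of names.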

It has been tested and works on ARM and i386. Running its main() function five times gives these timings:

nlist        lookup_function_pointers
0.517996s    0.026339s
0.498724s    0.013721s
0.498352s    0.013623s
0.501114s    0.013798s
0.498718s    0.015770s


So my method is about 30x faster than nlist on average! I'd be happy to include saurik's nlist fix too, but it hit bus errors in strcmp so often that I gave up on it. Hopefully that will be fixed later.
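For reference, the 30x figure can be recomputed from the table: averaging the five runs gives about 0.503 s per nlist run versus about 0.0167 s per lookup_function_pointers run. The per-run ratios range from about 20x (the first run) to about 36x.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in seconds, copied from the five runs in the table above. */
static const double nlist_s[]  = { 0.517996, 0.498724, 0.498352, 0.501114, 0.498718 };
static const double lookup_s[] = { 0.026339, 0.013721, 0.013623, 0.013798, 0.015770 };
#define RUNS (sizeof nlist_s / sizeof nlist_s[0])

/* Arithmetic mean of n values. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum / (double)n;
}

/* Mean nlist time divided by mean lookup_function_pointers time. */
static double mean_speedup(void)
{
    return mean(nlist_s, RUNS) / mean(lookup_s, RUNS);
}

/* Speedup for a single run (0-based index). */
static double run_speedup(size_t i)
{
    assert(i < RUNS);
    return nlist_s[i] / lookup_s[i];
}
```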

2 comments:

  1. I found this post very confusing and somewhat unconstructive :(.

    For the record, I didn't reimplement nlist(): the code I have in MobileSubstrate's trunk is the official Apple code (from Apple's website, with Apple's license at the top) for nlist() with the minimal possible modifications to make it use a memory map rather than seeking through files.

    As far as I can tell, there is nothing ARM-specific about Apple's code (which I definitely got from a source that tends to not have anything of interest for ARM in it). I include an ARM header file, but only to get a definition that Apple's original general code seemed to require, because I didn't know which more general header provides it.

    I didn't do it "in 3.0". In fact, the version of MobileSubstrate that includes this enhancement doesn't even work on 3.x yet (and definitely didn't when I coded it), as I chose to insert it with MSHookFunction() (it will just immediately crash and get killed if you try to use it), so the one .deb I offer for it only works on 2.x.

    Finally, I will say that I haven't even released this to anyone but a few people on a forum (macthemes2) yet, so while I'm not surprised there might be a few bus errors, the fact that I didn't modify the underlying premise of how Apple's code works means I'm quite surprised that it is failing... are you willing to tell me where the bus errors occurred?

    Considering this post is so specifically directed at being better than "saurik's solution", it would be really awesome if it were a little more constructive, given that MobileSubstrate is open source (and if you had wanted to provide this nlist to it I would have been more than happy to include it).

    (On that note, do you mind if I use this? It is licensed in a way that I can use, but I'd prefer asking first. Also, there's a comment that mentions that you got code from my version: you might want to include Apple's copyright notice and license in your file, as that's what "my" code is.)

  2. @saurik:

    Thanks for the comment. I didn't mean to single you out, but I didn't know what else to call it, so I just used "saurik's solution" :)

    I read in the MacThemes comments that you had replaced nlist's backend, and I hadn't studied your code or Apple's in detail except for the fat-architecture part, which caused the confusion. I've edited the post a bit to reflect the facts.

    And by "only works on ARM" I mean MSHookFunction() only works on ARM.

    As for the bus errors, they come from the code in http://pastie.org/485348, lines 299-320. The comments there explain the cases that cause the errors.

    And yes, you can use it as you like. As for the Apple license, the pastie is too old for me to edit any more, so maybe you can add it. But I wonder whether it's OK to include APSL 2 code in BSD-licensed code.
