Thursday, July 9, 2009

Analyzing Objective-C's for-in loop (fast enumeration)

Objective-C 2.0 supports a for-in loop like this:

for (XXSomeClass* obj in someArray) {
    [obj doSomething];
}

It's nice for the programmer but painful for disassemblers, because the simple loop above is translated into this bulky ARM assembly:

ldr r1,[pc,#0xe8]
mov r3,#0x0
str r3,[sp,#0x44] // state
str r3,[sp,#0x48] // itemsPtr
str r3,[sp,#0x4c] // mutationsPtr
str r3,[sp,#0x50] // extra[0]
str r3,[sp,#0x54] // extra[1]
str r3,[sp,#0x58] // extra[2]
str r3,[sp,#0x5c] // extra[3]
str r3,[sp,#0x60] // extra[4]
ldr sl,[pc,r1]
add r3,r3,#0x10
str r3,[sp]
mov r1,sl
add r2,sp,#0x44
add r3,sp,#0x4
bl objc_msgSend (stub) ; [someArray countByEnumeratingWithState: objects:count:]
subs r5,r0,#0x0
beq loc_000100
ldr r3,[sp,#0x4c]
ldr r3,[r3]
mov r6,r3
loc_000094: mov r4,#0x0
b loc_0000a4
loc_00009c: ldr r3,[sp,#0x4c]
ldr r3,[r3]
loc_0000a4: cmp r6,r3
beq loc_0000b4
mov r0,r8
bl objc_enumerationMutation (stub)
loc_0000b4: nop
add r4,r4,#0x1
cmp r5,r4
bhi loc_00009c
mov r3,#0x10
str r3,[sp]
mov r0,r8
mov r1,sl ; "countByEnumeratingWithState:objects:count:"
add r2,sp,#0x44
add r3,sp,#0x4
bl objc_msgSend (stub) ; [? ?]
subs r5,r0,#0x0
ldrne r3,[sp,#0x4c]
ldrne r3,[r3]
bne loc_000094
loc_000100: ...

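Decompiled back into C, the code is roughly:

NSFastEnumerationState enumState;
enumState.state = 0;
enumState.itemsPtr = NULL;
enumState.mutationsPtr = NULL;
enumState.extra[0] = 0;
enumState.extra[1] = 0;
enumState.extra[2] = 0;
enumState.extra[3] = 0;
enumState.extra[4] = 0;
NSUInteger count = 16;
id buffer[16];  // 16 ids = the 0x40 bytes from sp+0x4 up to enumState at sp+0x44
NSUInteger copiedItemsCount = [someArray countByEnumeratingWithState:&enumState objects:buffer count:count];
if (copiedItemsCount != 0) {
    unsigned long startMutations = *enumState.mutationsPtr;  // kept in r6
    do {
        for (unsigned long i = 0; i < copiedItemsCount; ++i) {
            if (*enumState.mutationsPtr != startMutations)
                objc_enumerationMutation(someArray);
            [enumState.itemsPtr[i] doSomething];
        }
        copiedItemsCount = [someArray countByEnumeratingWithState:&enumState objects:buffer count:count];
    } while (copiedItemsCount != 0);
}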

What do state and extra do? That's up to the implementor. (And objc_enumerationMutation simply kills the program, reporting that the collection was mutated during enumeration.)
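To make the protocol concrete, here is a plain-C model of it: a sketch under stated assumptions, not Apple's implementation. FastEnumState mirrors the fields of NSFastEnumerationState with void * standing in for id; ToyArray, toy_count_by_enumerating and toy_enumeration_mutation are hypothetical stand-ins for a collection, its -countByEnumeratingWithState:objects:count: and objc_enumerationMutation; toy_for_in follows the decompiled loop above, with a callback in place of the loop body. This particular implementor uses state as "index of the next element to hand out" and ignores extra, which is just one possible choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same fields as NSFastEnumerationState; void * stands in for id. */
typedef struct {
    unsigned long state;
    void **itemsPtr;
    unsigned long *mutationsPtr;
    unsigned long extra[5];
} FastEnumState;

/* Hypothetical collection used only for this sketch. */
typedef struct {
    void **items;
    unsigned long count;
    unsigned long mutations; /* a real collection would bump this on every change */
} ToyArray;

/* Stand-in for -countByEnumeratingWithState:objects:count:.
   Copies up to len items into buffer, using state as the index of the
   next item to hand out. Returns how many were copied (0 = finished). */
static unsigned long toy_count_by_enumerating(ToyArray *a, FastEnumState *st,
                                              void **buffer, unsigned long len)
{
    unsigned long n = 0;
    assert(len > 0);
    st->mutationsPtr = &a->mutations; /* the caller dereferences this */
    while (n < len && st->state < a->count)
        buffer[n++] = a->items[st->state++];
    st->itemsPtr = buffer;
    return n;
}

/* Stand-in for objc_enumerationMutation: report and kill the program. */
static void toy_enumeration_mutation(ToyArray *a)
{
    (void)a;
    fprintf(stderr, "collection was mutated while being enumerated\n");
    abort();
}

/* Lowering of: for (obj in a) { body(obj, ctx); } */
static void toy_for_in(ToyArray *a, void (*body)(void *obj, void *ctx), void *ctx)
{
    FastEnumState st = {0};
    void *buffer[16];
    unsigned long limit = toy_count_by_enumerating(a, &st, buffer, 16);
    if (limit == 0)
        return;
    unsigned long startMutations = *st.mutationsPtr;
    do {
        for (unsigned long i = 0; i < limit; ++i) {
            if (*st.mutationsPtr != startMutations)
                toy_enumeration_mutation(a);
            body(st.itemsPtr[i], ctx);
        }
    } while ((limit = toy_count_by_enumerating(a, &st, buffer, 16)) != 0);
}

/* Example loop body: add up the ints the collection points to. */
static void add_int(void *obj, void *ctx)
{
    *(long *)ctx += *(int *)obj;
}
```

With 40 elements, the model hands out batches of 16, 16 and 8, and the fourth call returns 0 to end the outer loop. A collection backed by contiguous storage could instead point itemsPtr straight at its own array and return everything in a single call, skipping the copy into the caller's buffer.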


  1. FYI, from the gcc 4.2 source:

    It turns

    for (type elem in collection) { stmts; }

    into:

    type elem;
    __objcFastEnumerationState enumState = { 0 };
    id items[16];

    unsigned long limit = [collection countByEnumeratingWithState:&enumState objects:items count:16];
    if (limit) {
        unsigned long startMutations = *enumState.mutationsPtr;
        do {
            unsigned long counter = 0;
            do {
                if (startMutations != *enumState.mutationsPtr) objc_enumerationMutation(collection);
                elem = enumState.itemsPtr[counter++];
                stmts;
            } while (counter < limit);
        } while (limit = [collection countByEnumeratingWithState:&enumState objects:items count:16]);
    }
    elem = nil;

    This, or something close to it, used to be in the published documentation, but it seems to have vanished now, along with the Obj-C grammar. There used to be a section in the ObjC language document about how to add fast enumeration to your own objects, but that's completely gone now (other than my now useless, dead bookmark link).

    I've got two problems with the way this feature was implemented:

    1) You can only return ObjC objects. That's it. There have been a number of times when I would have killed to be able to return something else, like an NSRange. Wouldn't it be nice to be able to write for (NSRange range in [string rangesBySplittingStringWith:@"/"]) { }? And as part of the loop startup, the compiler could pass over the @encode() of the expected return type for dynamic, run-time error checking, or better yet, automagical auto-boxing / manifest typing. 'int'? 'double'? 'NSNumber *'? 'NSValue *'? No problem.

    2) The structure declaration of NSFastEnumerationState (in NSEnumerator.h) is broken, in particular the 'unsigned long extra[5];' line. According to the documentation that no longer exists, that bit of space was meant to be where you could batch up your temporary, intermediate results. "Just type cast extra to the type you want" is what the documentation said, if I remember correctly.

    This is wrong on so many levels, made all the more wrong by the fact that this was obviously hand-hacked directly into the compiler by someone. I think it goes without saying that if you're going to bolt on a new language feature by hardwiring it directly into the compiler, you should have a firm command of the nuances of the language you're about to alter permanently, since everyone will be forced to live with the consequences. Small details like... type punning! And why the C99 standard (99 as in 1999, nearly ten years ago) says that doing things like this results in undefined behavior.

    The proper way to do it is with 'char'. Or, since you can only return 'id' or 'id'-like objects... ? Readers who would like to know more are encouraged to look up why '-fstrict-aliasing' can't be enabled (at all) in the versions of gcc that Apple ships.

  2. @Anonymous:

    Thanks for the source! BTW, there's a sample implementation in, but it's unofficial.

    For (1), you can return any pointer-sized values, except the compiler will give you an error if the loop variable's type isn't an object (id) type. Yeah, returning non-id types would be nice. The best we can do now is return an NSValue and use -getValue: inside the loop (ugly :( ), or return a pointer to your type and use a lot of casts (oh, type punning!).

    For (2), Apple's doc says it's "A C array that you can use to hold returned values." Of course, it doesn't matter whether that's sensible or not. I think Apple never expected people to implement the NSFastEnumeration protocol. (There are no comments at all in the iPhone SDK's NSEnumerator.h.)
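The extra[] complaint in the comments can be made concrete with a small C sketch. It assumes an implementor wants to keep a hypothetical SplitCursor (a pointer plus a length, say for splitting a path on '/') in extra[] between calls. Casting extra to SplitCursor * is the "just type cast" route the comment objects to; the sketch instead copies the bytes with memcpy, which is one standard-conforming way to get the same effect (the copy happens as character access, in the spirit of the 'char' suggestion). FastEnumState again mirrors NSFastEnumerationState, with void * in place of id.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same fields as NSFastEnumerationState; void * stands in for id. */
typedef struct {
    unsigned long state;
    void **itemsPtr;
    unsigned long *mutationsPtr;
    unsigned long extra[5];
} FastEnumState;

/* Hypothetical scratch data an implementor might keep between calls,
   e.g. a cursor for splitting a path string on '/'. */
typedef struct {
    const char *next;  /* start of the next unread segment */
    size_t remaining;  /* bytes left from next to the end */
} SplitCursor;

/* The "just type cast extra" approach would be
       ((SplitCursor *)st->extra)->remaining = n;
   which accesses unsigned long storage through a SplitCursor lvalue:
   undefined behavior under C99 6.5p7 (the strict aliasing rule).
   memcpy copies the object representation byte by byte instead. */
static void save_cursor(FastEnumState *st, const SplitCursor *c)
{
    assert(sizeof *c <= sizeof st->extra); /* must fit in extra[] */
    memcpy(st->extra, c, sizeof *c);
}

static SplitCursor load_cursor(const FastEnumState *st)
{
    SplitCursor c;
    memcpy(&c, st->extra, sizeof c);
    return c;
}
```

An optimizing compiler typically turns these fixed-size memcpy calls into plain loads and stores, so the conforming version need not cost anything over the cast.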