networkpx Project Blog: Analyzing Objective-C's for-in loop (fast enumeration)

Thursday, July 9, 2009

Analyzing Objective-C's for-in loop (fast enumeration)

Objective-C 2.0 supports a for-in loop like this:

for (XXSomeClass* obj in someArray) {
  [obj doSomething];
}

It's nice for the programmer, but painful for anyone disassembling the code, because the simple loop above is translated into this bulky ASM:

            ldr               r1,[pc,#0xe8]
            mov               r3,#0x0
            str               r3,[sp,#0x44] // state
            str               r3,[sp,#0x48] // itemsPtr
            str               r3,[sp,#0x4c] // mutationsPtr
            str               r3,[sp,#0x50] // extra[0]
            str               r3,[sp,#0x54] // extra[1]
            str               r3,[sp,#0x58] // extra[2]
            str               r3,[sp,#0x5c] // extra[3]
            str               r3,[sp,#0x60] // extra[4]
            ldr               sl,[pc,r1]
            add               r3,r3,#0x10
            str               r3,[sp]
            mov               r1,sl
            add               r2,sp,#0x44
            add               r3,sp,#0x4
            bl                objc_msgSend (stub)               ; [someArray countByEnumeratingWithState:objects:count:]
            subs              r5,r0,#0x0
            beq               loc_000100
            ldr               r3,[sp,#0x4c]
            ldr               r3,[r3]
            mov               r6,r3
loc_000094: mov               r4,#0x0
            b                 loc_0000a4
loc_00009c: ldr               r3,[sp,#0x4c]
            ldr               r3,[r3]
loc_0000a4: cmp               r6,r3
            beq               loc_0000b4
            mov               r0,r8
            bl                objc_enumerationMutation (stub)
loc_0000b4: nop               
            nop               
            nop               
            nop               
            nop               
            add               r4,r4,#0x1
            cmp               r5,r4
            bhi               loc_00009c
            mov               r3,#0x10
            str               r3,[sp]
            mov               r0,r8
            mov               r1,sl                             ; "countByEnumeratingWithState:objects:count:"
            add               r2,sp,#0x44
            add               r3,sp,#0x4
            bl                objc_msgSend (stub)               ; [? ?]
            subs              r5,r0,#0x0
            ldrne             r3,[sp,#0x4c]
            ldrne             r3,[r3]
            bne               loc_000094
loc_000100: ...

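The decompiled code is roughly:

NSFastEnumerationState enumState;
enumState.state = 0;
enumState.itemsPtr = NULL;
enumState.mutationsPtr = NULL;
enumState.extra[0] = 0;
enumState.extra[1] = 0;
enumState.extra[2] = 0;
enumState.extra[3] = 0;
enumState.extra[4] = 0;
NSUInteger count = 16;
id buffer[16];  // sp+0x4 .. sp+0x44, i.e. 16 pointers
NSUInteger copiedItemsCount = [someArray countByEnumeratingWithState:&enumState objects:buffer count:count];
if (copiedItemsCount != 0) {
 // r6: the mutation counter as it was after the first call
 unsigned long startMutations = *(enumState.mutationsPtr);
 do {
  for (unsigned long i = 0; i < copiedItemsCount; ++ i) {
   if (*(enumState.mutationsPtr) != startMutations)
    objc_enumerationMutation(someArray);
   [enumState.itemsPtr[i] doSomething];
  }
  copiedItemsCount = [someArray countByEnumeratingWithState:&enumState objects:buffer count:count];
 } while (copiedItemsCount != 0);
}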

What do state and extra do? That is up to the implementor. (And objc_enumerationMutation simply reports that the collection was mutated during enumeration and kills the program.)
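To make "up to the implementor" concrete, here is a minimal sketch in plain C rather than Objective-C, so it compiles without Foundation. Everything in it is an assumption made for illustration: FastEnumState mirrors the layout of NSFastEnumerationState with void* standing in for id, ToyArray and the toy_* functions are hypothetical, and the driver returns -1 where the real generated code would call objc_enumerationMutation. This toy collection uses state as "index of the next item to hand out" and leaves extra untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Same field layout as NSFastEnumerationState; void* stands in for id. */
typedef struct {
    unsigned long state;
    void **itemsPtr;
    unsigned long *mutationsPtr;
    unsigned long extra[5];
} FastEnumState;

/* Hypothetical collection: a plain array plus a mutation counter. */
typedef struct {
    void **items;
    unsigned long count;
    unsigned long mutations;
} ToyArray;

/* Stand-in for -countByEnumeratingWithState:objects:count:.
   Copies up to len items into the caller's buffer and remembers
   its position in st->state between calls. */
static unsigned long toy_count_by_enumerating(ToyArray *self, FastEnumState *st,
                                              void **buf, unsigned long len)
{
    unsigned long n = 0;
    assert(len > 0);
    st->itemsPtr = buf;
    st->mutationsPtr = &self->mutations;
    while (st->state < self->count && n < len)
        buf[n++] = self->items[st->state++];
    return n;
}

/* Driver with the same shape as the compiler-generated loop.
   Returns the number of items visited, or -1 if the mutation
   counter changed mid-enumeration. */
static long toy_enumerate(ToyArray *a, void (*body)(void *item, void *ctx), void *ctx)
{
    FastEnumState st = { 0 };
    void *buf[16];
    long visited = 0;
    unsigned long limit = toy_count_by_enumerating(a, &st, buf, 16);
    if (limit == 0)
        return 0;
    unsigned long start = *st.mutationsPtr;   /* saved once, like r6 */
    do {
        for (unsigned long i = 0; i < limit; ++i) {
            if (*st.mutationsPtr != start)
                return -1;
            body(st.itemsPtr[i], ctx);
            ++visited;
        }
        limit = toy_count_by_enumerating(a, &st, buf, 16);
    } while (limit != 0);
    return visited;
}

/* Example loop bodies: sum int items, or mutate the collection. */
static void sum_ints(void *item, void *ctx)
{
    *(long *)ctx += *(int *)item;
}

static void bump_mutations(void *item, void *ctx)
{
    (void)item;
    ((ToyArray *)ctx)->mutations++;
}
```

With 40 items the driver asks for three batches (16, 16 and 8) and stops on the fourth call, which returns 0. Passing bump_mutations as the body trips the check on the second item, because the counter no longer matches the value saved after the first call.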

2 comments:

Anonymous, Jul 17, 2009, 9:53:00 AM
FYI, from the gcc 4.2 source:

It synthesizes:
for ( type elem in collection) { stmts; }

Into:
{
  type elem;
  __objcFastEnumerationState enumState = { 0 };
  id items[16];

  unsigned long limit = [collection countByEnumeratingWithState:&enumState objects:items count:16];
  if (limit) {
    unsigned long startMutations = *enumState.mutationsPtr;
    do {
      unsigned long counter = 0;
      do {
        if (startMutations != *enumState.mutationsPtr) objc_enumerationMutation(collection);
        elem = enumState.itemsPtr[counter++];
        stmts;
      } while (counter < limit);
    } while (limit = [collection countByEnumeratingWithState:&enumState objects:items count:16]);
  }
  else
    elem = nil;
}

This, or something close to it, used to be in the published documentation, but it seems to have vanished now, along with the Obj-C grammar. There used to be a section in the ObjC language document about how to add fast enumeration to your own objects, but that's completely gone now (other than my now useless and dead bookmark link).

I've got two problems with the way this feature was implemented:

1) You can only return ObjC objects. That's it. I've had a number of times that I would have killed to be able to return something else, like an NSRange. Wouldn't it be nice to be able to for(NSRange range in [string rangesBySplittingStringWith:@"/"]) { } ? And as part of the loop start up, the compiler could pass over the @encode() of the expected return type for dynamic, run time error checking, or better yet, automagical auto-boxing / manifest typing. 'int'? 'double'? 'NSNumber *'? 'NSValue *'? No problem.

2) The structure declaration of NSFastEnumerationState (in NSEnumerator.h) is broken, in particular the 'unsigned long extra[5];' line. In the documentation that no longer exists, that bit of space is meant to be where you can batch up your temporary, intermediate results. "Just type cast extra to the type you want" is what the documentation said, if I remember correctly.

This is wrong on so many levels, made all the more wronger by the fact that this was obviously hand hacked directly into the compiler by someone. I think it goes without saying that if you're going to be bolting on a new language feature by hardwiring it directly into the compiler, you should have a firm command of the nuances of the language you're about to permanently alter and be forced to live with the consequences. Small details like... type-punning! And why the C99 standard (99 as in 1999, nearly ten years ago) says doing things like this will result in undefined behavior.

The proper way to do it is with 'char'. Or, since you can only return 'id' or 'id like' objects.... ? Readers who would like to know more are encouraged to look for the reason why '-fstrict-aliasing' can't be enabled (at all) in the versions of gcc that Apple ships.
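As an illustration of the aliasing concern, the following plain-C sketch keeps scratch data in extra without casting the array to another type. It is an assumption-laden example, not anything from Apple's headers or documentation: FastEnumState mirrors NSFastEnumerationState with void* in place of id, and Scratch, scratch_save and scratch_load are hypothetical names. Reading extra through a cast such as (Scratch *)st->extra accesses unsigned long storage through an incompatible lvalue type, which C99 leaves undefined; copying the bytes with memcpy avoids that.

```c
#include <assert.h>
#include <string.h>

/* Same field layout as NSFastEnumerationState; void* stands in for id. */
typedef struct {
    unsigned long state;
    void **itemsPtr;
    unsigned long *mutationsPtr;
    unsigned long extra[5];
} FastEnumState;

/* Hypothetical per-enumeration scratch data an implementor might keep. */
typedef struct {
    double cursor;
    unsigned int bucket;
} Scratch;

/* The scratch data must fit inside extra[5]; checked at compile time (C11). */
static_assert(sizeof(Scratch) <= sizeof(((FastEnumState *)0)->extra),
              "Scratch does not fit in extra[]");

/* Store the scratch data by copying its bytes into extra. */
static void scratch_save(FastEnumState *st, const Scratch *s)
{
    memcpy(st->extra, s, sizeof *s);
}

/* Load it back by copying the bytes out again; no pointer cast involved. */
static Scratch scratch_load(const FastEnumState *st)
{
    Scratch s;
    memcpy(&s, st->extra, sizeof s);
    return s;
}
```

Compilers generally turn small fixed-size memcpy calls like these into plain loads and stores, so the well-defined version should not cost anything over the cast.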
KennyTM~, Jul 18, 2009, 6:13:00 AM
@Anonymous:

Thanks for the source! BTW there's a sample implementation at http://cocoawithlove.com/2008/05/implementing-countbyenumeratingwithstat.html, but it's unofficial.

For (1), you can return any pointer-sized object, but the compiler will raise an error if the type being enumerated isn't id. Yeah, returning non-id types would be nice. The best we can do now is return an NSValue and use -getValue: inside the loop (ugly :( ), or return a pointer to your type and use a lot of casts (oh type punning!).

For (2), Apple's doc said it's "A C array that you can use to hold returned values." Of course, it doesn't matter whether that's sensible or not. I think Apple never expected people to implement the NSFastEnumeration protocol. (There are no comments at all in the iPhone SDK's NSEnumerator.h.)
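The "pointer to your type" workaround for (1) might look roughly like this when modeled in plain C. All names here are made up for illustration: Range stands in for NSRange, FastEnumState mirrors NSFastEnumerationState with void* in place of id, and RangeList with its functions is a hypothetical collection. It also shows a second way an implementor can use the state: the collection ignores the caller's buffer, points itemsPtr at its own storage, hands everything over in one batch, and uses state as a simple "already done" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for NSRange. */
typedef struct { unsigned long location, length; } Range;

/* Same field layout as NSFastEnumerationState; void* stands in for id. */
typedef struct {
    unsigned long state;
    void **itemsPtr;
    unsigned long *mutationsPtr;
    unsigned long extra[5];
} FastEnumState;

enum { MAX_RANGES = 32 };

/* Hypothetical collection of ranges produced by splitting a string. */
typedef struct {
    Range ranges[MAX_RANGES];
    void *ptrs[MAX_RANGES];      /* pointer-sized handles to the ranges */
    unsigned long count;
    unsigned long mutations;
} RangeList;

/* Record the range of every piece of s between occurrences of sep.
   Pieces beyond MAX_RANGES are silently dropped in this sketch. */
static void range_list_split(RangeList *out, const char *s, char sep)
{
    unsigned long start = 0, i = 0;
    assert(out != NULL && s != NULL);
    out->count = 0;
    out->mutations = 0;
    for (;; ++i) {
        if (s[i] == sep || s[i] == '\0') {
            if (out->count < MAX_RANGES) {
                out->ranges[out->count].location = start;
                out->ranges[out->count].length = i - start;
                out->ptrs[out->count] = &out->ranges[out->count];
                out->count++;
            }
            start = i + 1;
            if (s[i] == '\0')
                break;
        }
    }
}

/* Stand-in for -countByEnumeratingWithState:objects:count:.
   Returns all items at once from internal storage; the second
   call sees state != 0 and returns 0 to end the enumeration. */
static unsigned long range_list_enumerate(RangeList *self, FastEnumState *st,
                                          void **buf, unsigned long len)
{
    (void)buf;
    (void)len;
    if (st->state != 0)
        return 0;
    st->state = 1;
    st->itemsPtr = self->ptrs;
    st->mutationsPtr = &self->mutations;
    return self->count;
}
```

Inside the loop body each item then has to be cast back, for example Range *r = (Range *)st.itemsPtr[i], which is the "lot of casts" part.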