The LC_DYLD_INFO[_ONLY] commands
These load commands are numerically 0x22 and 0x80000022. The only difference between them is that LC_DYLD_INFO_ONLY makes loading abort when dyld doesn't understand the new format.
The structure of this load command has been described before. It refers to 5 chunks of data in the __LINKEDIT segment, which are called rebase, bind, weak_bind, lazy_bind, and export.
The 3 bind chunks (bind, weak_bind and lazy_bind) are encoded in the same format. Think of the data in these chunks as a tiny assembly language whose only purpose is to "bind" (map) VM addresses to symbols.
Each instruction starts with a single byte whose high 4 bits hold the opcode and whose low 4 bits hold an immediate operand, optionally followed by extra data:
|opcode||imm operand||(extra data)|
So at most 16 different opcodes can be used, and the immediate operand can hold a value from 0 to 15. Most of the time, however, a value greater than 15, or even non-numeric data, is needed. In these cases, extra data is appended after this byte.
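As a minimal sketch of how the first byte is taken apart, assuming the opcode sits in the high 4 bits and the immediate operand in the low 4 bits (which matches example bytes such as 0x19 used later on); the function name `split_opcode_byte` is made up for illustration:

```python
def split_opcode_byte(byte):
    """Split the first byte of an instruction into (opcode, imm)."""
    opcode = (byte & 0xF0) >> 4  # high nibble: one of at most 16 opcodes
    imm = byte & 0x0F            # low nibble: immediate operand, 0 to 15
    return opcode, imm


if __name__ == "__main__":
    print(split_opcode_byte(0x19))  # (1, 9)
```

Anything that does not fit in the 4-bit immediate has to come from the extra data that follows the byte.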
A large number is encoded in the "LEB128" format. In this format, each byte is split into a "continue bit" (bit 7) and 7 "digit" bits (bits 0-6).
Suppose we want to encode the number 123456 in LEB128. First, we write 123456 in binary and separate it into groups of 7 bits: 0000111,1000100,1000000. Then we set the "continue bit" to 1 on every group except the most significant one, where it is 0 to signal the end of the number: 00000111,11000100,11000000. Finally, the groups are stored least significant first, so we reverse them and write out the result: 0xC0 0xC4 0x07.
Apple has so far defined 13 opcodes:
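The procedure above can be sketched in a few lines of Python. This is an illustrative implementation of unsigned LEB128 only; the function names are made up for this example and are not taken from dyld.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        digits = value & 0x7F      # lowest 7 bits
        value >>= 7
        if value:
            out.append(digits | 0x80)  # more groups follow: continue bit = 1
        else:
            out.append(digits)         # last group: continue bit = 0
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode unsigned LEB128 at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:        # continue bit clear: number is complete
            return result, pos


if __name__ == "__main__":
    print(encode_uleb128(123456).hex())  # c0c407
```

Decoding returns the position of the next unread byte as well, because in the bind stream the caller has to know where the following instruction starts.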
|0||DONE||Finished defining a symbol.|
|1||SET_DYLIB_ORDINAL_IMM||Set the library ordinal of the current symbol to the imm operand.|
|2||SET_DYLIB_ORDINAL_ULEB||Same as above, but the library ordinal is read from the unsigned LEB128-encoded extra data.|
|3||SET_DYLIB_SPECIAL_IMM||Same as above, but the ordinal is set as the negative of imm. Typical values are:|
|4||SET_SYMBOL_TRAILING_FLAGS_IMM||Set the flags of the symbol from imm, and the symbol name from the C string in the extra data. The flags are:|
|5||SET_TYPE_IMM||Set the type of symbol as imm. Known values are:|
|6||SET_ADDEND_SLEB||Set the addend of the symbol as the signed LEB128-encoded extra data. Usage unknown.|
|7||SET_SEGMENT_AND_OFFSET_ULEB||Set that the symbol can be found in the imm-th segment, at an offset found in the extra data.|
|8||ADD_ADDR_ULEB||Increase the offset (as above) by the LEB128-encoded extra data.|
|9||DO_BIND||Define a symbol from the gathered information. Increase the offset by 4 (or 8 on 64-bit targets) after this operation.|
|A||DO_BIND_ADD_ADDR_ULEB||Same as above, but besides the pointer-size increment (4 or 8 bytes), the unsigned LEB128-encoded extra data is also added to the offset.|
|B||DO_BIND_ADD_ADDR_IMM_SCALED||Same as DO_BIND, but an extra imm*4 bytes (imm*8 on 64-bit targets) is also added to the offset.|
|C||DO_BIND_ULEB_TIMES_SKIPPING_ULEB||This is the most involved operation. Two unsigned LEB128-encoded numbers are read from the extra data. The first is the number of symbols to bind, and the second is the number of bytes to skip after each bind. In pseudocode, all it does is:|
for i = 1 to count
    DO_BIND_ADD_ADDR_ULEB(skip)
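To make the effect on the offset concrete, here is a small simulation. It assumes that each iteration behaves like DO_BIND_ADD_ADDR_ULEB(skip), that is, one bind at the current offset followed by an advance of one pointer plus `skip` bytes. The function name and the `pointer_size` parameter are hypothetical.

```python
def bind_times_skipping(offset, count, skip, pointer_size=4):
    """Simulate DO_BIND_ULEB_TIMES_SKIPPING_ULEB.

    Returns the list of offsets that get bound and the offset
    left over for the next instruction.
    """
    bound = []
    for _ in range(count):
        bound.append(offset)            # DO_BIND at the current offset
        offset += pointer_size + skip   # pointer-size increment plus skip
    return bound, offset


if __name__ == "__main__":
    bound, next_offset = bind_times_skipping(0x20, count=3, skip=4)
    print([hex(o) for o in bound], hex(next_offset))
    # ['0x20', '0x28', '0x30'] 0x38
```

On a 64-bit target `pointer_size` would be 8, so the same count and skip spread the binds further apart.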
For example, suppose we want to bind the address 0x2020 (in the __DATA segment, which starts at 0x2000) to the symbol _XXHello, which is defined in the 9th loaded dylib, Hello.dylib. We would perform the following operations:
SET_DYLIB_ORDINAL_IMM(9)
SET_SYMBOL_TRAILING_FLAGS_IMM(0, "_XXHello")
SET_SEGMENT_AND_OFFSET_ULEB(2, 0x20) ; usually __DATA is the 2nd segment.
DO_BIND()
So in binary it will be
0x19 0x40 "_XXHello\0" 0x72 0x20 0x90
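The byte string above can be checked with a small disassembler. This is a rough sketch, not dyld's implementation: it assumes the opcode is the high nibble and imm the low nibble of the first byte, covers only the 13 opcodes in the table, and all function names are made up for illustration.

```python
OPCODE_NAMES = [
    "DONE", "SET_DYLIB_ORDINAL_IMM", "SET_DYLIB_ORDINAL_ULEB",
    "SET_DYLIB_SPECIAL_IMM", "SET_SYMBOL_TRAILING_FLAGS_IMM", "SET_TYPE_IMM",
    "SET_ADDEND_SLEB", "SET_SEGMENT_AND_OFFSET_ULEB", "ADD_ADDR_ULEB",
    "DO_BIND", "DO_BIND_ADD_ADDR_ULEB", "DO_BIND_ADD_ADDR_IMM_SCALED",
    "DO_BIND_ULEB_TIMES_SKIPPING_ULEB",
]


def read_leb128(data, pos, signed=False):
    """Read one LEB128 number; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if signed and byte & 0x40:   # sign bit of the last group is set
        result -= 1 << shift
    return result, pos


def read_cstring(data, pos):
    """Read a NUL-terminated ASCII string; return (text, next_pos)."""
    end = data.index(0, pos)
    return data[pos:end].decode("ascii"), end + 1


def disassemble(data):
    """Turn a bind opcode stream into a list of (name, args) tuples."""
    pos, out = 0, []
    while pos < len(data):
        opcode, imm = data[pos] >> 4, data[pos] & 0x0F
        pos += 1
        if opcode >= len(OPCODE_NAMES):
            raise ValueError("unknown opcode 0x%X" % opcode)
        args = []
        if opcode in (1, 3, 4, 5, 7, 11):   # opcodes that use imm
            args.append(imm)
        if opcode == 4:                      # symbol name follows
            name, pos = read_cstring(data, pos)
            args.append(name)
        elif opcode in (2, 7, 8, 10):        # one unsigned LEB128
            value, pos = read_leb128(data, pos)
            args.append(value)
        elif opcode == 6:                    # one signed LEB128
            value, pos = read_leb128(data, pos, signed=True)
            args.append(value)
        elif opcode == 12:                   # count, then skip
            count, pos = read_leb128(data, pos)
            skip, pos = read_leb128(data, pos)
            args += [count, skip]
        out.append((OPCODE_NAMES[opcode], args))
    return out


if __name__ == "__main__":
    stream = bytes([0x19, 0x40]) + b"_XXHello\0" + bytes([0x72, 0x20, 0x90])
    for name, args in disassemble(stream):
        print(name, args)
```

Running it on the example bytes lists SET_DYLIB_ORDINAL_IMM with 9, SET_SYMBOL_TRAILING_FLAGS_IMM with flags 0 and the name _XXHello, SET_SEGMENT_AND_OFFSET_ULEB with segment 2 and offset 0x20, and finally DO_BIND.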
I don't think rebase is useful, and rebase uses a similar approach to bind to encode its info, so I'm ignoring it here.
Unlike bind, export is an entirely different beast. The content of the export chunk defines a trie, or a prefix tree. A node in this trie is encoded as:
Node = «uint8_t terminal_size» [Terminal] «uint8_t child_count» [Child] [Child] [Child] ...
Child = «char* suffix» «uleb128 offset»
Terminal = «uleb128 flags» «uleb128 address»
Known flags are:
- 1 = THREAD_LOCAL
- 4 = WEAK_DEFINITION
- 8 = INDIRECT_DEFINITION
- 0x10 = HAS_SPECIALIZATIONS
For example, suppose a dylib exports _XXHello at 0x1022, _XXWorld at 0x1064, and _XXHelloWorld2 at 0x1558. A trie that represents these symbols would be:
_XX - [Hello] - [World2]
    \ [World]
So we encode our root node as
00 01 "_XX\0" (offset to _XX)
and _XX as
00 02 "Hello\0" (offset to Hello) "World\0" (offset to World)
If we place the _XX node right after the root node, its offset is 7 (the root node occupies 7 bytes), so the root node is
00 01 "_XX\0" 07
The remaining offsets can be obtained in the same way. Now for the Hello node, since it defines a symbol, we have to fill in the Terminal info:
03 00 A2 20 /*=0x1022*/ 01 "World2\0" (offset to World2)
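To tie the pieces together, the following sketch lays out the whole example trie and walks it. The node placement (each node immediately after the previous one, giving offsets 0x07, 0x17, 0x24 and 0x29) is an assumption made for this example, as are the names `TRIE` and `walk_trie`. Note that the terminal size here is 3, since the flags byte plus the two-byte LEB128 address take 3 bytes.

```python
TRIE = (
    b"\x00\x01_XX\x00\x07"                  # 0x00: root, one child "_XX"
    b"\x00\x02Hello\x00\x17World\x00\x24"   # 0x07: _XX, two children
    b"\x03\x00\xA2\x20\x01World2\x00\x29"   # 0x17: _XXHello = 0x1022
    b"\x03\x00\xE4\x20\x00"                 # 0x24: _XXWorld = 0x1064
    b"\x03\x00\xD8\x2A\x00"                 # 0x29: _XXHelloWorld2 = 0x1558
)


def read_uleb128(data, pos):
    """Read one unsigned LEB128 number; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def walk_trie(data, offset=0, prefix=""):
    """Yield (symbol, flags, address) for every terminal node."""
    terminal_size = data[offset]
    pos = offset + 1
    if terminal_size:
        flags, after_flags = read_uleb128(data, pos)
        address, _ = read_uleb128(data, after_flags)
        yield prefix, flags, address
    pos += terminal_size                 # skip the Terminal info, if any
    child_count = data[pos]
    pos += 1
    for _ in range(child_count):
        end = data.index(0, pos)         # suffix is a NUL-terminated string
        suffix = data[pos:end].decode("ascii")
        child_offset, pos = read_uleb128(data, end + 1)
        yield from walk_trie(data, child_offset, prefix + suffix)


if __name__ == "__main__":
    for symbol, flags, address in walk_trie(TRIE):
        print(symbol, flags, hex(address))
```

The walk visits children in stored order, so it reports _XXHello, then _XXHelloWorld2, then _XXWorld. Child offsets are measured from the start of the trie, which is why the root's child offset is 7.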