The LC_DYLD_INFO[_ONLY] commands
These load commands have the numeric values 0x22 and 0x80000022. The only difference between them is the LC_REQ_DYLD bit (0x80000000) in LC_DYLD_INFO_ONLY: with it set, dyld aborts loading when it doesn't understand the new format.
The structure of this load command has been described before. It refers to 5 chunks of data in the __LINKEDIT segment, called rebase, bind, weak_bind, lazy_bind, and export.
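As a reminder of what the command looks like, here is a minimal Python sketch that unpacks it with the standard `struct` module. It assumes a little-endian binary and takes the field order from `struct dyld_info_command` in `<mach-o/loader.h>` (twelve 32-bit fields, 48 bytes); each `*_off` field is a file offset to a chunk and each `*_size` is its length. `parse_dyld_info` and `slice_chunk` are hypothetical helper names.

```python
import struct
from typing import NamedTuple

LC_REQ_DYLD = 0x80000000
LC_DYLD_INFO = 0x22
LC_DYLD_INFO_ONLY = LC_DYLD_INFO | LC_REQ_DYLD  # 0x80000022

class DyldInfo(NamedTuple):
    cmd: int
    cmdsize: int
    rebase_off: int
    rebase_size: int
    bind_off: int
    bind_size: int
    weak_bind_off: int
    weak_bind_size: int
    lazy_bind_off: int
    lazy_bind_size: int
    export_off: int
    export_size: int

def parse_dyld_info(data: bytes, offset: int = 0) -> DyldInfo:
    """Unpack a dyld_info_command: 12 little-endian uint32 fields."""
    info = DyldInfo(*struct.unpack_from("<12I", data, offset))
    if info.cmd not in (LC_DYLD_INFO, LC_DYLD_INFO_ONLY):
        raise ValueError(f"not an LC_DYLD_INFO[_ONLY] command: {info.cmd:#x}")
    return info

def slice_chunk(file_data: bytes, off: int, size: int) -> bytes:
    """Cut one chunk (e.g. bind_off/bind_size) out of the whole file."""
    return file_data[off:off + size]
```

For example, `slice_chunk(file_data, info.export_off, info.export_size)` would give the export trie discussed later.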
bind/weak_bind/lazy_bind
These 3 chunks are encoded in the same format. Think of the data in these chunks as a tiny assembly language whose only purpose is to "bind" (map) VM addresses to symbols.
Each instruction is one byte, optionally followed by extra data:
Bit | 7 6 5 4 | 3 2 1 0 | (following bytes) |
---|---|---|---|
Field | opcode | imm operand | (extra data) |
So at most 16 different opcodes can be used, and the immediate operand can hold a value from 0 to 15. Most of the time, though, a value larger than 15, or even non-numeric data, is needed. In these cases, extra data is appended after the opcode byte.
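Splitting a byte into its two halves is just masking and shifting. A small sketch (the function names are hypothetical; the mask values match `BIND_OPCODE_MASK` and `BIND_IMMEDIATE_MASK` in `<mach-o/loader.h>`, where the opcode constants are stored pre-shifted, e.g. DO_BIND is 0x90):

```python
from typing import Tuple

BIND_OPCODE_MASK = 0xF0     # high nibble: the opcode
BIND_IMMEDIATE_MASK = 0x0F  # low nibble: the immediate operand

def split_opcode(byte: int) -> Tuple[int, int]:
    """Return (opcode, imm) for one bind instruction byte."""
    return (byte & BIND_OPCODE_MASK) >> 4, byte & BIND_IMMEDIATE_MASK

def make_opcode(opcode: int, imm: int = 0) -> int:
    """Pack a 4-bit opcode and a 4-bit immediate into one byte."""
    if not (0 <= opcode <= 0xF and 0 <= imm <= 0xF):
        raise ValueError("opcode and imm must each fit in 4 bits")
    return (opcode << 4) | imm
```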
Large numbers are encoded in the "LEB128" format. In this format, each byte is split into a "continue bit" (bit 7) and 7 "digits" (bits 0-6).
Suppose we want to encode the number 123456 in LEB128. First, we write 123456 in binary and split it into groups of 7 bits, starting from the least significant end: 0000111,1000100,1000000. Then we prepend the continue bit to each group: 1 for every group except the most significant one, which gets 0 to signal the end of the number: 00000111,11000100,11000000. Finally, LEB128 stores the least significant group first (little endian), so we reverse the order and write out the result: 0xC0 0xC4 0x07.
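The procedure above translates directly into code. This is a minimal sketch with hypothetical function names; the decoders return the position after the number so they can be chained while walking a byte stream. The signed variant (needed by SET_ADDEND_SLEB) uses the same layout but sign-extends from bit 6 of the last byte.

```python
from typing import Tuple

def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits are the "digits"
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continue bit
        else:
            out.append(byte)         # last group: continue bit stays 0
            return bytes(out)

def decode_uleb128(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode unsigned LEB128 at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def decode_sleb128(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode signed LEB128; bit 6 of the final byte is the sign bit."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:
                result -= 1 << shift  # sign-extend the accumulated bits
            return result, pos
```

Usage: `encode_uleb128(123456).hex(" ")` gives `c0 c4 07`, matching the hand-worked result.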
Apple so far defined 13 opcodes:
opcode | Symbol | Meaning |
---|---|---|
0 | DONE | Finished defining symbols: ends a bind or weak_bind stream, and in lazy_bind terminates each symbol's entry. |
1 | SET_DYLIB_ORDINAL_IMM | Set the library ordinal of the current symbol to the imm operand. |
2 | SET_DYLIB_ORDINAL_ULEB | Same as above, but the library ordinal is read from the unsigned LEB128-encoded extra data. |
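3 | SET_DYLIB_SPECIAL_IMM | Same as above, but the ordinal is set to a special non-positive value: imm is sign-extended from 4 bits, so 0x0 gives 0, 0xF gives -1 and 0xE gives -2. Typical values are: 0 = SELF, -1 = MAIN_EXECUTABLE, -2 = FLAT_LOOKUP. |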
4 | SET_SYMBOL_TRAILING_FLAGS_IMM | Set the flags of the symbol to imm, and the symbol name to the NUL-terminated C string in the extra data. The flags are: 1 = WEAK_IMPORT, 8 = NON_WEAK_DEFINITION. |
5 | SET_TYPE_IMM | Set the type of the symbol to imm. Known values are: 1 = POINTER, 2 = TEXT_ABSOLUTE32, 3 = TEXT_PCREL32. |
6 | SET_ADDEND_SLEB | Set the addend of the symbol to the signed LEB128-encoded extra data. The addend is added to the symbol's address when the bound pointer is written. |
7 | SET_SEGMENT_AND_OFFSET_ULEB | Set the current address to the segment with index imm (counting from 0), at the offset given by the unsigned LEB128-encoded extra data. |
8 | ADD_ADDR_ULEB | Increase the offset (as above) by the unsigned LEB128-encoded extra data. |
9 | DO_BIND | Bind the current address using the gathered information, then increase the offset by the pointer size (4, or 8 on 64-bit targets). |
A | DO_BIND_ADD_ADDR_ULEB | Same as DO_BIND, but besides the pointer-size increment, the unsigned LEB128-encoded extra data is also added to the offset. |
B | DO_BIND_ADD_ADDR_IMM_SCALED | Same as DO_BIND, but an extra imm × pointer size bytes is also added (imm*4 on 32-bit targets, imm*8 on 64-bit). |
C | DO_BIND_ULEB_TIMES_SKIPPING_ULEB | The most complex operation. Two unsigned LEB128-encoded numbers are read from the extra data: the first is the number of binds to perform, and the second is the number of bytes to skip after each bind, on top of the pointer-size increment. In pseudocode: for i = 1 to count { bind at offset; offset += pointer size + skip } |
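To tie the table together, here is a minimal Python sketch of an interpreter for these opcodes. It is not dyld's code: `decode_bind` and `Binding` are hypothetical names, it only records what would be bound instead of writing anything, and the pointer size is a parameter (8 by default). Offsets are kept modulo the pointer width to mimic pointer-sized arithmetic. Pass `lazy=True` for lazy_bind, where DONE only terminates one symbol's entry.

```python
from typing import List, NamedTuple, Tuple

class Binding(NamedTuple):
    segment: int
    offset: int
    ordinal: int
    symbol: str
    flags: int
    bind_type: int
    addend: int

def _uleb(data: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def _sleb(data: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (result - (1 << shift) if byte & 0x40 else result), pos

def decode_bind(data: bytes, ptr_size: int = 8, lazy: bool = False) -> List[Binding]:
    mask = (1 << (8 * ptr_size)) - 1     # wrap like a pointer-sized integer
    ordinal = flags = bind_type = addend = segment = offset = 0
    symbol = ""
    out: List[Binding] = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        op, imm = byte >> 4, byte & 0x0F
        if op == 0x0:                    # DONE
            if not lazy:
                break                    # end of a bind/weak_bind stream
        elif op == 0x1:                  # SET_DYLIB_ORDINAL_IMM
            ordinal = imm
        elif op == 0x2:                  # SET_DYLIB_ORDINAL_ULEB
            ordinal, pos = _uleb(data, pos)
        elif op == 0x3:                  # SET_DYLIB_SPECIAL_IMM: sign-extend 4 bits
            ordinal = imm - 16 if imm else 0
        elif op == 0x4:                  # SET_SYMBOL_TRAILING_FLAGS_IMM
            end = data.index(0, pos)
            symbol, flags, pos = data[pos:end].decode(), imm, end + 1
        elif op == 0x5:                  # SET_TYPE_IMM
            bind_type = imm
        elif op == 0x6:                  # SET_ADDEND_SLEB
            addend, pos = _sleb(data, pos)
        elif op == 0x7:                  # SET_SEGMENT_AND_OFFSET_ULEB
            segment = imm
            offset, pos = _uleb(data, pos)
        elif op == 0x8:                  # ADD_ADDR_ULEB
            delta, pos = _uleb(data, pos)
            offset = (offset + delta) & mask
        elif op in (0x9, 0xA, 0xB, 0xC): # the four DO_BIND variants
            count, skip = 1, 0
            if op == 0xA:
                skip, pos = _uleb(data, pos)
            elif op == 0xB:
                skip = imm * ptr_size
            elif op == 0xC:
                count, pos = _uleb(data, pos)
                skip, pos = _uleb(data, pos)
            for _ in range(count):
                out.append(Binding(segment, offset, ordinal, symbol,
                                   flags, bind_type, addend))
                offset = (offset + ptr_size + skip) & mask
        else:
            raise ValueError(f"unknown bind opcode {byte:#04x}")
    return out
```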
For example, suppose we want to bind the address 0x2020 (in the __DATA segment, which starts at 0x2000) to the symbol _XXHello, which is defined in Hello.dylib, the 9th loaded dylib (library ordinal 9). We would perform the following operations:
SET_DYLIB_ORDINAL_IMM(9)
SET_SYMBOL_TRAILING_FLAGS_IMM(0, "_XXHello")
SET_TYPE_IMM(1) ; POINTER
SET_SEGMENT_AND_OFFSET_ULEB(2, 0x20) ; in a typical executable, __DATA is segment index 2 (counting from 0, after __PAGEZERO and __TEXT).
DO_BIND()
DONE()
So in binary it will be
0x19 0x40 "_XXHello\0" 0x51 0x72 0x20 0x90 0x00
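The operations above can be assembled mechanically. In this sketch, each opcode is a small Python helper (all helper names are hypothetical, chosen to mirror the mnemonics) that returns its bytes. The sequence includes SET_TYPE_IMM(1) (POINTER) and a trailing DONE, since the non-lazy bind stream needs an explicit type and is terminated by DONE.

```python
def uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, continue bit set on all but the last."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def op(opcode: int, imm: int = 0) -> bytes:
    """Pack the opcode into the high nibble and imm into the low nibble."""
    if not (0 <= opcode <= 0xF and 0 <= imm <= 0xF):
        raise ValueError("opcode and imm must each fit in 4 bits")
    return bytes([opcode << 4 | imm])

# Hypothetical helpers named after the mnemonics in the text
def set_dylib_ordinal_imm(ordinal: int) -> bytes:
    return op(0x1, ordinal)

def set_symbol_trailing_flags_imm(flags: int, name: str) -> bytes:
    return op(0x4, flags) + name.encode("ascii") + b"\0"

def set_type_imm(bind_type: int) -> bytes:
    return op(0x5, bind_type)

def set_segment_and_offset_uleb(segment: int, offset: int) -> bytes:
    return op(0x7, segment) + uleb128(offset)

def do_bind() -> bytes:
    return op(0x9)

def done() -> bytes:
    return op(0x0)

program = b"".join([
    set_dylib_ordinal_imm(9),
    set_symbol_trailing_flags_imm(0, "_XXHello"),
    set_type_imm(1),                          # POINTER
    set_segment_and_offset_uleb(2, 0x20),
    do_bind(),
    done(),
])
print(program.hex(" "))  # → 19 40 5f 58 58 48 65 6c 6c 6f 00 51 72 20 90 00
```

Because offsets go through `uleb128`, the same helpers also work for offsets above 127, which take more than one byte.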
rebase
I don't think rebase is useful, and it encodes its information with an approach similar to bind's, so I'm ignoring it here.
export
Unlike bind, export is an entirely different beast. The content of the export chunk defines a trie, or a prefix tree. A node in this trie is encoded as:
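Node = «uleb128 terminal_size» [Terminal] «uint8_t child_count» [Child] [Child] [Child] ...
Child = «char* suffix» «uleb128 offset»
Terminal = «uleb128 flags» «uleb128 address»
terminal_size is the number of bytes of Terminal info, or 0 if no symbol ends at this node. suffix is a NUL-terminated string, offset is the position of the child node measured from the start of the export chunk, and address is relative to the start of the image (its Mach-O header).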
Known flags are:
- 1 = THREAD_LOCAL
- 4 = WEAK_DEFINITION
- 8 = INDIRECT_DEFINITION
- 0x10 = HAS_SPECIALIZATIONS
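A minimal sketch of reading this encoding: `walk_export_trie` visits every node from the root (offset 0 of the export chunk) and collects each symbol that has Terminal info, while `lookup` follows only the edges that match the wanted name, which is the point of using a trie. Both names are hypothetical, and the sketch always reads the second Terminal field as an address, so terminals whose flags (such as 0x08 or 0x10) imply a different payload are not handled.

```python
from typing import Dict, Optional, Tuple

def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode unsigned LEB128 at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def walk_export_trie(trie: bytes) -> Dict[str, Tuple[int, int]]:
    """Map every exported symbol to (flags, address) by visiting all nodes."""
    exports: Dict[str, Tuple[int, int]] = {}
    stack = [(0, "")]                      # (node offset, symbol prefix so far)
    while stack:
        node, prefix = stack.pop()
        terminal_size, pos = read_uleb128(trie, node)
        if terminal_size:                  # a symbol ends at this node
            flags, p = read_uleb128(trie, pos)
            address, _ = read_uleb128(trie, p)
            exports[prefix] = (flags, address)
        pos += terminal_size               # skip to child_count
        child_count = trie[pos]
        pos += 1
        for _ in range(child_count):
            end = trie.index(0, pos)       # suffix is NUL-terminated
            suffix = trie[pos:end].decode("utf-8")
            child, pos = read_uleb128(trie, end + 1)
            stack.append((child, prefix + suffix))
    return exports

def lookup(trie: bytes, name: str) -> Optional[Tuple[int, int]]:
    """Follow matching edges only; return (flags, address) or None."""
    node, rest = 0, name
    while True:
        terminal_size, pos = read_uleb128(trie, node)
        if not rest:                       # consumed the whole name
            if not terminal_size:
                return None
            flags, p = read_uleb128(trie, pos)
            return flags, read_uleb128(trie, p)[0]
        pos += terminal_size
        child_count = trie[pos]
        pos += 1
        for _ in range(child_count):
            end = trie.index(0, pos)
            suffix = trie[pos:end].decode("utf-8")
            child, pos = read_uleb128(trie, end + 1)
            if rest.startswith(suffix):
                node, rest = child, rest[len(suffix):]
                break
        else:
            return None                    # no edge matches the remaining name
```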
For example, suppose a dylib exports _XXHello at 0x1022, _XXWorld at 0x1064, and _XXHelloWorld2 at 0x1558. A trie that represents these symbols would be:
_XX - [Hello] - [World2]
    \
     [World]
So we encode our root node as
00 01 "_XX\0" (offset to _XX)
and _XX as
00 02 "Hello\0" (offset to Hello) "World\0" (offset to World)
If we place the _XX node right after the root node, its offset is 7, because the root node itself is 1 + 1 + 4 + 1 = 7 bytes long. So the root node is
00 01 "_XX\0" 07
The remaining offsets can be obtained the same way. Now for the Hello node: since it defines a symbol, we have to fill in the Terminal info (flags 0, then 0x1022 in ULEB128, A2 20), which is 3 bytes long:
03 00 A2 20 /*=0x1022*/ 01 "World2\0" (offset to World2)
etc.
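To check the offsets by machine, here is a sketch that builds the trie from the three example symbols and serializes it. It assumes breadth-first node order and flags 0; ld64 may lay nodes out in a different order, so offsets in a real dylib can differ even though each node is encoded the same way. All names are hypothetical. Because a child's offset is a ULEB128 whose length depends on where the child lands, the sketch recomputes offsets until they stop changing.

```python
import os
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

@dataclass
class Node:
    address: Optional[int] = None      # set if a symbol ends at this node
    flags: int = 0
    edges: List[Tuple[str, "Node"]] = field(default_factory=list)
    offset: int = 0                     # position inside the serialized trie

def insert(node: Node, name: str, address: int) -> None:
    """Radix-tree insert: split an edge when it shares a prefix with name."""
    if not name:
        node.address = address
        return
    for i, (label, child) in enumerate(node.edges):
        common = os.path.commonprefix([label, name])
        if not common:
            continue
        if common != label:             # split label into common + remainder
            mid = Node(edges=[(label[len(common):], child)])
            node.edges[i] = (common, mid)
            child = mid
        insert(child, name[len(common):], address)
        return
    node.edges.append((name, Node(address=address)))

def encode_node(node: Node) -> bytes:
    terminal = b""
    if node.address is not None:
        terminal = uleb128(node.flags) + uleb128(node.address)
    out = bytearray(uleb128(len(terminal)) + terminal)
    out.append(len(node.edges))         # child_count is a single byte
    for label, child in node.edges:
        out += label.encode() + b"\0" + uleb128(child.offset)
    return bytes(out)

def serialize(root: Node) -> bytes:
    nodes, queue = [], deque([root])    # breadth-first layout (an assumption)
    while queue:
        node = queue.popleft()
        nodes.append(node)
        queue.extend(child for _, child in node.edges)
    changed = True
    while changed:                      # offsets affect ULEB sizes: repeat until stable
        changed, offset = False, 0
        for node in nodes:
            if node.offset != offset:
                node.offset, changed = offset, True
            offset += len(encode_node(node))
    return b"".join(encode_node(node) for node in nodes)

root = Node()
for name, addr in {"_XXHello": 0x1022, "_XXWorld": 0x1064,
                   "_XXHelloWorld2": 0x1558}.items():
    insert(root, name, addr)
trie = serialize(root)
print(trie[:7].hex(" "))  # → 00 01 5f 58 58 00 07
```

Printing `trie.hex(" ")` shows the encoding of every node, including the World and World2 nodes left as "etc." above.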