For AdKiller to work properly, it needs regular expression matching. There are several RegExp solutions available on iPhoneOS:
- regex.h
- ICU's regular expression engine
- PCRE
- RegexKitLite, which is an ObjC wrapper of the ICU engine.
- JavaScriptCore's regular expression engine.
- Do it yourself!
For ad blocking, we only need to know whether a regular expression matches, so I've limited the test case to a simple RegExp.test-style check.
In this part, I'll test the AdBlockPlus "wildcard rules", where a * matches anything and an optional | at the beginning or end indicates an anchor. Because these rules are so simple, we can actually do the test without using RegExp at all (e.g. using CFStringFind).
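To make the rule semantics concrete, here is a minimal sketch of a RegExp-free matcher. It is not the code from the benchmark: it uses plain C strings rather than CFStringFind, it is case-sensitive, and the names glob_full_match and wildcard_match are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Match the whole of s against p, where '*' in p matches any run of
   characters (including none). Iterative, with backtracking to the last '*'. */
static bool glob_full_match(const char *p, const char *s)
{
    const char *star = NULL;    /* last '*' seen in the pattern */
    const char *resume = NULL;  /* text position to retry from */
    while (*s) {
        if (*p == '*') {
            star = p++;
            resume = s;
        } else if (*p == *s) {
            p++;
            s++;
        } else if (star) {
            p = star + 1;       /* let the last '*' swallow one more char */
            s = ++resume;
        } else {
            return false;
        }
    }
    while (*p == '*')
        p++;
    return *p == '\0';
}

/* Hypothetical helper: test one wildcard rule against a URL.
   A leading '|' anchors the rule at the start of the URL, a trailing '|'
   anchors it at the end; an unanchored side behaves like an implicit '*'. */
bool wildcard_match(const char *rule, const char *url)
{
    assert(rule != NULL && url != NULL);
    size_t len = strlen(rule);
    bool anchored_start = len > 0 && rule[0] == '|';
    if (anchored_start) {
        rule++;
        len--;
    }
    bool anchored_end = len > 0 && rule[len - 1] == '|';
    if (anchored_end)
        len--;

    char *pattern = malloc(len + 3);    /* body + two optional '*' + NUL */
    if (pattern == NULL)
        return false;
    size_t n = 0;
    if (!anchored_start)
        pattern[n++] = '*';
    memcpy(pattern + n, rule, len);
    n += len;
    if (!anchored_end)
        pattern[n++] = '*';
    pattern[n] = '\0';

    bool matched = glob_full_match(pattern, url);
    free(pattern);
    return matched;
}
```

For example, wildcard_match("|http://ads.", url) only accepts URLs that begin with http://ads., while wildcard_match("/ads/*.gif", url) accepts the two pieces anywhere in the URL as long as they appear in that order.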
From the complexity of the RegExp engines, we would expect the time taken to follow this order:
CFStringFind << regex.h << JSRegExp < ICU < RegexKitLite < PCRE
But somewhat surprisingly, the actual time taken is like this:
ICU        PCRE       regex.h    RKLite     JSRegExp   CFStrFind
----------------------------------------------------------------
0.253456s  0.069918s  1.726371s  1.651787s  0.139437s  0.078097s
0.275680s  0.067150s  1.715131s  1.652074s  0.112813s  0.075818s
0.251526s  0.067994s  1.716483s  1.637005s  0.113267s  0.075716s
0.250806s  0.067551s  1.721203s  1.626927s  0.113274s  0.075726s
0.251498s  0.067168s  1.716587s  1.651783s  0.113605s  0.076064s
So the ordering would be:
PCRE < CFStringFind < JSRegExp < ICU << RegexKitLite < regex.h
The most complex engine is the fastest! CFStringFind probably ranks second only because my code is not optimized, but this still shows that the PCRE engine is really efficient. JSRegExp was originally PCRE with everything irrelevant to JavaScript stripped out, yet it takes nearly twice as long as PCRE (about 0.113s versus 0.067s). The WebKit devs should look into what's wrong. RegexKitLite's performance is pretty disappointing. Although it is advertised as very efficient, it is still 6~7 times slower than bare-bones ICU. The problem is probably that it uses NSString directly as the RegExp holder instead of a dedicated RegExp class. And regex.h's result is really a *WTF*.
Edit: I have redone the test to eliminate some inconsistencies (case sensitivity) and to convert the regex.h syntax from BRE to ERE. The result is impressive: ERE is about 7 times faster than BRE, putting it on par with ICU. The new results are:
ICU        PCRE       regex.h    RKLite     JSRegExp   CFStrFind
----------------------------------------------------------------
0.253736s  0.069918s  0.238111s  1.651787s  0.139437s  0.070956s
0.254308s  0.067150s  0.235963s  1.652074s  0.112813s  0.068302s
0.252483s  0.067994s  0.235714s  1.637005s  0.113267s  0.068710s
0.251647s  0.067551s  0.234380s  1.626927s  0.113274s  0.068353s
0.253102s  0.067168s  0.235367s  1.651783s  0.113605s  0.068280s
and the new ordering is:
PCRE ≈ CFStringFind < JSRegExp < regex.h < ICU << RegexKitLite
So PCRE is still the fastest, but by a smaller margin. And RegexKitLite now wins the title of the slowest RegEx implementation.
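For reference, the BRE/ERE choice in regex.h comes down to whether REG_EXTENDED is passed to regcomp. The sketch below translates a wildcard rule into an ERE and tests it; it is an illustration rather than the benchmark's code, the helper names wildcard_to_ere and ere_rule_match are hypothetical, and the use of REG_NOSUB is my assumption.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: translate a wildcard rule into a POSIX ERE.
   Leading '|' becomes '^', trailing '|' becomes '$', '*' becomes ".*",
   and ERE metacharacters are backslash-escaped. Caller frees the result. */
char *wildcard_to_ere(const char *rule)
{
    assert(rule != NULL);
    size_t len = strlen(rule);
    char *out = malloc(2 * len + 1);    /* each input char yields at most 2 */
    if (out == NULL)
        return NULL;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = rule[i];
        if (c == '|' && i == 0) {
            out[n++] = '^';
        } else if (c == '|' && i == len - 1) {
            out[n++] = '$';
        } else if (c == '*') {
            out[n++] = '.';
            out[n++] = '*';
        } else {
            if (strchr(".[\\()*+?{|^$", c) != NULL)
                out[n++] = '\\';
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return out;
}

/* Compile the rule as an ERE and report only whether it matches. */
bool ere_rule_match(const char *rule, const char *url)
{
    char *pattern = wildcard_to_ere(rule);
    if (pattern == NULL)
        return false;
    regex_t re;
    bool matched = false;
    /* REG_EXTENDED selects ERE syntax; without it regcomp parses BRE.
       REG_NOSUB skips submatch bookkeeping, since we only need yes/no. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        matched = regexec(&re, url, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(pattern);
    return matched;
}
```

In a real benchmark the regcomp call would be hoisted out of the matching loop so that each rule is compiled once and reused for every URL; compiling per call as above is only to keep the sketch short.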
To conclude this part, we should use PCRE for wildcard testing, if speed and clarity are important.
The source code of the test can be found at http://pastie.org/495990. Basically, 323 URLs collected from 3 different sources are matched against 16 wildcard rules. The file is compiled with:
/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/gcc-4.0 -arch armv6 -std=c99 -isysroot /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk -I/Developer/Platforms/iPhoneOS.platform/Developer/usr/include/gcc/darwin/default -I/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk/usr/lib/gcc/arm-apple-darwin9/4.0.1/include -L/usr/local/lib -Wall -mcpu=arm1176jzf-s -O2 -L. -framework CoreFoundation -framework Foundation -licucore -lpcre main.m RegexKitLite.m ; ldid -S a.out
and run on a jailbroken iPod Touch 1G, firmware version 2.2.1.
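The overall shape of such a benchmark, matching every URL against every rule and timing the whole pass, can be sketched roughly as follows. This is not the code at the pastie link: the function names, the repeats parameter, the use of clock(), and the strstr-based placeholder matcher are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any engine under test is wrapped behind this signature (hypothetical). */
typedef bool (*matcher_fn)(const char *rule, const char *url);

/* Placeholder matcher: plain substring search, no wildcard handling. */
bool substring_match(const char *rule, const char *url)
{
    return strstr(url, rule) != NULL;
}

/* Run every rule against every URL, `repeats` times over, and return the
   elapsed CPU time in seconds. The hit count is reported through hits_out
   (if non-NULL) so the compiler cannot discard the matching work. */
double time_matcher(matcher_fn match,
                    const char *const *urls, size_t n_urls,
                    const char *const *rules, size_t n_rules,
                    unsigned repeats, size_t *hits_out)
{
    assert(match != NULL && urls != NULL && rules != NULL);
    size_t hits = 0;
    clock_t start = clock();
    for (unsigned r = 0; r < repeats; r++) {
        for (size_t u = 0; u < n_urls; u++) {
            for (size_t k = 0; k < n_rules; k++) {
                if (match(rules[k], urls[u]))
                    hits++;
            }
        }
    }
    clock_t end = clock();
    if (hits_out != NULL)
        *hits_out = hits;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Each engine gets its own matcher_fn wrapper and is timed over the same URL and rule arrays, which is what makes the columns in the tables comparable. Checking that every engine reports the same hit count is a cheap way to catch inconsistencies such as the case-sensitivity mismatch mentioned in the edit.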