One thing I really like about blogs is that you can exchange ideas and learn something you naturally won't know for a lifetime because you are not an expert. (And I hate you, Xanga Friends Lock users.)
Firstly, I have added the 3 options
-DNS_BLOCK_ASSERTIONS=1 -DRKL_CACHE_SIZE=307 -DRKL_FAST_MUTABLE_CHECK=1 as instructed and able to reduce the running time from 1.7s to 0.7s, but still quite slow compared with ICU at 0.25s. Then I increased the cache size to 709, and the running time is now 0.4s. Further increasing to 1619 starts make the running time longer probably due to excessive memory use.
RKL,307 RKL,709
-------------------
0.691226s 0.384044s
0.692440s 0.381194s
0.684274s 0.381550s
0.689568s 0.382489s
0.683985s 0.384499s
Then I consider rolling all 16 pattern into one big RegExp like
RE1|RE2|RE3|.... One problem with this is each pattern must not have back-references, but this is fine for wildcard rules since back-references do not appear at all. But another nasty thing is that AdBlock Plus rules has the so called filter options which prevents simple merging to take place. Anyway, the running time after merging are:
ICU PCRE regex.h RKLite JSRegExp
-------------------------------------------------------------
0.260404s 0.189028s 2.526485s 0.268121s 0.141875s AllInOne
0.256532s 0.160290s 2.526872s 0.264368s 0.141586s
0.257263s 0.160135s 2.536937s 0.266934s 0.142434s
0.256900s 0.160222s 2.532003s 0.265274s 0.141467s
0.256744s 0.160232s 2.528987s 0.264851s 0.142299s
-------------------------------------------------------------
0.253102s 0.067551s 0.235714s 0.382489s 0.113274s SeveralInst (median of 5)
So while RegexKitLite does become faster,
all other methods are dragged down, esp. the POSIX ERE. This confirms johne's statement.
There are some other developments: (1) I've found the
fnmatch function that does wildcard matching, but the running time is 0.8s, probably due to the use of recursion and implicit calls on locales. (2) Then I've written a specific routine that checks using pure C string, and the result is dramatic -- the function only takes 0.02 seconds! This restored some of my logical beliefs :D.
fnmatch5
---------
0.021076s
0.021167s
0.021065s
0.021229s
0.021238s
The new source code can be found in:
http://pastie.org/496783.