Author: Jordan Ritter <email@example.com>
Date: Tue, 22 Feb 2005 06:01:09 +0000
updated with tests against pcre 5.0
 doc/PCRE.txt | 69 ++++++++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 48 insertions(+), 21 deletions(-)
diff --git a/doc/PCRE.txt b/doc/PCRE.txt
@@ -1,32 +1,59 @@
-A quick note on PCRE vs GNU regex:
- I ran several tests comparing GNU regex to the PCRE library
- using 10 million loop iterations: optimized vs. non-optimized,
- match vs. non-match. The conclusion I came to was that an
- unoptimized PCRE program is almost double the match and
- non-match times of the GNU regex library yields, and when using
- optimization a PCRE program would perform almost the same in the
- non-match case, but again almost twice that of the match case.
+A note about PCRE vs. GNU regex:
+ I ran several tests comparing GNU regex 0.12 to the PCRE 5.0
+ library using 100 million loop iterations: optimized
+ vs. non-optimized, match vs. non-match. The obvious conclusion
+ is that GNU regex is the reigning king of speed, and that with
+ regular expression engines optimization matters significantly.
+ (Please note that I tried other third-party regex libraries like
+ RxSpencer's and libhackerlab's, and none came close to
The test subject was "how now brown cow", and the pattern we
were searching for in the match case was "now brown", and in the
non-match case "not brown". Obviously, the speed of matches is
directly related to the actual regex itself, and a
well-formulated regex certainly performs more efficiently than a
- simple substring match. However, this test is indicative of how
- most people use ngrep, so the test results are still important.
+ simple substring match. However, this test is reasonably
+ indicative of how most people use ngrep, so the test results are
+ still important.
Granted, on the single-match level the time difference is
- absolutely unnoticeable (it took 10 million loop iterations to
- compute it), so this may not mean anything to you. Likewise,
- the stripped binary sizes are also within 10k of each other on
- the test compile box.
- If absolute speed is not the issue, then compile against PCRE
- since it has better licensing. If you're after the fastest you
- can get (for you netops and netadmins out there, you know who
- you are), then compile against GNU regex. The speed really
- helps when piping those 500MB pcap dump files through ngrep over
- and over.
+ absolutely unnoticeable (it took 100 million loop iterations to
+ compute something worthwhile), so this may not mean anything to
+ you. Likewise, the stripped binary sizes are also within 10k of
+ each other on the test compile box.
+ If licensing terms matter more to you than speed, then compile
+ against PCRE, which is available under the Artistic License
+ (Free as in Beer). In all other cases the GNU regex library is
+ the best candidate, and its speed really helps when piping those
+ 500MB pcap dump files through ngrep over and over for analysis.
+ CPU: Intel Pentium-M 2GHz
+ L1 I cache: 32K, L1 D cache: 32K
+ L2 cache: 2048K
+ Iterations: 100M
+                   match              nomatch
+ GNU regex-0.12    17.369s/17.385s    32.656s/32.069s
+ PCRE-5.0          35.840s/35.795s    25.340s/25.344s
+
+ GNU regex-0.12    12.240s/12.280s    19.512s/19.489s
+ PCRE-5.0          24.580s/24.578s    17.235s/17.238s