commit 1874e13047af2dc8724bc261be5f544c81ed3b20
parent e147909a9d67f323e4f4e61fa37362196e38be88
Author: Jordan Ritter <jpr5@darkridge.com>
Date: Tue, 22 Feb 2005 06:01:09 +0000
updated with tests against pcre 5.0
Diffstat:
M doc/PCRE.txt | 69 ++++++++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 48 insertions(+), 21 deletions(-)
diff --git a/doc/PCRE.txt b/doc/PCRE.txt
@@ -1,32 +1,59 @@
$Id$
-A quick note on PCRE vs GNU regex:
+Date: 2/21/05
- I ran several tests comparing GNU regex to the PCRE library
- using 10 million loop iterations: optimized vs. non-optimized,
- match vs. non-match. The conclusion I came to was that an
- unoptimized PCRE program is almost double the match and
- non-match times of the GNU regex library yields, and when using
- optimization a PCRE program would perform almost the same in the
- non-match case, but again almost twice that of the match case.
+A note about PCRE vs. GNU regex:
+
+ I ran several tests comparing GNU regex 0.12 to the PCRE 5.0
+ library using 100 million loop iterations: optimized
+ vs. non-optimized, match vs. non-match. The obvious conclusion
+ is that GNU regex is the reigning king of speed, and that
+ compiler optimization matters significantly for regular
+ expression engines.
+
+ (Please note that I also tried other third-party regex
+ libraries, such as RxSpencer's and libhackerlab's, and none
+ came close.)
The test subject was "how now brown cow", and the pattern we
were searching for in the match case was "now brown", and in the
non-match case "not brown". Obviously, the speed of matches is
directly related to the actual regex itself, and a
well-formulated regex certainly performs more efficiently than a
- simple substring match. However, this test is indicative of how
- most people use ngrep, so the test results are still important.
+ simple substring match. However, this test is reasonably
+ indicative of how most people use ngrep, so the test results are
+ still important.
Granted, on the single-match level the time difference is
- absolutely unnoticeable (it took 10 million loop iterations to
- compute it), so this may not mean anything to you. Likewise,
- the stripped binary sizes are also within 10k of each other on
- the test compile box.
-
- If absolute speed is not the issue, then compile against PCRE
- since it has better licensing. If you're after the fastest you
- can get (for you netops and netadmins out there, you know who
- you are), then compile against GNU regex. The speed really
- helps when piping those 500MB pcap dump files through ngrep over
- and over.
+ absolutely unnoticeable (it took 100 million loop iterations to
+ compute something worthwhile), so this may not mean anything to
+ you. Likewise, the stripped binary sizes are also within 10k of
+ each other on the test compile box.
+
+ If licensing terms matter more to you than speed, then compile
+ against PCRE, which is available under the Artistic License
+ (Free as in Beer). In all other cases the GNU regex library is
+ the best candidate, and the speed really helps when piping
+ those 500MB pcap dump files through ngrep over and over for
+ analysis.
+
+
+Test results:
+
+ CPU: Intel Pentium-M 2GHz
+ L1 I cache: 32K, L1 D cache: 32K
+ L2 cache: 2048K
+
+ Iterations: 100M
+
+                   match              nomatch
+
+ [-O0]
+ GNU regex-0.12    17.369s/17.385s    32.656s/32.069s
+ PCRE-5.0          35.840s/35.795s    25.340s/25.344s
+
+ [-O2]
+ GNU regex-0.12    12.240s/12.280s    19.512s/19.489s
+ PCRE-5.0          24.580s/24.578s    17.235s/17.238s
+
+
+
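The test described in the note (one subject string, one matching and one non-matching pattern, a fixed number of loop iterations) can be sketched roughly as follows. This is only an illustration of the method, not the harness used for the numbers above: it uses Python's `re` module as a stand-in for GNU regex and PCRE, the function names `time_search` and `per_iteration_ns` are hypothetical, and it assumes the pattern is compiled once outside the timed loop, which the note does not state.

```python
import re
import time

# Subject and patterns are the ones quoted in the note.
SUBJECT = "how now brown cow"
PATTERNS = {"match": "now brown", "nomatch": "not brown"}


def time_search(pattern, subject, iterations):
    """Return elapsed seconds for `iterations` searches of one pattern.

    Assumption: the pattern is compiled once, outside the timed loop.
    """
    compiled = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(iterations):
        compiled.search(subject)
    return time.perf_counter() - start


def per_iteration_ns(total_seconds, iterations):
    """Convert a total run time into nanoseconds per search."""
    return total_seconds / iterations * 1e9


if __name__ == "__main__":
    # Assumption: far fewer iterations than the note's 100M, so the
    # sketch finishes quickly in an interpreter.
    iterations = 1_000_000
    for label, pattern in PATTERNS.items():
        elapsed = time_search(pattern, SUBJECT, iterations)
        cost = per_iteration_ns(elapsed, iterations)
        print(f"{label:8s} {elapsed:.3f}s  ({cost:.0f} ns per search)")
```

Applied to the table above, `per_iteration_ns(12.240, 100_000_000)` works out to roughly 122 ns per search for GNU regex at -O2 in the match case, which is why the difference is unnoticeable at the single-match level.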