RegEx-CLAWK-Lexer

Michael Parker has written a set of tools for regular expression matching, AWK functions, and lexer generation. They can be used separately or together. Download links are on his site.


libregex      clisp REGEXP module   Michael Parker's regex              regexp
gcc -O3       clisp-2.33.2(#)       clisp-2.33.2(#)    sbcl-0.8.14
3236/SEC      47893/SEC             10289/SEC          148810/SEC       A*BD
1412/SEC      1685/SEC              10233/SEC          136986/SEC       (A|A)*BD
1848/SEC      2061/SEC              9012/SEC           140449/SEC       (A|B)*BD
1821/SEC      1993/SEC              7159/SEC           128041/SEC       (B|A)*BD
3921/SEC      4049/SEC              10100/SEC          131234/SEC       ((A*B)|(AC))D
3322/SEC      3545/SEC              10122/SEC          130890/SEC       ((A*B)|(A*C))D
3831/SEC      47870/SEC             9106/SEC           145773/SEC       [Aa]*[Bb][Dd]
1000000/SEC   840336/SEC            826446/SEC         253807/SEC       STRING=
83333/SEC     151745/SEC            143678/SEC         150150/SEC       STRING-EQUAL(*)
(*) not using a library stricmp.
(#) bytecode compiled.
All tests on Athlon 1200MHz, 1GB DDR2700.

The second column is a retest adapted to use clisp's REGEXP module instead of Parker's REGEX package. This module is an FFI binding to the libc regex, so it should give a better comparison between a C implementation of regex and a Common Lisp one. The faster results in the second column compared with the first are even stranger.

It would be nice to see this regular expression library benchmarked against cl-ppcre -AK

RegEx seems to be slower by a factor of 4 than cl-ppcre on patterns like "[0-9]+ [0-9]+ [a-z]+ [0-9]+" tested on "23424 3242324 ab 234432432 22334242 23 232 23", for example.
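
For anyone wanting to reproduce this kind of figure, here is a minimal sketch of how a matches-per-second rate can be measured for the pattern and test string above. It uses Python's standard re module purely as a stand-in, so the numbers it prints say nothing about RegEx or cl-ppcre. The function name matches_per_second and the iteration count are made up for this illustration.

```python
import re
import time

PATTERN = r"[0-9]+ [0-9]+ [a-z]+ [0-9]+"
SUBJECT = "23424 3242324 ab 234432432 22334242 23 232 23"

def matches_per_second(pattern, subject, iterations=100_000):
    """Time repeated searches and return searches per second (hypothetical helper)."""
    compiled = re.compile(pattern)       # compile once, outside the timed loop
    start = time.perf_counter()
    for _ in range(iterations):
        compiled.search(subject)         # leftmost match, as a regex benchmark would do
    elapsed = time.perf_counter() - start
    return iterations / elapsed

# The leftmost match covers the first four fields of the test string.
match = re.search(PATTERN, SUBJECT)
print(match.group(0))                    # 23424 3242324 ab 234432432
print(f"{matches_per_second(PATTERN, SUBJECT):.0f}/SEC")
```

Compiling the pattern before the loop keeps compilation cost out of the measurement. Whether the benchmarks on this page did the same is not stated.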

Just added the start condition feature to cl-lexer. Could I submit a patch to this? -Haiwei

A Text Library.

