[Netarchivesuite-users] Regular expression (in Java) slowness fixed?

Peter Svanberg Peter.Svanberg at kb.se
Mon Jul 8 17:43:25 CEST 2019


In Madrid I heard that Java regular expression handling had been slow and that this problem was solved in some way.

* In which NAS version was it fixed? (I searched the release notes in vain.)

* How was it fixed?

* I read that using non-capturing groups ("(?:foo|bar)" instead of "(foo|bar)") can save time and memory in intensive regex handling. Has anyone considered that, or any other type of optimization? Or is regex checking (in the fixed version) a negligible part of the crawl time for a URI, even if there are hundreds of crawler trap regexes?
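
To make the last question concrete, here is a minimal Java sketch of the comparison I have in mind. The class name, the two patterns and the sample URI are made up for illustration and are not taken from NAS or from any real crawler trap list. The timing loop is deliberately crude, so it only gives a rough indication and is not a proper benchmark.

```java
import java.util.regex.Pattern;

public class TrapRegexSketch {
    // Hypothetical crawler-trap patterns (assumptions, for illustration only).
    // They match the same URIs and differ only in the kind of group used.
    static final Pattern CAPTURING =
            Pattern.compile(".*/(calendar|events)/\\d{4}/.*");
    static final Pattern NON_CAPTURING =
            Pattern.compile(".*/(?:calendar|events)/\\d{4}/.*");

    // True if the whole URI matches the trap pattern.
    static boolean isTrap(Pattern p, String uri) {
        return p.matcher(uri).matches();
    }

    // Number of capturing groups the pattern defines.
    static int groupCount(Pattern p) {
        return p.matcher("").groupCount();
    }

    // Crude timing of repeated matching; returns elapsed nanoseconds.
    static long time(Pattern p, String uri, int rounds) {
        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < rounds; i++) {
            if (isTrap(p, uri)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) {
            // Never true; only keeps 'hits' in use.
            System.out.println(hits);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String uri = "http://example.org/calendar/2019/07"; // made-up URI
        System.out.println(groupCount(CAPTURING));     // prints 1
        System.out.println(groupCount(NON_CAPTURING)); // prints 0
        System.out.println("capturing ns:     " + time(CAPTURING, uri, 1_000_000));
        System.out.println("non-capturing ns: " + time(NON_CAPTURING, uri, 1_000_000));
    }
}
```

Both patterns are compiled once and reused, which I would expect to matter at least as much as the group type. Whether the non-capturing form gives a measurable gain would have to be checked against a real trap list and real URIs.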



Peter Svanberg

National Library of Sweden
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se

