Regex performance

Carlos compares the performance of different Java Regex engines, and then compares the .NET Regex class against the best-performing Java one. Result: Java is 20 times faster than C#.

While I think the comparison should be between the standard Java Regex and the standard .NET Regex, even in that case Java is still over 3 times faster than C#. Could the difference really be that big? Maybe Carlos (who definitely knows more about Java than he does about C#) didn't know about RegexOptions.Compiled? I decided to run a little benchmark of my own and got the following results:

Java      C#        C# using RegexOptions.Compiled
851ms     2984ms    1822ms

With RegexOptions.Compiled it's not quite a factor of 3, but Java is still over twice as fast as C#. I'm not sure how much of this difference is due to HotSpot being a better VM and how much is due to a better Regex implementation in Java, but either way it's a major victory for Java.
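For anyone who wants to try something similar, here is a minimal sketch of what the Java side of such a timing loop could look like. It is not the benchmark that produced the numbers above: the pattern, the input string, the iteration counts, and the names RegexTiming, countMatches and timeMillis are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTiming {
    // Counts the non-overlapping matches of the pattern in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Repeats the match loop and returns the elapsed wall-clock time in ms.
    static long timeMillis(Pattern pattern, String input, int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += countMatches(pattern, input);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        // Use the accumulated result so the JIT cannot discard the loop.
        if (sink < 0) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Hypothetical pattern and input, chosen only for illustration.
        Pattern p = Pattern.compile("\\b[a-z]+@[a-z]+\\.com\\b");
        String input = "mail alice@example.com and bob@example.com today";
        timeMillis(p, input, 10_000); // warm-up pass so HotSpot can compile the loop
        System.out.println(timeMillis(p, input, 100_000) + "ms");
    }
}
```

The warm-up pass matters on HotSpot, since the first iterations run interpreted. A fair C# counterpart would run the same pattern and input through Regex with and without RegexOptions.Compiled.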

I'd be interested to see how .NET 1.1 performs, but I only have 1.0 installed right now. Unlike with Java, having multiple versions of .NET on one system seems to be a bit more complicated...

Comments

Read this about RegexOptions.Compiled

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconcompilationreuse.asp

The generated MSIL cannot be unloaded, which can be extremely problematic.

Posted by Carlos Perez at February 17, 2003 5:39 PM

Yes, I know that RegExs that use RegexOptions.Compiled cannot be unloaded, and therefore this option cannot be applied all the time.

Many (if not most) times you use a RegEx, though, the expression is a constant, so the RegEx can be created once as a static field with RegexOptions.Compiled turned on. Also, if there are only a limited number of variations of dynamically created RegExs, you could store them in a Hashtable to avoid creating the same one more than once.
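To make the idea concrete, here is a rough sketch of the same two techniques in Java terms, with Pattern standing in for a compiled Regex. The class name PatternCache, the DIGITS constant, and the get method are hypothetical, and a ConcurrentHashMap takes the place of the Hashtable. The .NET version would hold Regex instances built with RegexOptions.Compiled instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

final class PatternCache {
    // Constant expression: compiled once when the class loads, then reused.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Dynamically built expressions: one compiled Pattern per distinct regex string.
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Returns the cached Pattern for the regex, compiling it on first use only.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    private PatternCache() {
    }
}
```

This only pays off when the set of distinct expressions is bounded. A cache keyed on arbitrary user input would grow without limit, which is the same concern as compiled MSIL that can never be unloaded.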

Since Java does not have a Compiled option, I had expected the .NET RegEx to be at least as fast as the Java implementation. The fact that it was still over twice as slow really shows the maturity of the Java libraries and the HotSpot VM.

Posted by Luke Hutteman at February 25, 2003 12:39 AM