In programming, manipulating strings can have a serious impact on your program's performance, especially when using regular expressions (regex). From what I found (links at the end), C# is not the best programming language for this kind of operation.
I am not a regex expert, but I know some tricks to increase performance when using them in .NET with C#, and this is what I’ll show you in this blog post. I started with the following piece of code:
    using System.Text.RegularExpressions;

    public static class RegexComparer
    {
        private const string REGEX_PATTERN = @"^[\w\d]+$";
        private const string REGEX_INPUT = "azertyuiop7894561230qsdfghjklm7894561230wxcvbn7894561230";

        // Uses the static Regex.Match() overload, without any options.
        public static bool StaticRegexMatch()
        {
            var match = Regex.Match(REGEX_INPUT, REGEX_PATTERN);
            return match.Success;
        }
    }
The StaticRegexMatch() method verifies that the input is made only of alphanumeric characters by calling the static Regex.Match() method. I wrote a program that executes this function 500,000 times in a row, and this is the execution time I get:
    Static Regex Match
    Executing action 500000 times
    Elapsed time (ms) : 2713
    Result is True
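The timing program itself is not shown in this post. As a rough idea of what it could look like, here is a minimal sketch based on Stopwatch. The Benchmark class, the Run method and the warm-up call are assumptions made for illustration, not the code that produced the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical helper: runs `action` `iterations` times, prints the
    // elapsed time in the same format as the output above, and returns
    // the result of the last call.
    public static bool Run(string name, Func<bool> action, int iterations = 500000)
    {
        Console.WriteLine(name);
        Console.WriteLine("Executing action " + iterations + " times");

        // Assumption: one untimed warm-up call, so that one-time costs
        // (JIT, regex construction) are not included in the measurement.
        action();

        bool result = false;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = action();
        }
        stopwatch.Stop();

        Console.WriteLine("Elapsed time (ms) : " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine("Result is " + result);
        return result;
    }
}
```

With such a helper, the first measurement would be started with Benchmark.Run("Static Regex Match", RegexComparer.StaticRegexMatch). For serious measurements, a dedicated benchmarking library is probably a safer choice than a hand-written loop.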
After this, I used the following code:

    public static bool StaticCompiledRegexMatch()
    {
        var match = Regex.Match(REGEX_INPUT, REGEX_PATTERN, RegexOptions.Compiled);
        return match.Success;
    }

    Static Compiled Regex Match
    Executing action 500000 times
    Elapsed time (ms) : 2404
    Result is True
Performance improves because, knowing that the regex will be used many times, I passed RegexOptions.Compiled, which tells the runtime to compile the regex to MSIL code instead of interpreting it. The trade-off is a higher one-time cost when the regex is first built.
Last but not least, a third solution:
    private static Regex CompiledRegexInstance = new Regex(REGEX_PATTERN, RegexOptions.Compiled);

    public static bool CompiledRegexInstanceMatch()
    {
        var match = CompiledRegexInstance.Match(REGEX_INPUT);
        return match.Success;
    }

    Compiled Regex Instance Match
    Executing action 500000 times
    Elapsed time (ms) : 2317
    Result is True
This time I used a dedicated Regex instance, stored in a static field of my class, instead of calling the static Match() method, and I kept the Compiled option. The gain is smaller than with the previous change, but it is still an improvement.
The pattern I chose for this example is quite simple, so I can improve the performance of my code by using a different technique:
    // Requires: using System.Linq;
    public static bool LinqMatch()
    {
        return REGEX_INPUT.All(c => char.IsLetterOrDigit(c));
    }

    Linq Match
    Executing action 500000 times
    Elapsed time (ms) : 749
    Result is True
As you can see, this approach is much faster: about 750 ms instead of more than 2,300 ms for the best regex version. The .NET char type has several useful methods to validate data, and combined with LINQ they can be very helpful for string operations.
Regular expressions are a powerful feature, but they are costly in execution time. There are ways to improve their performance, and sometimes the best answer is to avoid them altogether, depending on the pattern you want to match.
EDIT: You can find a version of these examples with improvements in a gist made by Cybermaxs, whom I sincerely thank (his answer is in the comments section).
Programming languages benchmark links :
As always, do not hesitate to share your ideas/remarks regarding this topic.
See you next time!
As I’m a little concerned with performance :) … challenge accepted!
First, your example doesn’t cover all the use cases: REGEX_INPUT contains only alphanumeric characters, so it’s only half of the job.
The regex is not optimal either, because \w is roughly equivalent to [A-Za-z0-9_] (in .NET it also covers Unicode letters and digits). You don’t need to add \d, and the underscore is accepted in your version.
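To make this point concrete, here is a small sketch comparing the original pattern with two alternatives. The WordClassDemo class and its method names are hypothetical, and the explicit ASCII class is only one possible reading of "alphanumeric".

```csharp
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Pattern from the post: \d is redundant because \w already covers digits.
    public static bool OriginalPattern(string input)
    {
        return Regex.IsMatch(input, @"^[\w\d]+$");
    }

    // Same behaviour on these inputs, without the redundant \d.
    public static bool WordOnly(string input)
    {
        return Regex.IsMatch(input, @"^\w+$");
    }

    // Assumption: "alphanumeric" means ASCII letters and digits only,
    // so the underscore is rejected.
    public static bool AsciiAlphanumeric(string input)
    {
        return Regex.IsMatch(input, @"^[a-zA-Z0-9]+$");
    }
}
```

For the input "abc_123", the first two methods accept the string because \w includes the underscore, while AsciiAlphanumeric rejects it.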
Testing incorrect inputs is faster than testing correct ones, so you can also test the inverse: try to find a non-alphanumeric character.
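A minimal sketch of this inverse test could look like the following. The InverseRegexComparer class is a hypothetical name, the pattern assumes ASCII letters and digits only, and this is not necessarily the code found in the gist.

```csharp
using System.Text.RegularExpressions;

public static class InverseRegexComparer
{
    // Matches any single character that is NOT an ASCII letter or digit.
    private static readonly Regex NonAlphanumeric =
        new Regex("[^a-zA-Z0-9]", RegexOptions.Compiled);

    public static bool IsAlphanumeric(string input)
    {
        // The original pattern uses "+", so an empty string is rejected here too.
        // The input is valid when no offending character can be found.
        return input.Length > 0 && !NonAlphanumeric.IsMatch(input);
    }
}
```

The search can stop at the first offending character, and IsMatch() only reports whether a match exists, so the caller never has to handle a Match object.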
Performance is not only about execution time but also about memory allocation. Excessive allocs add pressure on GC.
Concerning LinqMatch, it’s even better not to use LINQ at all and to write a basic while loop instead.
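As an illustration, the loop version could be sketched like this. LoopComparer and LoopMatch are hypothetical names, and rejecting empty input is an assumption made to follow the regex (LinqMatch returns true for an empty string).

```csharp
public static class LoopComparer
{
    public static bool LoopMatch(string input)
    {
        // Assumption: null or empty input is rejected, like the "+" in the regex.
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        int i = 0;
        while (i < input.Length)
        {
            // Stop at the first character that is neither a letter nor a digit.
            if (!char.IsLetterOrDigit(input[i]))
            {
                return false;
            }
            i++;
        }
        return true;
    }
}
```

Compared with the LINQ version, this avoids the delegate invocation and the enumerator over the string, which is the kind of allocation pressure mentioned above.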
I’ve uploaded a gist here with some of these ideas. https://gist.github.com/Cybermaxs/654414d3cdd2634e925a
To summarize, performance tuning is hard. Does it matter to test 500,000 strings in your project?
Hello Cybermaxs,
First of all, thank you for your comment on my article. Second, sorry for the delay in responding.
I only scratched the surface of this topic to give a few small tips. I am glad that you went deeper; I actually learned a few things that I was not aware of.
I will edit the post to incorporate the gist you took the time to create.
Thank you again for your interest regarding this topic and for sharing tips.
Julien