Coding Tips – Regex optimization in C#

In programming, manipulating strings can have a serious impact on your program’s performance, especially when using regular expressions (regex). From what I found (links at the end), C# is not the best programming language for this kind of operation.

I am not a regex expert, but I know some tricks to increase performance when using them in .NET with C#, and this is what I’ll show you in this blog post. I started with the following piece of code:

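using System.Linq;                     // needed later for the All() extension method
using System.Text.RegularExpressions;  // Regex, RegexOptions

public static class RegexComparer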
{
    private const string REGEX_PATTERN = @"^[\w\d]+$";
    private const string REGEX_INPUT = "azertyuiop7894561230qsdfghjklm7894561230wxcvbn7894561230";
 
    public static bool StaticRegexMatch()
    {
        var match = Regex.Match(REGEX_INPUT, REGEX_PATTERN);
        return match.Success;
    }
}

The StaticRegexMatch() method verifies that the input is made only of alphanumeric characters (strictly speaking, \w also accepts the underscore) by calling the static Regex.Match() method. I wrote a program that executes this function 500,000 times in a row, and this is the execution time I get:

Static Regex Match
Executing action 500000 times
Elapsed time (ms) : 2713
Result is True
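
The timing program itself is not shown in this post. As a rough idea of how such a measurement can be made, here is a minimal sketch built on System.Diagnostics.Stopwatch. The Benchmark class, the Measure name and the warm-up call are assumptions made for illustration, not the code that produced the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical helper: runs the action `iterations` times,
    // prints a report and returns the elapsed milliseconds.
    public static long Measure(string name, Func<bool> action, int iterations = 500000)
    {
        // Warm-up call (an assumption of this sketch) so that JIT compilation
        // and the first regex construction are not part of the timing.
        bool result = action();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = action();
        }
        stopwatch.Stop();

        Console.WriteLine(name);
        Console.WriteLine("Executing action {0} times", iterations);
        Console.WriteLine("Elapsed time (ms) : {0}", stopwatch.ElapsedMilliseconds);
        Console.WriteLine("Result is {0}", result);
        return stopwatch.ElapsedMilliseconds;
    }
}
```

With this helper, a run would look like Benchmark.Measure("Static Regex Match", RegexComparer.StaticRegexMatch). The figures you get depend on the machine and the .NET version, so compare them with each other rather than with the numbers in this post.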

Next, I used the following code:

public static bool StaticCompiledRegexMatch()
{
    var match = Regex.Match(REGEX_INPUT, REGEX_PATTERN, RegexOptions.Compiled);
    return match.Success;
}

Static Compiled Regex Match
Executing action 500000 times
Elapsed time (ms) : 2404
Result is True

Performance improves because, knowing that the regex will be used many times, I passed RegexOptions.Compiled, which tells the runtime to compile the pattern to IL instead of interpreting it. Compilation costs more up front but makes each match faster.
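
One detail helps explain why the static call is not rebuilt 500,000 times: the static Regex methods keep the regexes they construct in an internal cache (15 entries by default, adjustable through Regex.CacheSize). The sketch below only illustrates that setting. The class name, the method name and the value 50 are arbitrary choices for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Hypothetical helper: enlarges the static regex cache, then matches
    // through the static API so the constructed regex can be reused.
    public static bool MatchWithLargerCache(string input, string pattern)
    {
        // Regex.CacheSize is the number of regexes the static methods keep around.
        // 50 is an arbitrary value chosen for this sketch.
        if (Regex.CacheSize < 50)
        {
            Regex.CacheSize = 50;
        }

        // Repeated calls with the same pattern and options hit the cache
        // instead of constructing the regex again.
        return Regex.IsMatch(input, pattern, RegexOptions.Compiled);
    }
}
```

Raising the cache size only matters when an application uses more distinct patterns through the static methods than the cache can hold. With a single pattern, as in this post, the default is enough.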

Last but not least, the third solution:

private static readonly Regex CompiledRegexInstance = new Regex(REGEX_PATTERN, RegexOptions.Compiled);
 
public static bool CompiledRegexInstanceMatch()
{
    var match = CompiledRegexInstance.Match(REGEX_INPUT);
    return match.Success;
}

Compiled Regex Instance Match
Executing action 500000 times
Elapsed time (ms) : 2317
Result is True

This time I used a dedicated Regex instance stored in a static field of my class, instead of the static Match() method, and I kept the Compiled option. The gain is not as large as the first time, but it is still an improvement.
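
When only a yes/no answer is needed, the instance can also be queried with IsMatch(), which returns a bool directly instead of handing back a Match object. The following is a small sketch of that variant. The class and member names are made up for the example, and I have not measured it against the versions above.

```csharp
using System.Text.RegularExpressions;

public static class RegexIsMatchSketch
{
    // Same pattern as in the post, built once and reused for every call.
    private static readonly Regex AlphanumericRegex =
        new Regex(@"^[\w\d]+$", RegexOptions.Compiled);

    // Hypothetical helper: true when the whole input consists of word characters.
    public static bool IsAlphanumeric(string input)
    {
        // IsMatch reports success without exposing a Match object to the caller.
        return AlphanumericRegex.IsMatch(input);
    }
}
```

Whether this is measurably faster depends on the runtime version, so it is worth timing in your own environment before relying on it.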

The pattern I chose in this example is quite simple, so I can improve the performance of my code by using a different technique:

public static bool LinqMatch()
{
    return REGEX_INPUT.All(c => char.IsLetterOrDigit(c));
}

Linq Match
Executing action 500000 times
Elapsed time (ms) : 749
Result is True

As you can see, this approach is much faster. The .NET char type has several useful static methods for validating data, and combined with LINQ they can be very helpful for string operations.
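
The same check can be written as a plain loop, which avoids the delegate call that All() makes for each character. Here is a minimal sketch. The class and method names are invented for the example, and the empty-string guard is my own addition to mirror the + quantifier of the regex.

```csharp
using System;

public static class CharLoopSketch
{
    // Hypothetical helper: true when every character is a letter or a digit.
    public static bool IsLetterOrDigitOnly(string input)
    {
        // The regex requires at least one character (+), whereas LINQ's All()
        // returns true for an empty sequence, so empty input is rejected here.
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        foreach (char c in input)
        {
            if (!char.IsLetterOrDigit(c))
            {
                return false; // stop at the first invalid character
            }
        }

        return true;
    }
}
```

Note that this is not a strict drop-in replacement for the regex: char.IsLetterOrDigit() rejects the underscore, which \w accepts, so an input such as "a_b" passes the regex but fails this check.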

Regular expressions are a powerful feature, but they are costly in execution time. There are ways to improve their performance, and sometimes the best answer is to avoid them altogether, depending on the pattern you want to match.

EDIT: An improved version of these examples is available in a gist made by Cybermaxs, whom I thank very much (his answer is in the comments section).

Programming languages benchmark links :

As always, do not hesitate to share your ideas/remarks regarding this topic.

See you next time!

2 thoughts on “Coding Tips – Regex optimization in C#”

  1. As I’m a little concerned by performance :) … challenge accepted!

    First, your example doesn’t cover all the use cases: REGEX_INPUT contains only alphanumeric characters, so it’s only half of the job.

    The regex is not optimal either, because \w already covers [A-Za-z0-9_] (and, in .NET, other Unicode word characters). You don’t need to add \d, and the underscore is accepted in your version.

    Rejecting an incorrect input is faster than accepting a correct one, because matching can stop at the first bad character. So you can also test the inverse: try to find a non-alphanumeric character.

    Performance is not only about execution time but also about memory allocation. Excessive allocations add pressure on the GC.

    Concerning LinqMatch, it’s even better not to use LINQ but a basic while loop.

    I’ve uploaded a gist here with some of these ideas. https://gist.github.com/Cybermaxs/654414d3cdd2634e925a

    To summarize, performance tuning is hard. Does it matter to test 500,000 strings in your project?

    • Hello Cybermaxs,

      First of all, thank you for your comment on my article. Second, sorry for the delay in responding.
      I only scratched the surface of this topic, just to give a few small tips. I am glad that you went deeper; I actually learned a few things that I was not aware of.

      I will edit the post to incorporate the gist you took the time to create.

      Thank you again for your interest regarding this topic and for sharing tips.

      Julien
