Rick Strahl's Weblog  


How many ways to do a String Replace?



You ever do this stunt? You're in the middle of working and you realize that you need a string function. .NET makes string manipulation fairly easy, but it can often be frustrating to find just the right function. The issue is that .NET string functions are scattered over several different objects. The string class is what is used most frequently, but it's not a terribly complete set of string manipulation functions.

So I just needed a routine to replace the 2nd instance of a particular string match in a string. So how many ways can I do this? Actually, using just a base library function, there's nothing that does it directly.

So you might think about:

  • string.Replace()
  • StringBuilder.Replace()
  • Regex.Replace()
  • Regex.Replace()  (Instance)

Lots of choices for string functionality in general and Replace functionality in particular. String.Replace() works for plain string replacements, but it won't help with case-insensitive matching or with replacing a specific instance of the text.

StringBuilder.Replace() is a little better - it optionally lets you specify WHERE to replace a string, but you have to provide an index into the string. Since IndexOf() doesn't let you jump to a specific instance, that's not really all that useful either. Regex.Replace() - well, anything's possible with pattern matches, but in order to deal with a specific instance replacement you'd have to use a MatchEvaluator, which is not the greatest way to build, say, a generic routine. In that scenario Regex always feels too application level.
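For reference, the MatchEvaluator route mentioned above might look roughly like the sketch below. The class and method names are made up for illustration, and it assumes the search text should be treated as a literal (hence Regex.Escape) rather than as a pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexInstanceReplace
{
    // Hypothetical helper: replaces only the Nth (1-based) match of a literal string.
    public static string ReplaceNth(string input, string find, string replaceWith,
                                    int instance, bool caseInsensitive)
    {
        if (string.IsNullOrEmpty(find) || instance < 1)
            return input;

        RegexOptions options = caseInsensitive ? RegexOptions.IgnoreCase : RegexOptions.None;
        int seen = 0;

        // The evaluator runs once per match, in order. Only the match whose
        // running count equals 'instance' is swapped; all others are kept as-is.
        return Regex.Replace(input, Regex.Escape(find),
            m => ++seen == instance ? replaceWith : m.Value,
            options);
    }
}
```

Calling RegexInstanceReplace.ReplaceNth("one two one two one", "one", "1", 2, false) leaves the first and third "one" alone and replaces only the second.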

Then there's also the Regex.Replace at the instance level. Yup you can create a Regex instance and you get a different set of functions than the static ones (another odd choice for API design). This version of regExInstance.Replace() lets you control the max number of replacements that occur and a start position, but not the actual start instance.
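A minimal sketch of that instance-level overload, Replace(input, replacement, count, startat), is shown here. The wrapper name and parameters are hypothetical; the escaping of the search text and of '$' in the replacement is an assumption made so that both are treated as plain text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexInstanceOverloads
{
    // Hypothetical helper: replaces at most maxCount literal matches of 'find',
    // searching only from character position startAt onward.
    public static string ReplaceFrom(string input, string find, string replaceWith,
                                     int maxCount, int startAt)
    {
        if (string.IsNullOrEmpty(find))
            return input;

        // Escape the search text so it is matched literally, and double any '$'
        // so the replacement text is not read as a substitution such as $1.
        var regex = new Regex(Regex.Escape(find));
        string literalReplacement = replaceWith.Replace("$", "$$");

        // Text before startAt is returned unchanged; matching begins at startAt.
        return regex.Replace(input, literalReplacement, maxCount, startAt);
    }
}
```

Note that startAt is a character position, not a match number, so you still have to locate the Nth match yourself before calling it.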

So lots of options, all close, but no cigar.

So after rummaging through all this I finally just created a couple of generic routines that do what I need.

[updated: 4/30/07 with feedback from comments]

 
/// <summary>
/// String replace function that supports replacing a specific instance of a match with optional case insensitivity
/// </summary>
/// <param name="OrigString">Original input string</param>
/// <param name="FindString">The string that is to be replaced</param>
/// <param name="ReplaceWith">The replacement string</param>
/// <param name="Instance">1-based instance of FindString to replace. If Instance = -1 all instances are replaced</param>
/// <param name="CaseInsensitive">Case insensitivity flag</param>
/// <returns>updated string or original string if no matches</returns>
public static string ReplaceStringInstance(string OrigString, string FindString, 
                                           string ReplaceWith, int Instance, 
                                           bool CaseInsensitive)
{
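    // An empty search string never matches anything meaningful
    if (string.IsNullOrEmpty(FindString))
        return OrigString;
 
    if (Instance == -1)
        return ReplaceString(OrigString, FindString, ReplaceWith, CaseInsensitive);
 
    // Instance is 1-based; 0 or any other negative value matches nothing
    if (Instance < 1)
        return OrigString;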
 
    int at1 = 0;            
    for (int x = 0; x < Instance; x++)
    {
 
        if (CaseInsensitive)
            at1 = OrigString.IndexOf(FindString, at1, OrigString.Length - at1,StringComparison.OrdinalIgnoreCase);
        else
            at1 = OrigString.IndexOf(FindString, at1);
 
        if (at1 == -1)
            return OrigString;
 
        if (x < Instance-1)
            at1 += FindString.Length;
    }            
 
    return OrigString.Substring(0, at1) + ReplaceWith + OrigString.Substring(at1 + FindString.Length);
 
    //StringBuilder sb = new StringBuilder(OrigString); 
    //sb.Replace(FindString, ReplaceString, at1, FindString.Length); 
    //return sb.ToString();
}
 
/// <summary>
/// Replaces a substring within a string with another substring with optional case sensitivity turned off.
/// </summary>
/// <param name="OrigString">String to do replacements on</param>
/// <param name="FindString">The string to find</param>
/// <param name="ReplaceString">The string to replace found string with</param>
/// <param name="CaseInsensitive">If true case insensitive search is performed</param>
/// <returns>updated string or original string if no matches</returns>
public static string ReplaceString(string OrigString, string FindString, 
                                   string ReplaceString, bool CaseInsensitive)
{
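    // An empty search string would match at every position and loop forever
    if (string.IsNullOrEmpty(FindString))
        return OrigString;
 
    int at1 = 0;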
    while(true)
    {
        if (CaseInsensitive)
            at1 = OrigString.IndexOf(FindString,at1,OrigString.Length-at1,StringComparison.OrdinalIgnoreCase);
        else
            at1 = OrigString.IndexOf(FindString,at1);
 
        if (at1 == -1)
            return OrigString;
 
        OrigString = OrigString.Substring(0, at1) + ReplaceString + OrigString.Substring(at1 + FindString.Length);
 
        at1 += ReplaceString.Length;
    }
 
}
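The replace-all routine above rebuilds the whole string on every match. As a rough alternative, here is a single-pass sketch that collects the pieces in a StringBuilder. The names are made up for illustration, and it assumes ordinal comparison for both the case-sensitive and case-insensitive paths, which differs slightly from the culture-sensitive IndexOf(string, int) call used above.

```csharp
using System;
using System.Text;

public static class StringReplaceSketch
{
    // Hypothetical single-pass variant of the replace-all routine.
    public static string ReplaceAll(string source, string find,
                                    string replaceWith, bool caseInsensitive)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(find))
            return source;

        StringComparison comparison = caseInsensitive
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        int at = source.IndexOf(find, comparison);
        if (at == -1)
            return source;   // no match: hand back the original string

        var sb = new StringBuilder(source.Length);
        int copyFrom = 0;
        while (at != -1)
        {
            sb.Append(source, copyFrom, at - copyFrom);   // text before the match
            sb.Append(replaceWith);
            copyFrom = at + find.Length;                  // skip past the match
            at = source.IndexOf(find, copyFrom, comparison);
        }

        sb.Append(source, copyFrom, source.Length - copyFrom);   // trailing text
        return sb.ToString();
    }
}
```

Because the search always continues in the original string rather than in the partly replaced one, replacement text that happens to contain the search string is never matched again.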

I'm sure some of the Regex wizards are going to have a better way, but hey, I'm glyph challenged and this works.

I also frequently wonder how efficient RegEx is compared to using procedural code to do stuff like this. I would imagine code like this for simple string manipulations is pretty efficient - I suspect RegEx has some overhead, especially for smaller strings, and after all Regex has to run some sort of code to do its matching and replacement as well.

Thoughts?

Posted in CSharp  .NET  

The Voices of Reason


 

Steve from Pleasant Hill
April 30, 2007

# re: How many ways to do a String Replace?

ReplaceStringInstance('this is cool', 'is very', 1, false);

Juma
April 30, 2007

# re: How many ways to do a String Replace?

Hey Rick,

You may want to include Culture info in your string replace functions and Microsoft's recommendation is to use ToUpperInvariant() instead of ToLowerInvariant().

Ref:http://msdn2.microsoft.com/en-us/library/ms973919.aspx#stringsinnet20_topic6
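To make the suggestion concrete, here is a small sketch of two ways to do a case-insensitive search: upper-casing both strings with ToUpperInvariant(), or passing a StringComparison value to IndexOf(). The class and method names are hypothetical, and the sketch only claims that the two agree for plain ASCII input like the sample in the usage note.

```csharp
using System;

public static class CaseInsensitiveFind
{
    // Normalizes both strings to upper case first, then compares ordinally.
    // This allocates two temporary strings per call.
    public static int ViaUpperInvariant(string source, string find)
    {
        return source.ToUpperInvariant()
                     .IndexOf(find.ToUpperInvariant(), StringComparison.Ordinal);
    }

    // Lets IndexOf do the case-insensitive comparison without temporary copies.
    public static int ViaComparison(string source, string find)
    {
        return source.IndexOf(find, StringComparison.OrdinalIgnoreCase);
    }
}
```

For "This is COOL" and "cool" both methods return index 8.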

Luke Breuer
April 30, 2007

# re: How many ways to do a String Replace?

You seem to have missed the instance method IndexOf(string, int, StringComparison) of the String class. Anyway, here's the start of some testing code. I compiled in debug mode so that nothing was optimized away; I'd have to come up with some scheme for the optimized version, perhaps involving storing the results in arrays.
// Requires: using System; using System.Diagnostics; using System.Text.RegularExpressions;
delegate void Action();

static void Tester(string msg, Action a)
{
    Stopwatch sw = Stopwatch.StartNew();

    a();

    sw.Stop();
    Console.WriteLine("{0,-68}{1:0.000} ms", msg, sw.Elapsed.TotalMilliseconds);
}

static void Main(string[] args)
{
    string source = "This is a really long sentence with several uses of a character, " +
        "' a ', surrounded by spaces.";
    string find = "A";
    string replace = "b";
    const int Iterations = 100000;

    Tester("String.IndexOf(, StringComparison.OrdinalIgnoreCase)", delegate
    {
        for (int i = 0; i < Iterations; i++)
        {
            int n = source.IndexOf(find, StringComparison.OrdinalIgnoreCase);
        }
    });

    Tester("Regex.Match.Index", delegate
    {
        for (int i = 0; i < Iterations; i++)
        {
            int n = Regex.Match(source, find, RegexOptions.IgnoreCase).Index;
        }
    });

    Tester("String.IndexOf(, StringComparison.OrdinalIgnoreCase) + replace", delegate
    {
        for (int i = 0; i < Iterations; i++)
        {
            int idx = 0;
            string copy = source;

            while ((idx = copy.IndexOf(find, idx, StringComparison.OrdinalIgnoreCase)) >= 0)
            {
                // Skip past the text that was found (find.Length), not the replacement
                copy = copy.Substring(0, idx) + replace + copy.Substring(idx + find.Length);
                idx += replace.Length;
            }
        }
    });

    Tester("Regex.Replace(,,, RegexOptions.IgnoreCase)", delegate
    {
        for (int i = 0; i < Iterations; i++)
        {
            string s = Regex.Replace(source, find, replace, RegexOptions.IgnoreCase);
        }
    });
}

Tom Pester
April 30, 2007

# re: How many ways to do a String Replace?

If I understand your requirements, it's really easy to come up with a regex:

string ResultString = null;
try {
ResultString = Regex.Replace(SubjectString, "(b)(.*?)(b)(.*$)", "$1$2c$4");
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}


given input "aaaaabaaaaaaabaaa" it will replace the second b with a c

The idea is to match the second b by first matching the previous b (so this solution won't scale easily to the xth match) and, after the second b, match to the end of the string/text (depending on the regex options) so it does not match again.


I don't have the time to do some performance testing, and I'm interested as well in how they compare.

Regards, Tom

Rick Strahl
April 30, 2007

# re: How many ways to do a String Replace?

Randy, the VFP Toolkit StrConvert functions actually don't work correctly. sb.Replace doesn't actually work off an occurrence, but off a location index, and therefore in most cases will not give you the desired result.

Juma - re: ToLowerInvariant(), great point. Actually was thinking about that later that evening (funny how that works sometimes after I've 'switched off') but didn't put it in. Thanks.

Luke, thanks for creating this - guess I was too lazy to do something like this while in the middle of coding <g>...

IndexOf: Hmmm... yup, looked at that before but couldn't quite get the Count parameter right. Looking at this again though I see how this is supposed to work, ahem. I've updated the code above to reflect.

Tom Pester
April 30, 2007

# re: How many ways to do a String Replace?

This regex will scale better :

ResultString = Regex.Replace(SubjectString, "((b.*?){5})b(.*$)", "$1c$3");

It replaces the 6th (5+1) b with a c

Regards, Tom

Rick Strahl
April 30, 2007

# re: How many ways to do a String Replace?

Tom,

The problem with RegEx is that you run into character encoding issues in strings. What if you need to match a string that contains a quote or backslash or period? How do you represent that as a pattern string and figure out how to dynamically encode that as the string you're searching for is part of the pattern?

So if in your scenario I have a string like "Filename is c:\MyType.gif" how would you get that transformed into an explicit Regex pattern string (given that \ and . have special meaning)?

I'm not a Regex wiz (and I may very well be overlooking something obvious), but I've found that to be a problem in generic scenarios for Regex. Regex works well for me if I'm applying fixed patterns to text. But patterns dynamically created I've had no luck with.

Tom Pester
April 30, 2007

# re: How many ways to do a String Replace?

I think an answer to your question is the Regex.Escape Method :
http://msdn2.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape.aspx
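As a rough sketch of how Regex.Escape could be combined with the counting pattern from earlier in the thread, the helper below builds the pattern at run time from arbitrary search text. The class and method names are hypothetical. It uses a MatchEvaluator instead of a "$1..." replacement string, and a replacement count of 1 instead of the trailing (.*$) group; both are substitutions made here so that the replacement text is taken literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedPatternReplace
{
    // Hypothetical helper: replaces the Nth (1-based) occurrence of literal text
    // using a dynamically built pattern.
    public static string ReplaceNth(string input, string find,
                                    string replaceWith, int instance)
    {
        if (string.IsNullOrEmpty(find) || instance < 1)
            return input;

        // Regex.Escape turns c:\MyType.gif into c:\\MyType\.gif so that
        // '\' and '.' are matched literally.
        string literal = Regex.Escape(find);

        // Group 1 swallows the (instance - 1) earlier occurrences plus the text
        // between them; the final 'literal' is the occurrence to replace.
        string pattern = "((?:" + literal + ".*?){" + (instance - 1) + "})" + literal;

        // Singleline lets '.' cross line breaks in multi-line input.
        var regex = new Regex(pattern, RegexOptions.Singleline);

        // Keep group 1 as-is, swap the last occurrence, stop after one replacement.
        return regex.Replace(input, m => m.Groups[1].Value + replaceWith, 1);
    }
}
```

With the input "Filename is c:\MyType.gif or c:\MyType.gif" and instance 2, only the second file name is replaced.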

Regexes seem daunting, but that's because of the intro almost all programmers get to them. They come to a solution through trial and error, and of course that gets them into trouble later when the input changes.

I think I messed around with regexes for 5 years, if not more, until I discovered the book from Friedl http://www.oreilly.com/catalog/regex/ which is awesome.
Together with the tool RegexBuddy I can tackle almost every pattern problem that I face professionally. It's very nice to write a concise expression instead of doing string manipulation.

I find that string manipulation if used sparingly is more understandable to all programmers but it has its limits. If you agree with this you should pick up a copy of Friedl's book, but first unlearn all the bad tutorials you read about regexes and embrace the glyph :)


Josh Stodola
May 03, 2007

# re: How many ways to do a String Replace?

Hi Rick, I agree. I love the string manipulation functions and I almost use them exclusively over RegEx.


Steven Smith
July 05, 2007

# re: How many ways to do a String Replace?

Rick - Don't forget to check out RegExLib.com for sample regular expressions, an online testing tool, and a Regex Cheat Sheet (that prints on 1 page) that is great for the glyph challenged. And you can escape special characters in regexes the same way you do in C# -- use the \ character, generally.

Adam Emrick
September 01, 2007

# re: How many ways to do a String Replace?

Rick, Great job!! Very well written and helpful. Thanks

Shaggie
November 10, 2009

# re: How many ways to do a String Replace?

I know this may be language dependent or require referencing a dll, and may be frowned upon by many, but what about the Microsoft.VisualBasic.Strings.Replace() function?
(May need to reference Microsoft.VisualBasic.Compatibility)
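A minimal sketch of calling that function from C# follows. It assumes the Strings class from the Microsoft.VisualBasic namespace is referenced and available in your target framework; the wrapper name is made up for illustration.

```csharp
using System;
using Microsoft.VisualBasic;

public static class VbReplaceSketch
{
    // Hypothetical wrapper around the VB runtime's Replace function.
    public static string ReplaceIgnoreCase(string source, string find, string replaceWith)
    {
        // Start = 1 (VB positions are 1-based), Count = -1 (replace all matches),
        // CompareMethod.Text requests a case-insensitive comparison.
        return Strings.Replace(source, find, replaceWith, 1, -1, CompareMethod.Text);
    }
}
```

The Count argument limits how many replacements are made, but as far as I can tell there is no way to target only the Nth match, and a Start value above 1 also cuts the text before that position out of the returned string.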

Kurt Koller
May 30, 2011

# re: How many ways to do a String Replace?

regex seems significantly faster. check the benchmark here

http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx

Rick Strahl
May 30, 2011

# re: How many ways to do a String Replace?

@Kurt - if I'm reading those results right RegEx is the SLOWEST. Manual manipulation the fastest in that article in all the examples given. I'm not surprised. RegEx has to do a lot of parsing and reparsing of a string where an optimized search and replace can be much more efficient.

Surprised though that the VB version doesn't use some native code for that somewhere which likely would be even faster.

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024