VBScript.RegExp and the . Operator on multi-line Content

December 05, 2007 • from Maui, Hawaii • 6 comments

Note to self: Remember that the COM RegEx parser doesn't deal with the . operator in multi-line content the same way as .NET or most other RegEx parsers do. I've just spent 20 minutes troubleshooting a regular expression that works just fine in RegEx Buddy and .NET code, but failed in one of my FoxPro apps here using the COM VBScript.RegExp parser.

The code I was working on required stripping out @Register tags from an ASP.NET style markup document:

TEXT TO lcText  NoShow
<!-- 
     * Set the name of your class in the ID property
     * Set the GeneratedSourceFile at a PRG file in your FoxPro project directory
     * NOTE: the path is relative to your executing directory (CURDIR())
     * Remove this block of comment text
-->
<%@ Page Language="C#"          
         GeneratedSourceFile="devDemo/BasePage.prg"
         ID="BasePage_Page"
         AuthenticationMode="Basic"         
%>
<%@ Register Assembly="WebConnectionWebControls" 
    Namespace="Westwind.WebConnection.WebControls"
    TagPrefix="ww" %>

<%@ Register Assembly="WebConnectionWebControls" 
    Namespace="Westwind.WebConnection.WebControls.Customization"
    TagPrefix="ww" %>

... more HTML here

<form id="form1" runat="server">           
     
ENDTEXT

LOCAL loRegEx as VBScript.RegExp 
loRegEx = CREATEOBJECT("VBScript.RegExp") 

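* Case-insensitive, replace every match, and let ^ and $ match at line breaks
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.
loRegEx.MultiLine = .T.

* First attempt: the . is meant to span the lines of each @Register tag
loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'

* Print the text with all matches removed
? loRegEx.Replace(lcText,"")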

RETURN

So I started out with the above expression to match and then remove the entire @Register tags:

loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'

using the . to specify any character in a multi-line expression to parse. This doesn't work because apparently the . operator in the VBScript RegEx parser doesn't match the newline, so the expression can only ever match within a single line. This is despite the MultiLine option, which only affects how the ^ and $ (beginning and end of line) characters are treated by the RegEx parser.
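
To make the difference concrete, here's a minimal sketch. The two-line sample string and the variable names are made up for illustration and are not from my app:

```foxpro
LOCAL loRegEx, loMatches, lcSample

* Hypothetical two-line sample; CHR(10) is a line feed
lcSample = "one" + CHR(10) + "two"

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.Global = .T.

* MultiLine changes where ^ and $ are allowed to match...
loRegEx.Pattern = "^\w+$"
loRegEx.MultiLine = .F.
loMatches = loRegEx.Execute(lcSample)
? loMatches.Count      && 0 - the string as a whole is not a single word

loRegEx.MultiLine = .T.
loMatches = loRegEx.Execute(lcSample)
? loMatches.Count      && 2 - each line now matches on its own

* ...but it does not change what the . matches
loRegEx.Pattern = "one.*two"
? loRegEx.Test(lcSample)   && .F. - the . stops at the line feed
```

The last test comes back .F. whether MultiLine is on or off, which is exactly what bit me above.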

There are a couple of ways around this. What I used here is to replace the . with [\s,\S], which matches essentially every character (any whitespace or non-whitespace character; the comma is redundant but harmless):

loRegEx.Pattern = '<%@\s{0,}Register[\s,\S]*?%>\s{0,}'

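Or, to be more explicit, an alternation group such as (.|[\r\n]) provides the same results. Note that this has to be a group in parentheses rather than a character class: inside square brackets the . and | are literal characters, so [.|\n] only matches a dot, a pipe or a newline.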
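
Here's a stripped-down before/after comparison, as a sketch only. The short sample tag below is made up, and I've written \s{0,} as the equivalent \s* and the character class as plain [\s\S]:

```foxpro
LOCAL loRegEx, lcSample

* Hypothetical sample: one @Register tag spread over two lines, then a form tag
lcSample = '<%@ Register Assembly="Demo"' + CHR(13) + CHR(10) + ;
           '    TagPrefix="ww" %>' + CHR(13) + CHR(10) + ;
           '<form>'

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.

* The . cannot cross the line break, so nothing matches and
* the text comes back unchanged
loRegEx.Pattern = '<%@\s*Register.*?%>\s*'
? loRegEx.Replace(lcSample, "")

* [\s\S] does cross the line break, so the tag and the
* trailing whitespace are removed and only <form> is left
loRegEx.Pattern = '<%@\s*Register[\s\S]*?%>\s*'
? loRegEx.Replace(lcSample, "")
```

The lazy *? matters here too: with a greedy * the match would run to the last %> in the document and take everything in between with it.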

My short-term memory is going bad. Just as I got this working I ran into some older code (in the same program file even!) where I had apparently done exactly the same thing previously, using [\s,\S] instead of the . operator. Nothing like solving the same problem twice, eh? Hopefully this time after writing it up I'll remember. <g>

In general I wish I could remember more of the little bit of RegEx work I do. Even better, some of what other people do, he he. I appreciate the power of RegEx, but it seems whenever I do anything with RegEx it takes forever to do even simple things, and once it's done I immediately and completely forget the syntax and process that went into figuring it out. No retention there for me. Case in point here. Next time maybe I'll remember.

The Voices of Reason

Richard Deeming
December 05, 2007

# re: VBScript.RegExp and the . Operator on multi-line Content

I've always found Ultrapico's Expresso very useful for any Regex work:
http://www.ultrapico.com/Expresso.htm

If you want to test expressions for JavaScript, you can select the ECMA Script option under the Design Mode tab, which is equivalent to specifying RegexOptions.ECMAScript with your expression.

As for the VBScript RegExp object, so long as you're using 5.5 or higher, it uses the same engine as the JScript RegExp object.

Rick Strahl
December 05, 2007

# re: VBScript.RegExp and the . Operator on multi-line Content

@Richard - Thanks! That's a big help actually. I use RegEx Buddy (http://www.regexbuddy.com/) and it too lets you test with various different parsers including the JavaScript parser. Choosing the JavaScript parser yields the same results, with the . not matching across multiple lines.

Steve Smith
December 06, 2007

# re: VBScript.RegExp and the . Operator on multi-line Content

There's also http://regexlib.com/, a regular expression library, where you can find hundreds of regular expressions and contribute your own.

Luke Breuer
December 06, 2007

# re: VBScript.RegExp and the . Operator on multi-line Content

RegexOptions.Multiline [or (?m)] does not cause .NET regex to match . to newlines. RegexOptions.Singleline [or (?s)] does that.

Mathieu
October 02, 2011

# re: VBScript.RegExp and the . Operator on multi-line Content

So many thanks for the tip! I was going mad on this problem. :)

Peter
January 23, 2013

# re: VBScript.RegExp and the . Operator on multi-line Content

Thanx for the hint.

WTF: "." doesn't work on Multiline!

Rick Strahl's Weblog