Note to self: Remember that the COM RegEx parser doesn't deal with the . operator the same way in multi-line content as .NET or most other RegEx parsers do. I've just spent 20 minutes troubleshooting a regular expression that works just fine in RegEx Buddy and .NET code, but failed in one of my FoxPro apps here using the COM VBScript.RegExp parser.
The code I was working on required stripping out @Register tags from an ASP.NET style markup document:
TEXT TO lcText NoShow
<!--
* Set the name of your class in the ID property
* Set the GeneratedSourceFile at a PRG file in your FoxPro project directory
* NOTE: the path is relative to your executing directory (CURDIR())
* Remove this block of comment text
-->
<%@ Page Language="C#"
GeneratedSourceFile="devDemo/BasePage.prg"
ID="BasePage_Page"
AuthenticationMode="Basic"
%>
<%@ Register Assembly="WebConnectionWebControls"
Namespace="Westwind.WebConnection.WebControls"
TagPrefix="ww" %>
<%@ Register Assembly="WebConnectionWebControls"
Namespace="Westwind.WebConnection.WebControls.Customization"
TagPrefix="ww" %>
... more HTML here
<form id="form1" runat="server">
ENDTEXT
LOCAL loRegEx as VBScript.RegExp
loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.
loRegEx.MultiLine = .T.
loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'
? loRegEx.Replace(lcText,"")
RETURN
So I started out with the above expression to match and then remove the entire @Register tags:
loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'
using the . to match any character across the multi-line text. This doesn't work because the . operator in the VBScript RegEx parser doesn't match the newline, so the pattern can only match within a single line. This is despite the MultiLine option, which only affects how the ^ and $ (beginning and end of line) characters are parsed by the RegEx parser. VBScript.RegExp also has no equivalent of the single-line (dot matches newline) option that .NET and RegEx Buddy offer.
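To see the behavior in isolation, here's a minimal sketch that assumes nothing beyond the VBScript.RegExp COM object used above. The lcSample string is made up for illustration; it's just a tag split across two lines with a CRLF.

```foxpro
* Hypothetical two-line test string; CHR(13)+CHR(10) is the CRLF line break
lcSample = "<%@ Register A" + CHR(13) + CHR(10) + "B %>"

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.MultiLine = .T.

* The . stops at the line break, so a tag spanning two lines is not matched
loRegEx.Pattern = "<%@.*?%>"
? loRegEx.Test(lcSample)      && .F.

* MultiLine only changes ^ and $: ^B matches at the start of the second line
loRegEx.Pattern = "^B"
? loRegEx.Test(lcSample)      && .T.

* Without MultiLine, ^ matches only at the very start of the string
loRegEx.MultiLine = .F.
? loRegEx.Test(lcSample)      && .F.
```

So the MultiLine flag makes no difference to what the . matches; it only moves the ^ and $ anchors.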
There are a couple of ways around this. What I used here is to simply replace the . with [\s,\S], which is essentially every character (whitespace or non-whitespace; the comma is harmless but not needed, so [\s\S] is enough):
loRegEx.Pattern = '<%@\s{0,}Register[\s,\S]*?%>\s{0,}'
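Or, to be more explicit, a grouped alternation such as (.|\r|\n) provides the same results. Note the parentheses: inside square brackets, as in [.|\n], the . and | are literal characters, so that form matches only a dot, a pipe, or a newline.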
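Here's a quick side-by-side of the two workarounds, again as a sketch with a made-up lcSample string rather than the full page markup. The second pattern uses parentheses, not square brackets, because inside a character class the . and | are taken literally.

```foxpro
* Hypothetical sample: a tag split across two lines, followed by other text
lcSample = "<%@ Register A" + CHR(13) + CHR(10) + "B %>rest"

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.

* [\s\S] is whitespace or non-whitespace, so any character including line breaks
loRegEx.Pattern = "<%@\s*Register[\s\S]*?%>"
? loRegEx.Replace(lcSample, "")     && rest

* Grouped alternation: whatever . matches, or a carriage return, or a line feed
loRegEx.Pattern = "<%@\s*Register(.|\r|\n)*?%>"
? loRegEx.Replace(lcSample, "")     && rest
```

Both strip the whole tag. I'd lean toward the character class since it's shorter and doesn't create a capture group for every character matched.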
My short term memory is going bad. Just as I got this working I ran into some older code (in the same program file even!) where I had apparently done exactly the same thing previously using [\s,\S] instead of the .. Nothing like solving the same problem twice, eh? Hopefully this time after writing it up I'll remember. <g>
In general I wish I could remember more of the little bit of RegEx work I do. Even better, some of what other people do, he he. I appreciate the power of RegEx, but it seems whenever I do anything with RegEx it takes forever to do even simple things, and once it's done I immediately and completely forget the syntax and process that went into figuring it out. No retention there for me. Case in point here. Next time maybe I'll remember.