Regex Guru

Thursday, 29 May 2014

What’s New in Delphi XE6 Regular Expressions

Filed under: Regex Libraries — Jan Goyvaerts @ 13:27

There’s not much new in the regular expression support in Delphi XE6. The big change that should be made, upgrading to PCRE 8.30 or later and switching to the pcre16 functions that use UTF-16, still hasn’t been made. XE6 still uses PCRE 7.9 and thus continues to require conversion from the UTF-16 strings that Delphi uses natively to the UTF-8 strings that older versions of PCRE require.

Delphi XE6 does fix one important issue that has plagued TRegEx since it was introduced in Delphi XE. Previously, TRegEx could not find zero-length matches. So a regex like (?m)^ that should find a zero-length match at the start of each line would not find any matches at all with TRegEx.

The reason for this is that TRegEx uses TPerlRegEx to do the heavy lifting. TPerlRegEx sets its State property to [preNotEmpty] in its constructor, which tells it to skip zero-length matches. This is not a problem with TPerlRegEx because users of this class can change the State property. But TRegEx does not provide a way to change this property. So in Delphi XE5 and prior, TRegEx cannot find zero-length matches.
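A minimal sketch of how the State property affects matching when you use TPerlRegEx directly. CountMatches and the sample subject are made up for this illustration, and the unit name assumes Delphi XE2 or later (Delphi XE itself uses RegularExpressionsCore without the System prefix). State is set explicitly in both calls so that the result does not depend on what the constructor chose as the default.

```delphi
program NotEmptyState;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressionsCore;

// Hypothetical helper: counts the matches of Pattern in Subject
// with the given State flags applied to every match attempt.
function CountMatches(const Pattern, Subject: string;
  State: TPerlRegExState): Integer;
var
  Regex: TPerlRegEx;
begin
  Result := 0;
  Regex := TPerlRegEx.Create;
  try
    Regex.RegEx := Pattern;
    Regex.Subject := Subject;
    // Override whatever the constructor picked as the default
    Regex.State := State;
    if Regex.Match then
      repeat
        Inc(Result);
      until not Regex.MatchAgain;
  finally
    Regex.Free;
  end;
end;

const
  // Two lines separated by a line feed; the text is arbitrary
  Sample = 'first line'#10'second line';
begin
  // preNotEmpty rejects zero-length matches, and (?m)^ can only
  // ever match the empty string, so this call cannot find a match
  Writeln('With preNotEmpty: ', CountMatches('(?m)^', Sample, [preNotEmpty]));

  // With an empty State, zero-length matches are allowed
  Writeln('With empty State: ', CountMatches('(?m)^', Sample, []));
end.
```

TRegEx offers no equivalent of the Regex.State assignment, which is why it was stuck with whatever the TPerlRegEx constructor set.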

In Delphi XE6, TPerlRegEx’s constructor was changed to initialize State to the empty set. This means TRegEx is now able to find zero-length matches. TRegEx.Replace() using the regex (?m)^ now inserts the replacement at the start of each line, as you would expect. If you use TPerlRegEx directly and relied on it skipping zero-length matches by default, you’ll need to set State to [preNotEmpty] in your own code.
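A quick way to see which behavior your compiler gives you is a small console program along these lines. It is only a sketch: the input text and the '> ' replacement are arbitrary, and the unit name assumes Delphi XE2 or later.

```delphi
program LineStartReplace;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

var
  Input, Output: string;
begin
  // Two lines separated by a line feed; the text is arbitrary
  Input := 'first line'#10'second line';

  // (?m)^ matches the empty string at the start of each line
  Output := TRegEx.Replace(Input, '(?m)^', '> ');

  // Going by the behavior described in this post: XE6 prefixes each
  // line with '> ', while XE through XE5 return Input unchanged
  Writeln(Output);
end.
```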

You will need to check existing applications that use TRegEx for regular expressions that unintentionally allow zero-length matches. In XE5 and prior, TRegEx using \d* would match all numbers in a string. In XE6, the same regex still matches all numbers, but also finds a zero-length match at every position that is not followed by a digit. If you only want the numbers, change the regex to \d+ so that it requires at least one digit.
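To audit a suspect regex, it can help to list its matches next to those of a stricter variant that requires at least one character. The sketch below does that for \d* and \d+. ListMatches and the sample string are made up for this illustration, and the unit name assumes Delphi XE2 or later.

```delphi
program DigitRuns;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

// Hypothetical helper: prints every match of Pattern in Input,
// in brackets so that zero-length matches show up as []
procedure ListMatches(const Input, Pattern: string);
var
  Matches: TMatchCollection;
  I: Integer;
begin
  Matches := TRegEx.Matches(Input, Pattern);
  Writeln(Pattern, ': ', Matches.Count, ' match(es)');
  for I := 0 to Matches.Count - 1 do
    Writeln('  [', Matches[I].Value, '] at index ', Matches[I].Index);
end;

begin
  // \d* can match the empty string, so the number of matches
  // depends on whether zero-length matches are skipped
  ListMatches('a12b345', '\d*');

  // \d+ needs at least one digit, so it finds only 12 and 345
  // whether or not zero-length matches are allowed
  ListMatches('a12b345', '\d+');
end.
```

Any regex for which the two listings differ only by empty [] entries is a candidate for the same fix.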

RegexBuddy 4 warns about zero-length matches on the Create panel if you set it to Detailed mode. At the bottom of the regex tree there will be a node saying either “your regular expression may find zero-length matches” or “zero-length matches will be skipped” depending on whether your application allows zero-length matches (XE6 TRegEx) or not (XE–XE5 TRegEx).

4 Comments »

  1. I am looking for the fastest regular expression processor in the universe.

    These guys I work with run about 100 regular expressions per file in our workflow using stuff they wrote in Microsoft C# “Dismal” Studio.

    It is like their processor does one replacement per file I/O… like one at a time… like it reads in the file, does one transformation, writes the output, and then does it all over again for the next expression.

    Moooooooooooolasses. Drip.

    I want 1-I/O.

    That is one input, massive expression processing, and one output. Done.
    Wham-bam, thank you ma’am. It’s Miller time!

    Thoughts?

    Comment by John Hoffman — Friday, 30 May 2014 @ 7:03

  2. I don’t know if it’s the fastest in the universe (there are many planets I haven’t been to yet), but our product PowerGREP can do a search-and-replace using 100 or more regular expressions while reading and writing each file only once. Set “action type” to “search-and-replace”, set “search type” to “delimited regular expressions” or “list of regular expressions”, and make sure “non-overlapping search” is turned on. Then add all your regexes on the Action panel. The Quick Replace button will do the search-and-replace the fastest, but you may want to start with Preview or Replace so you can inspect the results.

    Comment by Jan Goyvaerts — Friday, 30 May 2014 @ 15:08

  3. Hi, Jan!

    I have consistently found great use in your tutorials and short articles on regular expression foundations (concepts, play-by-play analysis) and recipes. In my online searches, I find myself coming back to sites you manage and contribute to time and time again.

    I am planning on purchasing something from your online store for a thorough read. I would like to know what you recommend to a beginner. I would love to use RegexBuddy, but I also want to learn from the ground up, as I gravitate best to theory before practice.

    What do you recommend?

    - Daniel

    Comment by Daniel — Monday, 11 August 2014 @ 23:05

  4. RegexBuddy is an excellent choice for beginners. Its help file and PDF manual include a complete tutorial to regular expressions. If you like to learn by reading, then working through the tutorial in the order the topics are presented in the table of contents will give you a complete understanding of regular expressions, starting with the basics. As you progress through the tutorial you can experiment in RegexBuddy to practice what you’ve learned.

    Comment by Jan Goyvaerts — Tuesday, 12 August 2014 @ 7:15
