Regex Guru

Wednesday, 22 September 2010

Bug in Delphi XE RegularExpressions Unit

Filed under: Regex Trouble — Jan Goyvaerts @ 11:16

A bug in RegularExpressions.pas in Delphi XE can cause missing matches, blank matches, or an access violation. This article explains the cause of the bug and how to fix it.

Friday, 6 November 2009

TPerlRegEx.CleanUp() Bugfix

Filed under: Regex Trouble — Jan Goyvaerts @ 16:28

A bug in recent versions of TPerlRegEx caused it to crash when reusing a TPerlRegEx instance with another regular expression because two pointers weren’t set to nil after freeing them.

Monday, 27 April 2009

Split() is Not Always The Best Way to Split a String

Filed under: Regex Trouble — Jan Goyvaerts @ 16:17

The split() function makes it easy to split a string when you can use a simple regex to match the delimiters on which you want to split. Often it is much easier to write a regex that matches the content between the delimiters that you want to keep. In such cases, use findall() instead of split().
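
The excerpt does not name a language, so the following is only a minimal sketch assuming Python's re module, where re.split() and re.findall() play the roles of split() and findall(). The sample string and both patterns are made up for illustration.

```python
import re

# Hypothetical sample input, not taken from the original article.
text = "x=10, y=-3.5; z=42"

# Delimiter-oriented: easy when the separators are simple to describe.
fields = re.split(r"[,;]\s*", text)

# Content-oriented: describe what to keep instead of what to discard.
numbers = re.findall(r"-?\d+(?:\.\d+)?", text)

print(fields)
print(numbers)
```

Getting just the numbers with split() would need a delimiter regex that matches everything that is not a number, which is harder to write and to read than the pattern for the numbers themselves.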

Friday, 19 December 2008

Don’t Escape Literal Characters That Aren’t Metacharacters

Filed under: Regex Trouble — Jan Goyvaerts @ 17:29

Perl-style regular expressions treat 12 punctuation characters as metacharacters outside character classes. These characters need to be escaped with a backslash if you want to include them as literal characters in your regex:

.^$|*+?()[{\

Inside character classes, these flavors treat a different set of 4 punctuation characters as metacharacters. Only those 4 need to be […]
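
As a rough illustration of the difference, here is a small sketch assuming Python's re module as a representative Perl-style flavor. The test strings are invented; the point is only that the same character may need a backslash outside a character class and none inside one.

```python
import re

# Outside a character class, + is a quantifier and must be escaped to be literal.
escaped = re.search(r"1\+1", "1+1")
unescaped = re.search(r"1+1", "1+1")  # means "one or more 1s, then a 1"

# Inside a character class, . + * ? are already literal, so no backslashes are needed.
in_class = re.fullmatch(r"[.+*?]+", ".+*?")

print(escaped is not None, unescaped is None, in_class is not None)
```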

Thursday, 8 May 2008

Follow Up with Adequate Testing

Filed under: Regex Trouble — Jan Goyvaerts @ 15:05

The regular expression from the Do Follow plugin is dedicated to a single purpose. Repurposing it for your own code will expose shortcomings that don’t matter for the plugin, but may matter for what you’re trying to do. Never copy-and-paste a regex without testing it.

No Follow The Lazy Dot

Filed under: Regex Trouble — Jan Goyvaerts @ 8:31

The popular Do Follow WordPress plugin uses a rather inefficient regular expression for its job. Here’s how to improve it.

Tuesday, 15 April 2008

Watch Out for Zero-Length Matches

Filed under: Regex Trouble — Jan Goyvaerts @ 14:51

Zero-length matches are often an unintended result of mistakenly making everything optional in a regular expression. Sometimes they can be useful. In browsers like Firefox, zero-length matches can cause your JavaScript code to loop forever on regex.exec().
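
The excerpt is about JavaScript's regex.exec(), but the same hazard can be sketched in Python with a hand-written search loop. The helper name find_all_spans and the sample patterns are hypothetical; the sketch assumes a loop that restarts each search where the previous match ended.

```python
import re

def find_all_spans(pattern, text):
    """Collect (start, end) spans by searching repeatedly from a moving position."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # Guard: a zero-length match does not move the position forward,
        # so step one character manually or the loop never ends.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return spans

# Everything optional: the pattern also matches the empty string.
optional_everything = find_all_spans(r"\d*", "a1b")

# At least one digit required: no zero-length matches.
required_digit = find_all_spans(r"\d+", "a1b")

print(optional_everything)
print(required_digit)
```

Without the guard, the first zero-length match would leave pos unchanged and the loop would find that same match again on every pass.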

Tuesday, 8 April 2008

Unintended Backtracking Can Bite You

Filed under: Regex Trouble — Jan Goyvaerts @ 17:01

Backtracking occurs when the regular expression engine encounters a regex token that does not match the next character in the string. The regex engine will then back up part of what it matched so far, to try different alternatives and/or repetitions. Understanding this process will make all the difference between guessing and understanding why a […]
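
A small sketch of the process, assuming Python's re module (a backtracking engine) and an invented pattern: the capture groups show how much the greedy token had to give back.

```python
import re

# \d+ is greedy: it first consumes all four digits, leaving nothing for the final \d.
# The engine then backs up one digit so the last token can match.
m = re.fullmatch(r"(\d+)(\d)", "1234")
groups = m.groups()

print(groups)
```

The first group ends up one digit shorter than its initial greedy attempt, which is the backing up described above.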

Friday, 14 March 2008

Regex Trouble

Filed under: Regex Trouble — Jan Goyvaerts @ 8:49

Recap of 4 articles on regular expression pitfalls I previously posted to