Bug in Delphi XE RegularExpressions Unit
Using the new RegularExpressions unit in Delphi XE, you can iterate over all the matches that a regex finds in a string like this:
procedure TForm1.Button1Click(Sender: TObject);
var
  RegEx: TRegEx;
  Match: TMatch;
begin
  RegEx := TRegEx.Create('\w+');
  Match := RegEx.Match('One two three four');
  while Match.Success do
  begin
    Memo1.Lines.Add(Match.Value);
    Match := Match.NextMatch;
  end;
end;
Or you could save yourself two lines of code by using the static TRegEx.Match call:
procedure TForm1.Button2Click(Sender: TObject);
var
  Match: TMatch;
begin
  Match := TRegEx.Match('One two three four', '\w+');
  while Match.Success do
  begin
    Memo1.Lines.Add(Match.Value);
    Match := Match.NextMatch;
  end;
end;
Unfortunately, due to a bug in the RegularExpressions unit, the static call doesn’t work. Depending on your exact code, you may get fewer matches than you should, or you may get blank matches, or your application may crash with an access violation.
The RegularExpressions unit defines TRegEx and TMatch as records. That way you don’t have to explicitly create and destroy them. Internally, TRegEx uses TPerlRegEx to do the heavy lifting. TPerlRegEx is a class that needs to be created and destroyed like any other class. If you look at the TRegEx source code, you’ll notice that it uses an interface to destroy the TPerlRegEx instance when TRegEx goes out of scope. Interfaces are reference counted in Delphi, making them usable for automatic memory management.
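The lifetime technique described above can be sketched in a small console program. This is a simplified model, not the actual RegularExpressions source: TEngine, TNotifier and TRegExSketch are hypothetical names standing in for TPerlRegEx, the internal interface implementation, and TRegEx.

```pascal
program NotifierSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TPerlRegEx: an ordinary class that must be freed.
  TEngine = class
  public
    Pattern: string;
    destructor Destroy; override;
  end;

  // Hypothetical guard object: frees the engine it owns when the last
  // interface reference to the guard is released.
  TNotifier = class(TInterfacedObject)
  private
    FEngine: TEngine;
  public
    constructor Create(AEngine: TEngine);
    destructor Destroy; override;
  end;

  // Hypothetical stand-in for TRegEx: a record, so callers never free it.
  TRegExSketch = record
  private
    FEngine: TEngine;
    FNotifier: IInterface; // the reference count lives here
  public
    constructor Create(const APattern: string);
  end;

destructor TEngine.Destroy;
begin
  Writeln('engine destroyed');
  inherited;
end;

constructor TNotifier.Create(AEngine: TEngine);
begin
  inherited Create;
  FEngine := AEngine;
end;

destructor TNotifier.Destroy;
begin
  FEngine.Free;
  inherited;
end;

constructor TRegExSketch.Create(const APattern: string);
begin
  FEngine := TEngine.Create;
  FEngine.Pattern := APattern;
  // Storing the guard as an interface makes the compiler release it
  // automatically when the record goes out of scope.
  FNotifier := TNotifier.Create(FEngine);
end;

procedure UseWrapper;
var
  RegEx: TRegExSketch;
begin
  RegEx := TRegExSketch.Create('\w+');
  Writeln('wrapper in use');
end; // RegEx leaves scope here; no explicit Free is needed

begin
  UseWrapper;
  Writeln('after UseWrapper');
end.
```

The engine's destructor message should appear when UseWrapper returns, even though nothing in the program calls Free on the record or on the engine.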
The bug is that TMatch and TGroupCollection also need the TPerlRegEx instance to do their work. TRegEx passes its TPerlRegEx instance to TMatch and TGroupCollection, but it does not pass the instance of the interface that is responsible for destroying TPerlRegEx.
This is not a problem in our first code sample. TRegEx stays in scope until we’re done with TMatch. The interface is destroyed when Button1Click exits.
In the second code sample, the static TRegEx.Match call creates a local variable of type TRegEx. This local variable goes out of scope when TRegEx.Match returns. Thus the reference count on the interface reaches zero and TPerlRegEx is destroyed when TRegEx.Match returns. When we then call Match.NextMatch, the TMatch record tries to use a TPerlRegEx instance that has already been destroyed.
To fix this bug, delete or rename the two RegularExpressions.dcu files and copy RegularExpressions.pas into your source code folder. Make these changes to both the TMatch and TGroupCollection records in this unit:
- Declare FNotifier: IInterface; in the private section.
- Add the parameter ANotifier: IInterface; to the Create constructor.
- Assign FNotifier := ANotifier; in the constructor’s implementation.
You also need to add the ANotifier: IInterface; parameter to the TMatchCollection.Create constructor.
Now try to compile some code that uses the RegularExpressions unit. The compiler will flag all calls to TMatch.Create, TGroupCollection.Create and TMatchCollection.Create. Fix them by adding the ANotifier or FNotifier parameter, depending on whether ARegEx or FRegEx is being passed.
With these fixes, the TPerlRegEx instance won’t be destroyed until the last TRegEx, TMatch, or TGroupCollection that uses it goes out of scope or is used with a different regular expression.
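To illustrate why passing the notifier along fixes the static call, here is a minimal model of the idea. It is a sketch under assumptions, not the patched unit: TEngine, TNotifier, TMatchSketch and StaticMatch are hypothetical names, and only the FNotifier field and ANotifier parameter mirror the changes described above.

```pascal
program NotifierFixSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TPerlRegEx.
  TEngine = class
  public
    Subject: string;
    destructor Destroy; override;
  end;

  // Hypothetical guard that frees the engine when its last reference goes.
  TNotifier = class(TInterfacedObject)
  private
    FEngine: TEngine;
  public
    constructor Create(AEngine: TEngine);
    destructor Destroy; override;
  end;

  // Hypothetical stand-in for TMatch with the fix applied: it keeps its own
  // reference to the notifier next to the engine pointer.
  TMatchSketch = record
  private
    FEngine: TEngine;
    FNotifier: IInterface;
  public
    constructor Create(AEngine: TEngine; ANotifier: IInterface);
    function Value: string;
  end;

destructor TEngine.Destroy;
begin
  Writeln('engine destroyed');
  inherited;
end;

constructor TNotifier.Create(AEngine: TEngine);
begin
  inherited Create;
  FEngine := AEngine;
end;

destructor TNotifier.Destroy;
begin
  FEngine.Free;
  inherited;
end;

constructor TMatchSketch.Create(AEngine: TEngine; ANotifier: IInterface);
begin
  FEngine := AEngine;
  FNotifier := ANotifier; // this assignment is what extends the lifetime
end;

function TMatchSketch.Value: string;
begin
  Result := FEngine.Subject; // safe only while FNotifier keeps the engine alive
end;

// Hypothetical stand-in for the static TRegEx.Match call: the engine and its
// guard are created locally and would normally die when the function returns.
function StaticMatch(const ASubject: string): TMatchSketch;
var
  Engine: TEngine;
  Notifier: IInterface;
begin
  Engine := TEngine.Create;
  Engine.Subject := ASubject;
  Notifier := TNotifier.Create(Engine);
  Result := TMatchSketch.Create(Engine, Notifier);
end; // the local Notifier is released here, but Result still holds a reference

procedure Run;
var
  Match: TMatchSketch;
begin
  Match := StaticMatch('One two three four');
  Writeln(Match.Value); // the engine is still alive at this point
end; // Match leaves scope; the last reference goes and the engine is freed

begin
  Run;
  Writeln('done');
end.
```

Without the FNotifier field in TMatchSketch, the engine would be freed as soon as StaticMatch returned, and Value would read from a destroyed object, which is the situation the static TRegEx.Match call runs into.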
Thank you very much! I was working with the static version and didn’t understand what was going on. Google and you saved my Sunday.
Comment by Erwin Jurschitza — Monday, 8 November 2010 @ 3:07
Thanks for pointing this problem out.
This does not seem to be fixed in “Update 1 for Delphi XE”. There’s only one minor change to “RegularExpressions.pas” on line 282.
Your call “87752” in QualityCentral is still being listed as “Open”:
http://qc.embarcadero.com/wc/qcmain.aspx?d=87752
Regards,
Olaf
Comment by Olaf — Monday, 15 November 2010 @ 1:07
I filed a QC report (90036) for this; you may want to vote for it if you’re facing this problem.
link: http://qc.embarcadero.com/wc/qcmain.aspx?d=90036
Comment by jachguate — Wednesday, 1 December 2010 @ 1:50
@jachguate: I had already reported this in QC in September: http://qc.embarcadero.com/wc/qcmain.aspx?d=87752
@olaf: I’m sure Embarcadero won’t fix this in a Delphi XE update. The fix requires an interface change, which makes it a breaking change for precompiled units and packages (i.e. 3rd party libraries without source).
Comment by Jan Goyvaerts — Saturday, 18 December 2010 @ 10:04
@jan (4): Why does the fix require an interface change? Also, your QC report http://qc.embarcadero.com/wc/qcmain.aspx?d=92497 is waiting on you to make a minor change in the steps. Finally, thank you for discovering this: it’s a pity that after so long they add built-in regular expression capabilities only to break it while doing…
I haven’t moved to XE yet. I was looking at the additions, trying to find the difference between the two new classes for regular expressions, and stumbled into this… tsc, tsc…
Comment by Fernando Madruga — Saturday, 23 April 2011 @ 5:47
Steps 1 and 2 in my solution are interface changes. An “interface change” is a change made to the interface section of a unit. An interface change requires all units that use the changed unit to be recompiled. Embarcadero doesn’t make interface changes in updates because it would break compatibility with all 3rd party components for which you don’t have source code that use the changed unit.
Comment by Jan Goyvaerts — Friday, 30 September 2011 @ 8:16
This bug has been fixed in Delphi XE2. Embarcadero used a slightly different solution. Instead of adding the FNotifier field, they replaced the FRegex field with FNotifier. The code then uses the FRegex reference held by FNotifier to access the relevant TPerlRegEx instance.
Comment by Jan Goyvaerts — Friday, 30 September 2011 @ 8:18