TPerlRegEx for Delphi 2009
TPerlRegEx is a Delphi VCL component wrapper around the open source PCRE library. I originally developed it for in-house use. It powered EditPad Pro 4 and 5, PowerGREP 1 and 2, and RegexBuddy 1. The latest versions of these products use a custom-built regular expression engine. The custom-built engine can do things such as searching through files larger than 4 GB (in PowerGREP) or emulating many regex flavors (in RegexBuddy), which aren’t typical usage scenarios for regular expressions.
The PCRE library is a great choice to add regex support to your Delphi applications. Though the library is written in C, Delphi can link the OBJ files output by a C compiler into your application. TPerlRegEx includes ready-made OBJ files, so you don’t have to worry about any of this.
The latest version of TPerlRegEx includes PCRE 7.7, with Unicode support enabled. To actually use the Unicode features, you’ll need Delphi 2009. In PerlRegEx.pas, you’ll see the following near the top of the unit:
{$IFDEF UNICODE}
type PCREString = UTF8String;
{$ELSE}
type PCREString = AnsiString;
{$ENDIF}
The UNICODE directive is defined by default in Delphi 2009, but not in Delphi 2007 or earlier. If you’re using Delphi 2007 or before, TPerlRegEx will work with AnsiString, which has been the default string type since Delphi 2.
When you migrate your application to Delphi 2009, the default string type becomes UnicodeString. PCRE does not support UTF-16. It only supports 8-bit strings (i.e. one byte per character), and UTF-8. Hence my decision to make TPerlRegEx use the new and improved UTF8String in Delphi 2009.
When you assign a variable declared as “string”, which really is UnicodeString in Delphi 2009, to a property such as TPerlRegEx.Subject, the Delphi 2009 compiler automatically does the UTF-16 to UTF-8 conversion for you. When you assign a property such as TPerlRegEx.MatchedExpression to a string variable, the UTF-8 to UTF-16 conversion is also automatic. The net result is that when you use TPerlRegEx and you upgrade from Delphi 2007 to Delphi 2009, your regular expressions automatically become Unicode-enabled.
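The round trip described above can be sketched as a small console program. This is only an illustration under assumptions: the program name, variable names and pattern are made up, and it assumes the component-style constructor that takes an owner, which may differ in other versions of TPerlRegEx.

```delphi
program AutoConvertSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var
  Regex: TPerlRegEx;
  Input, Found: string; // UnicodeString (UTF-16) in Delphi 2009
begin
  Input := 'Delphi 2009';
  // Assumption: this version of TPerlRegEx is a TComponent, hence the nil owner.
  Regex := TPerlRegEx.Create(nil);
  try
    Regex.RegEx := '\d+';
    // UTF-16 to UTF-8: the compiler inserts the conversion here
    // (and may emit an implicit string cast warning).
    Regex.Subject := Input;
    if Regex.Match then
    begin
      // UTF-8 to UTF-16: converted back when assigned to a string variable.
      Found := Regex.MatchedExpression;
      Writeln(Found);
    end;
  finally
    Regex.Free;
  end;
end.
```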
The only caveat lies in position properties such as MatchedExpressionOffset and MatchedExpressionLength. These indicate byte positions in the UTF-8 strings that TPerlRegEx deals with. To make your code work correctly in Delphi 2009, use those positions with the TPerlRegEx.Subject property (which uses UTF-8) instead of your original string variable (which uses UTF-16 in Delphi 2009).
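A minimal sketch of that caveat, assuming the same component-style constructor as the 2008 release; the sample text, pattern and variable names are hypothetical. The subject contains one character that takes two bytes in UTF-8, so byte positions and UTF-16 positions no longer agree after it.

```delphi
program OffsetSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var
  Regex: TPerlRegEx;
  Original: string;      // UTF-16 in Delphi 2009
  MatchUtf8: UTF8String; // raw UTF-8 bytes of the match
  MatchText: string;
begin
  // #$010D is a c with caron: one UTF-16 code unit, but two bytes in UTF-8.
  Original := 'Na'#$010D'in da se provede';
  Regex := TPerlRegEx.Create(nil); // assumption: TComponent-based constructor
  try
    Regex.RegEx := 'provede';
    Regex.Subject := Original;
    if Regex.Match then
    begin
      // Byte positions applied to the UTF-8 Subject: this is the safe combination.
      MatchUtf8 := Copy(Regex.Subject, Regex.MatchedExpressionOffset,
        Regex.MatchedExpressionLength);
      // Convert the extracted UTF-8 text, not the positions.
      MatchText := string(MatchUtf8);
      Writeln(MatchText);
      // Applying the same positions to Original (UTF-16) would be shifted
      // by one character here, because of the two-byte character before the match.
    end;
  finally
    Regex.Free;
  end;
end.
```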
Note that in Delphi 2007, there’s no difference between UTF8String and AnsiString. Manually defining the UNICODE directive in Delphi 2007 will not make TPerlRegEx support Unicode. You’d have to add explicit calls to UTF8Encode and UTF8Decode to do the conversions.
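As a rough sketch of what those explicit conversions might look like in Delphi 2007: UTF8Encode and UTF8Decode are the standard RTL functions, while the variable names and pattern are made up. The sketch only shows the string conversions and assumes PCRE itself has been put into UTF-8 mode, which is a separate matter.

```delphi
program ManualUtf8Sketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var
  Regex: TPerlRegEx;
  Input, Found: WideString; // UTF-16 text in Delphi 2007
begin
  Input := 'Delphi 2007';
  Regex := TPerlRegEx.Create(nil); // assumption: TComponent-based constructor
  try
    Regex.RegEx := '\d+';
    // Explicit UTF-16 to UTF-8 conversion; nothing happens automatically here.
    Regex.Subject := UTF8Encode(Input);
    if Regex.Match then
    begin
      // Explicit UTF-8 to UTF-16 conversion of the result.
      Found := UTF8Decode(Regex.MatchedExpression);
      Writeln(Found);
    end;
  finally
    Regex.Free;
  end;
end.
```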
All in all, porting TPerlRegEx from Delphi 2007 to 2009 was very easy. Changing the string declaration from AnsiString to UTF8String, and passing the PCRE_UTF8 flag to the pcre_compile function is all it really took. I wasted much more time upgrading the OBJ files from PCRE 4.5 to 7.7, which I ended up borrowing from the JCL project.
Download TPerlRegEx. Source is included under the MPL 1.1 license.
I installed it in Delphi 2007. When compiling a form with TPerlRegEx, I get this error: [Pascal Fatal Error] F2084 Internal Error: L3576
Comment by wwd — Tuesday, 19 August 2008 @ 22:36
I didn’t get the internal error yesterday with Delphi 2007, but now I get it too.
The internal error occurs when the Delphi compiler has to link the OBJ files into your EXE. You can work around it by either putting TPerlRegEx into a runtime package, or by linking the PCRE DLL instead of the OBJ files. I’ve updated the component to use the DLL by default for Delphi 2007 and prior.
The internal error does not occur with Delphi 2009. (I’m on the private field test and have been given specific permission to blog about it.)
Comment by Jan Goyvaerts — Wednesday, 20 August 2008 @ 10:54
I don’t have this problem with my Delphi 7, it just compiles and links without any problems. But I don’t have TPerlRegEx on a form, I create it at runtime. Not sure if that makes a difference.
But with the new version I get more stack overflows, which can be fixed by increasing the $MAXSTACKSIZE value.
Jan, may I ask which $MAXSTACKSIZE you recommend, or which $MAXSTACKSIZE you use in your projects?
Comment by Wolf — Thursday, 21 August 2008 @ 14:43
I use TPerlRegEx in almost every project. What should be done to solve the problem of linking the OBJ files into the EXE file?
Comment by Miguel Henley — Thursday, 21 August 2008 @ 19:15
If you get stack overflows, you need to check your regular expressions for catastrophic backtracking. PCRE indeed has no safeguards against this.
Comment by Jan Goyvaerts — Thursday, 21 August 2008 @ 20:53
I still get a lot of warnings if I compile PerlRegEx.pas with D2009:
[DCC Warning] PerlRegEx.pas(367): W1057 Implicit string cast from 'UTF8String' to 'string'
etc etc etc
Comment by Karel — Tuesday, 14 October 2008 @ 15:57
String warning suppression:
367-370
'L', 'l': Backreference := PCREString(AnsiLowerCase(string(Backreference)));
'U', 'u': Backreference := PCREString(AnsiUpperCase(string(Backreference)));
'F', 'f': Backreference := PCREString(FirstCap(string(Backreference)));
'I', 'i': Backreference := PCREString(InitialCaps(string(Backreference)));
751
if (Limit = 1) or not Match then Strings.Add(string(Subject))
756
Strings.Add(Copy(string(Subject), Offset, MatchedExpressionOffset - Offset));
760
Strings.Add(Copy(string(Subject), Offset, MaxInt));
784
raise Exception.Create('TPerlRegEx.Study() - Error studying the regex: ' + Error);
Comment by Aleksandar Milanovic — Tuesday, 14 October 2008 @ 20:43
None of the warnings indicate problems with the code. You can suppress the warnings with explicit string casts, but that doesn’t change the actual compiled code.
The explicit cast that Aleksandar shows for lines 756 and 760 is wrong. The cast needs to go around the Copy() call, to cast the string returned by Copy(), instead of around the Subject parameter. The incorrect cast will work with ASCII text, but not with text where certain characters use more than one byte in UTF-8.
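For illustration, the corrected casts for those two lines might look like the fragment below. It is a sketch rather than the shipped code: the identifiers come from the lines quoted above, and the fragment only compiles inside the method of PerlRegEx.pas that contains them.

```delphi
// Line 756: copy the bytes from the UTF-8 Subject first, then cast the result.
Strings.Add(string(Copy(Subject, Offset, MatchedExpressionOffset - Offset)));

// Line 760: same idea for the remainder of the subject string.
Strings.Add(string(Copy(Subject, Offset, MaxInt)));
```

With the cast on the outside, Offset and MatchedExpressionOffset are applied to the UTF-8 data they were computed for, and only the extracted piece is converted to UTF-16.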
Comment by Jan Goyvaerts — Wednesday, 15 October 2008 @ 9:23
I compiled it with Delphi 2009. After fixing the J:JGsoft path errors it compiled okay. But when I drop it on a Delphi form, it always gives “[DCC Fatal Error] F2084 Internal Error: L4077”.
It is not working with Delphi 2009.
Comment by Asif — Tuesday, 28 October 2008 @ 17:30
I don’t get any internal errors with Delphi 2009. I get them with earlier versions when using the OBJ files rather than the DLL. You can change that by editing the compiler directive in pcre.pas.
Internal errors are a bug in the Delphi compiler. You should report them to QualityCentral.
Comment by Jan Goyvaerts — Wednesday, 29 October 2008 @ 8:00
RE: I compiled with Delphi 2009. After fixing J:JGsoft path errors it was compiled okay. But when I drop it on delphi form, it will always give “[DCC Fatal Error] F2084 Internal Error: L4077”
I did not have any problems installing and using it until after I also installed the JCL/JVCL. PCRE is used there as well, so I’m confident there is an issue with having both.
Comment by Delphi Coder — Thursday, 30 October 2008 @ 8:21
The latest version of TPerlRegEx uses the pcre.pas file (with minor tweaks) and the OBJ files (unchanged) from the JCL project. Try deleting everything from TPerlRegEx except for PerlRegEx.pas itself, and link it to the files from the JCL.
Comment by Jan Goyvaerts — Thursday, 30 October 2008 @ 16:28
[DCC Fatal Error] F2084 Internal Error: L4077 is NOT an incompatibility issue between JCL and TPerlRegEx.
Internal errors are in most cases bugs in the Delphi IDE itself.
I have both installed and I hadn’t seen this error until now, because I create the TPerlRegEx instance at runtime.
I tried putting the component on an empty form and running it, and I got this error.
I added PerlRegEx1.Subject := 'Test'; in Form Create, and put a breakpoint on that line. I got the error mentioned above before reaching that breakpoint.
I added PerlRegEx1.Match; as the next line, and there was no error. Everything worked. The breakpoint worked too.
Strange…
Comment by Aleksandar Milanovic — Thursday, 13 November 2008 @ 7:17
There is a problem with length in expressions.
If you have an expression like this
regEx.RegEx := '(?<= |^).{0,35}(?= |$)';
and a string like this
regEx.Subject := 'Način da se provede sedmina života.';
the first match will be
'Način da se provede sedmina'
instead of
'Način da se provede sedmina života.'
It works fine if I remove the Serbian-specific letters and use a string like this:
'Nacin da se provede sedmina zivota.';
BTW, I tested this with the unmodified PerlRegEx.pas (without the warning patches).
Comment by Aleksandar Milanovic — Monday, 24 November 2008 @ 22:49
How do I fix J:JGsoft path errors please?
Comment by Adam Sardi — Wednesday, 17 December 2008 @ 21:14
Aleksandar, it took me a while to figure out this is actually a bug in TPerlRegEx. It wasn’t running in UTF-8 (Unicode) mode if you didn’t set the Options property.
I also cleared out the compiler path settings in the packages, and eliminated the implicit string cast warnings.
You can use the same download link to get the latest version.
Comment by Jan Goyvaerts — Thursday, 18 December 2008 @ 16:42
Following up on comment #13: I got the same result. No more Internal error L3170 using Delphi 7.0 Enterprise + SP1 (Build 8.1).
My programs do not require PCRE3.DLL; all the .obj files are statically linked.
I used this compiler, downloaded from http://www.codegear.com/downloads/free/cppbuilder
Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000
and pcre\makefile.mak from the original package http://www.regular-expressions.info/download/TPerlRegEx.zip
with a few modifications to makefile.mak:
at line 66, replace .\pcre_ucp_searchfuncs.obj with .\pcre_ucd.obj
and download the latest PCRE from ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-7.9.zip
Of course you also need to modify pcre.pas:
comment out //{$LINK pcre\pcre_ucp_searchfuncs.obj}
and add {$LINK pcre\pcre_ucd.obj}
I now have TPerlRegEx working with pcre-7.9.
If someone wants the package, I’ll help or send it by email.
Comment by Hristo Markow — Saturday, 9 May 2009 @ 19:49
Hristo emailed me his OBJ files, but I get the same internal error when using them.
The only known workaround is to NOT put TPerlRegEx into a package.
Comment by Jan Goyvaerts — Sunday, 10 May 2009 @ 15:50
How do I compile the newest PCRE (with C++ Builder) for static use with this component for delphi use?
The default inserts underscores for all functions, hence won’t work.
Disabling underscores makes it complain about memcpy and friends (without underscores).
I ask because the provided objs are seriously old (7.9 as compared to 8.11).
Comment by tadaa — Tuesday, 28 December 2010 @ 0:40
You are replying to a seriously old blog post too!
The PCRE OBJ files included with TPerlRegEx were taken from the JCL library. You can try using their latest OBJ files.
Since TPerlRegEx is now included with Delphi XE, I do not plan to continue updating it for newer versions of PCRE. If you upgrade to Delphi XE, simply replace PerlRegEx with RegularExpressionsCore in the uses clause of your units.
Comment by Jan Goyvaerts — Wednesday, 29 December 2010 @ 9:40
I’m using D2009 and ran into this issue too (internal compiler error). I’m using the class-based component, and was glad to find a working solution here:
http://stackoverflow.com/questions/8278991/component-tperlregex-fatal-error-l3169
All you need to do is add a few lines of code to PerlRegEx.pas, like this:
constructor TPerlRegEx.Create;
  procedure UseFunction(P: Pointer);
  begin
  end;
begin
  UseFunction(@pcre_exec); // if not used, D2009 will fail with internal compiler error
  inherited Create;
  ….
Comment by Conan Doyle — Sunday, 22 September 2013 @ 9:51