Problem with Perl Regex engine?

Postby sklad2 » Thu Jul 31, 2008 10:32 am

I am reading a tutorial and following its regular expression examples. In one Perl example they show that

[^\d\s]
is not the same as
[\D\S]
However, in UE 14.10.0.1024 the latter matches exactly what the first expression matches, not what I expected it to match.

Example:
8x8
The first regex should match the x, and indeed it does. The second regex should match the first 8, but it does not; it again matches the x. Am I correct that this is a problem in the Perl regex engine?
sklad2
Advanced User
Posts: 58
Joined: Thu Mar 08, 2007 12:00 am

Re: Problem with Perl Regex engine?

Postby pietzcker » Thu Jul 31, 2008 10:50 am

Seems you're right. Wow, that's a major blunder. I guess you should file a bug report.
pietzcker
Master
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

Re: Problem with Perl Regex engine?

Postby mjcarman » Thu Jul 31, 2008 11:59 am

Interesting. I wonder if it's a UE problem or a bug in the Boost library. I suspect the latter.
mjcarman
Power User
Posts: 123
Joined: Thu Feb 10, 2005 12:00 am

Re: Problem with Perl Regex engine?

Postby Bego » Thu Jul 31, 2008 12:12 pm

Puuuh. That is my biggest problem with UE (or whatever underlying DLL is responsible):
From version to version and fix to fix, something in the regexp stuff stops working correctly. Bugs reappear, and so on.
I can understand Tim's change to E......d Pro very well...
Bego
Master
Posts: 357
Joined: Wed Nov 24, 2004 12:00 am
Location: Germany

Re: Problem with Perl Regex engine?

Postby pietzcker » Thu Jul 31, 2008 12:51 pm

I'm pretty sure that it's a bug in the Boost library. I can think of no other way. In my opinion, IDM should seriously start to reconsider their business relationship with Boost. They probably won't get JGSoft's engine. Maybe PCRE?

Re: Problem with Perl Regex engine?

Postby sklad2 » Thu Jul 31, 2008 1:21 pm

Here is another Regex issue

When matching the b, this should match:
(q?)b\1
This should not:
(q)?b\1
However, they both match!

I know which regex engine I like so far. Now I am becoming a bit concerned about regex matching with UE.

Maybe UE could run a few test regex scripts through and check the results.
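
For anyone who wants to check the expected results outside UE, here is a minimal sketch of such a test in Python. It assumes Python's built-in re module, which (like Perl) makes a backreference fail when its group never took part in the match. This is not UE's Boost engine, and the helper name matches_b is made up for illustration.

```python
import re

def matches_b(pattern: str) -> bool:
    """Return True if the pattern matches the one-character string 'b'."""
    # fullmatch requires the pattern to consume the whole subject string
    return re.fullmatch(pattern, "b") is not None

# (q?) always participates: it captures the empty string, so \1 matches empty
print(matches_b(r"(q?)b\1"))  # True
# (q)? skips the group entirely, so \1 refers to an unset group and fails
print(matches_b(r"(q)?b\1"))  # False
```

Both patterns match the string qbq, so the difference only shows up when the q is absent.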

Re: Problem with Perl Regex engine?

Postby pietzcker » Thu Jul 31, 2008 1:52 pm

I must say that I'm impressed by the rate at which you're uncovering bugs in the regex engine. To be fair, this last example is probably a "corner case" that won't matter to many people. And the biggest show-stopper (inability to use positive lookaround) has been fixed in V14, so for most everyday work UE's regex engine is OK. But there are quite a few inconsistencies, like the behaviour of greedy quantifiers when the potential match crosses a newline, or the engine skipping matches because the "transmission" bumps along too far and moves beyond the next correct match. And now these new bugs. Sad.

Re: Problem with Perl Regex engine?

Postby sklad2 » Thu Jul 31, 2008 2:18 pm

Amazingly, I am reading a tutorial on regex and Perl expressions, working through its examples, and trying to understand how they work and why. The good news is that I understand most of it 8O, OK, some of it. :D It takes me many tries and tests, and I keep my examples and expected outputs in files. All the help from people like you has really gotten me excited about learning this.

Re: Problem with Perl Regex engine?

Postby TLis » Sun Sep 14, 2008 2:38 am

sklad2 wrote: I am reading a tutorial and following its regular expression examples. In one Perl example they show that

[^\d\s]
is not the same as
[\D\S]
However, in UE 14.10.0.1024 the latter matches exactly what the first expression matches, not what I expected it to match.

Example:
8x8
The first regex should match the x, and indeed it does. The second regex should match the first 8, but it does not; it again matches the x. Am I correct that this is a problem in the Perl regex engine?

Am I correct that in the [...] syntax (character class), special escaped characters lose their meaning, so effectively this character class accepts anything but a backslash and the characters 'd' or 's' ('D' or 'S', respectively, in the second regex)?

As a result, both regexes would match the 'x'. I would be surprised if this character class matched first the '8' (the digit) and then the 'x'. A character class, as far as I know, is supposed to match a single character.
TLis
Newbie
Posts: 4
Joined: Tue Jun 12, 2007 11:00 pm
Location: Szczecin, Poland

Re: Problem with Perl Regex engine?

Postby pietzcker » Sun Sep 14, 2008 4:15 am

Nope. Inside a character class, escaped characters don't lose their meaning - [\r\n] means "match a CR or an LF". However, most special characters (outside of character classes) have a different meaning inside character classes (e.g., ^, -, [, ], (, ) etc.). A character class does match a single character.

So [^\d\s] means "Match any character that is neither a digit nor a whitespace character". [\D\S], however, means "Match any character that is (not a digit) or (not a whitespace)" which is true for every single character imaginable - "8" is not a whitespace character, so it matches. " " is not a digit, so it matches, too. "x" is neither, so it matches, too. So UE's regex engine is wrong when it matches "x" but not "8".
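
The difference can be checked with a short sketch in Python. It assumes Python's built-in re module, which follows Perl's semantics for these shorthand classes; it is not UE's Boost engine.

```python
import re

subject = "8x8"

# [^\d\s]: a character that is neither a digit nor whitespace
neither = re.search(r"[^\d\s]", subject)
# [\D\S]: a character that is a non-digit OR a non-whitespace, i.e. any character
either = re.search(r"[\D\S]", subject)

print(neither.group(), neither.start())  # x 1
print(either.group(), either.start())    # 8 0

# Escapes keep their meaning inside a class: [\r\n] matches CR or LF, not 'r' or 'n'
print(re.findall(r"[\r\n]", "r\nn\r"))   # ['\n', '\r']
```

The second search stops at position 0 because the very first character already satisfies one of the two alternatives in the class.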

Re: Problem with Perl Regex engine?

Postby TLis » Sun Sep 21, 2008 8:33 am

I have submitted an error report to support about the [\D\S] issue, and they confirmed that it is a bug in the Boost libraries. They told me that they now need to submit a bug report to the Boost library maintainers to get it fixed.

