Problem with OR expression when using the Unix engine



Post by JGLord » Thu Apr 14, 2011 11:10 am

Hi,

I have a problem with a Find/Regular Expression.

Version of UltraEdit: v15.00.0.1043
Regular expression engine: Unix

In a text file I need to find specific lines where part of the line matches my regular expression.

Here is my regular expression:
;2011(03|04)[0-9]*;[0-9]*;.*;.*;[0-9]*;SIG;ORL;(PDR|RRS);

My regular expression works well for this example:
166004;000064168098 - 000064168098;571277388;20;000064168098;20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;20110405080225;SIG;ORL;PDR;M8;RNC 266;1;RF;2;T

But it doesn't work for this one:
166004;000064168098 - 000064168098;571277388;20;000064168098;20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;20110405080225;SIG;ORL;RRS;M8;RNC 266;1;RF;2;T

It seems the OR expression "(PDR|RRS)" doesn't work for me, at least when the string "RRS" is used... Any idea?

Thank you!
JGLord
Newbie
 
Posts: 1
Joined: Thu Apr 14, 2011 10:23 am
Location: Quebec, Canada

Re: Problem with OR expression when using the Unix engine

Post by Mofi » Fri Apr 15, 2011 12:29 am

You have raised an interesting problem with the Unix regular expression engine. Your search string also fails in UE v17.00, but the same search string works with the Perl compatible regular expression engine. I experimented with search strings to find the reason and found something interesting.

The search string ;.*; is non-greedy, which means it matches only one data field of your CSV file together with its surrounding semicolons; in other words, as little as possible to return a true result. But using ;.*;SIG makes the expression greedy. This search string now matches everything from the first semicolon in a line up to the word SIG; in other words, as much as possible to return a true result.
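For readers who want to see how much text each kind of expression can take, here is a minimal sketch in Python. It is only an illustration under assumptions: Python's re module is a backtracking engine and not UltraEdit's Unix engine, so it does not reproduce the behavior described above. In Python, .* is greedy by default and .*? is the non-greedy form. The sample line is hypothetical and much shorter than the real data.

```python
import re

# Hypothetical short sample line, not taken from the original data.
sample = "a;b;c;SIG;d"

# Greedy: .* takes as much as possible between two semicolons.
greedy = re.search(r";.*;", sample).group()

# Non-greedy: .*? takes as little as possible.
lazy = re.search(r";.*?;", sample).group()

# Even the non-greedy form spans several fields once SIG must follow,
# because the match still starts at the first semicolon in the line.
lazy_sig = re.search(r";.*?;SIG", sample).group()

# [^;]* cannot cross a semicolon, so it stays inside one field.
field_sig = re.search(r";[^;]*;SIG", sample).group()

print(greedy)     # ;b;c;SIG;
print(lazy)       # ;b;
print(lazy_sig)   # ;b;c;SIG
print(field_sig)  # ;c;SIG
```

The last result shows why a negated character class such as [^;]* is a dependable way to limit a match to a single CSV field, whatever the engine does with .* on its own.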

That unexpected greedy behavior makes the OR expression at the end fail. Possible workarounds are: use the Perl compatible regular expression engine, where your search string works unchanged; use the UltraEdit regular expression engine, where * is available for non-greedy and ?+ for greedy search strings; or use the following search string with the Unix engine:

;20110[34][0-9]*;[0-9]*;[^;]*;[^;]*;[0-9]*;SIG;ORL;(PDR|RRS);

As you can see, I have replaced .* with [^;]* to make the expression non-greedy. 0[34] is just an optimized form of (03|04), which would also work.
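The two search strings can be checked against both sample lines outside UltraEdit. The following is a minimal sketch using Python's re module, which is an assumption on my side: it is a Perl-style backtracking engine, so it can only confirm the Perl compatible behavior and the [^;]* version, not the failure seen with the Unix engine. The names PREFIX, SUFFIX, lines, patterns and results are made up for this example.

```python
import re

# The two sample lines from the first post differ only in the PDR/RRS field.
PREFIX = ("166004;000064168098 - 000064168098;571277388;20;000064168098;"
          "20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;"
          "BJ4768;20110405080225;SIG;ORL;")
SUFFIX = ";M8;RNC 266;1;RF;2;T"
lines = {code: PREFIX + code + SUFFIX for code in ("PDR", "RRS")}

patterns = {
    # Search string from the first post.
    "original": r";2011(03|04)[0-9]*;[0-9]*;.*;.*;[0-9]*;SIG;ORL;(PDR|RRS);",
    # Search string with .* replaced by [^;]*.
    "workaround": r";20110[34][0-9]*;[0-9]*;[^;]*;[^;]*;[0-9]*;SIG;ORL;(PDR|RRS);",
}

# True if the pattern is found somewhere in the line.
results = {
    (name, code): re.search(pattern, line) is not None
    for name, pattern in patterns.items()
    for code, line in lines.items()
}

for (name, code), found in results.items():
    print(f"{name:10} {code}: {'match' if found else 'no match'}")
```

In a backtracking engine both search strings find both lines, which is consistent with the Perl compatible engine accepting the original string unchanged.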
Mofi
Grand Master
 
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

