greedy [ ]+ ?



Post by majingh » Thu Dec 27, 2007 6:49 pm

I have a problem like this:

Suppose I have the following text (the part between the dashed lines):

--------------------------------------------------------------
1 ENTRY --- one space between "1" and "ENTRY"
2 TEST --- one space between "2" and "TEST"
3  ENTRY --- two spaces between "3" and "ENTRY"
4  TEST --- two spaces between "4" and "TEST"
--------------------------------------------------------------

The Perl-style regular expression

[0-9][ \t]+[^E]

matches lines 2, 3 and 4 but not line 1. I expected it to match only lines 2 and 4.

Why does line 3 get matched?

For line 3, it seems that UltraEdit's Perl-style engine uses [ \t]+ to match only the first space and [^E] to match the second one.
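That reading can be checked with a small sketch. It uses Python's re module rather than UltraEdit's engine, on the assumption that both backtrack the same way for this simple pattern. The two capture groups are added only for illustration, so we can see which characters [ \t]+ and [^E] actually consumed on each line.

```python
import re

# The four sample lines from the post; lines 3 and 4 have two spaces.
LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST"]

# The original pattern, with capture groups added (illustration only)
# around the blank run and the single character that follows it.
ORIGINAL = re.compile(r"[0-9]([ \t]+)([^E])")

for line in LINES:
    m = ORIGINAL.search(line)
    if m is None:
        print(f"{line!r}: no match")
    else:
        # group(1) is what [ \t]+ kept, group(2) is what [^E] matched.
        print(f"{line!r}: blanks={m.group(1)!r} next={m.group(2)!r}")
```

For "3  ENTRY" this reports blanks=' ' and next=' ': [ \t]+ first takes both spaces, [^E] fails on the E, so the engine backtracks, gives one space back, and [^E] matches it.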

Does anyone know how to write a Perl-style regular expression that matches only lines 2 and 4 but not lines 1 and 3? That is, it should not match a line where the digit [0-9] is followed by one or more blanks and then "ENTRY".


Thanks.
majingh
Newbie
 
Posts: 4
Joined: Wed Dec 26, 2007 12:00 am

Re: greedy [ ]+ ?

Post by majingh » Thu Dec 27, 2007 6:56 pm

I found a solution:

The following regular expression matches only lines 2 and 4:

[0-9][ \t]+[^ E]

Compare it with my original regular expression, which matched lines 2, 3 and 4:

[0-9][ \t]+[^E]

I added a space to the negated character class, turning [^E] into [^ E], so it can no longer match the second space.
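A minimal Python sketch comparing the two patterns on the sample lines (again assuming Python's re backtracks like UltraEdit's engine here). It also shows one limitation: [^ E] still accepts a tab, so a hypothetical line with a space followed by a tab before ENTRY slips through by the same backtracking. Adding \t to the class, as in the TAB_SAFE pattern (a variant introduced here for illustration), closes that gap.

```python
import re

LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST"]

ORIGINAL = re.compile(r"[0-9][ \t]+[^E]")
FIXED = re.compile(r"[0-9][ \t]+[^ E]")      # space added to the class
TAB_SAFE = re.compile(r"[0-9][ \t]+[^ \tE]")  # illustrative: tab excluded too

def matching_lines(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if pattern.search(line)]

print(matching_lines(ORIGINAL, LINES))  # ['2 TEST', '3  ENTRY', '4  TEST']
print(matching_lines(FIXED, LINES))     # ['2 TEST', '4  TEST']

# Hypothetical line: a space, then a tab, then ENTRY.
mixed = "5 \tENTRY"
# [ \t]+ backs off to just the space, and [^ E] then matches the tab.
print(FIXED.search(mixed) is not None)     # True
print(TAB_SAFE.search(mixed) is not None)  # False
```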

Re: greedy [ ]+ ?

Post by pietzcker » Fri Dec 28, 2007 7:25 am

You're right: the original regex matches line 3 because [^E] matches the second space character. Greedy only means that [ \t]+ tries the longest match first; it will still give back characters (here, the second space) if the rest of the regex needs them to succeed.

A more robust way to circumvent this would be to use

[0-9](?>[ \t]+)[^E]

as your regex. Now the space/tab part is enclosed in a so-called atomic group, which will not be backtracked into once it has matched successfully. In other words, it "uses up" all of the spaces and tabs and will not go back to give any of them up if the following character is an E.
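Here is a sketch of the atomic-group version in Python, on the same assumption that Python's re behaves like UltraEdit's engine for this pattern. Python's re accepts (?>...) only from version 3.11, so the code falls back to a well-known emulation on older versions: a lookahead captures the whole blank run, and the backreference \1 then consumes exactly that run. Because a lookahead is not backtracked into, the run cannot shrink. The space-then-tab line is hypothetical and shows that, unlike [^ E], the atomic group also handles mixed blanks.

```python
import re

# Sample lines plus a hypothetical line with a space and a tab before ENTRY.
LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST", "5 \tENTRY"]

try:
    # Native atomic group (supported by Python's re from 3.11 on).
    ATOMIC = re.compile(r"[0-9](?>[ \t]+)[^E]")
except re.error:
    # Older Pythons: emulate the atomic group with a capturing lookahead.
    # The lookahead grabs every blank; \1 must consume all of them.
    ATOMIC = re.compile(r"[0-9](?=([ \t]+))\1[^E]")

print([line for line in LINES if ATOMIC.search(line)])
# ['2 TEST', '4  TEST']
```

The same idea can be written in Perl or PCRE as the possessive quantifier [ \t]++. The atomic group form is the one used above.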
pietzcker
Master
 
Posts: 241
Joined: Sun Aug 22, 2004 11:00 pm

