Perl regex replace picky about surrounding characters?

Postby tsmith35 » Wed Nov 14, 2007 1:44 am

I was doing some Perl regex replacement tonight and noticed something odd. Perhaps I've missed something here...

I start with a file containing the following on one line:

Joe Goodman

I do a replace using Perl regex:
Find what: (\w*)\s(\w*)
Replace with: \2 \1

The result is Goodman Joe as expected.

Now I undo, and try this replacement using Perl regex:
Find what: (\w*)\s(\w*)
Replace with: "\2" "\1"

The result is "Goodman" "Joe""" """" "", not expected.

So I undo and try this replacement using Perl regex:
Find what: (\w*)\s(\w*)
Replace with: x\2x x\1x

The result is xGoodmanx xJoex, again as expected.

Now I undo and try this one:
Find what: (\w*)\s(\w*)
Replace with: (\2) (\1)

The result is (Goodman) (Joe)() ()() (), not expected. Perhaps the leading and trailing parentheses in the replacement need to be escaped?

So I undo and try this one:
Find what: (\w*)\s(\w*)
Replace with: \(\2\) \(\1\)

The result is (Goodman) (Joe)() ()() (), not expected.

What am I doing wrong? I'm currently using UE 13.20+2. Any suggestions are welcome.

Thanks,
Tom

Re: Perl regex replace picky about surrounding characters?

Postby jorrasdk » Wed Nov 14, 2007 7:58 am

You are doing nothing wrong as far as I can see. I can confirm the error. Please report it to IDM support (e-mail address at the top of this page).

In the meantime, switch to the legacy Unix regular expression engine. It will handle your expressions above as expected.

Re: Perl regex replace picky about surrounding characters?

Postby tsmith35 » Thu Nov 15, 2007 1:45 am

jorrasdk, thanks for checking this out. I'll report it as a bug and switch to plain Unix regex as you suggested.

Tom

Re: Perl regex replace picky about surrounding characters?

Postby tsmith35 » Fri Nov 16, 2007 9:58 pm

jorrasdk, the folks at IDM pointed me to the root cause of the issue: in Perl compatible regex, \s doesn't just comprise normal whitespace -- it also includes CR and LF characters. By changing my search from:

(\w*)\s(\w*)
(any number of word characters)(Perl whitespace)(any number of word characters)

to

(\w+)\s(\w+)
(one or more word characters)(Perl whitespace)(one or more word characters)

the Replace All search works as intended. I could have also specified tabs and spaces instead of \s, but either way works fine.
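The difference between the two patterns can be reproduced outside UltraEdit. The following is a minimal sketch using Python's `re` module rather than UltraEdit's own engine, so it only approximates the editor's behaviour; the CRLF line ending on the sample string is an assumption about how the file was saved.

```python
import re

# A DOS-style line: the text is followed by CR and LF, both of which \s matches.
# (The CRLF ending is an assumption about the original file.)
line = "Joe Goodman\r\n"
template = r'"\2" "\1"'

# Zero-or-more: a lone CR or LF satisfies the pattern with two empty groups,
# so each one is replaced by an empty pair of quoted strings.
star_result = re.sub(r"(\w*)\s(\w*)", template, line)

# One-or-more: each side must contain at least one word character,
# so the CR and LF are left alone.
plus_result = re.sub(r"(\w+)\s(\w+)", template, line)

print(repr(star_result))
print(repr(plus_result))
```

With `\w*`, the CR and LF each produce an extra `"" ""`; with `\w+`, the line ending survives untouched.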

Just wanted to post this in case it is helpful to anyone else.

Tom

Re: Perl regex replace picky about surrounding characters?

Postby mjcarman » Wed Nov 21, 2007 6:22 pm

Curious. I'm running UE v13.20+2 and do not get different results depending on what literal text I use in the replacement. When my replacement text is "x\2x x\1x" I get

xGoodmanx xJoexxx xxxx xx

which is consistent, and what one should expect (for a DOS file with CRLF line endings).
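To see that the literal text in the replacement makes no difference, the same find expression can be run against several replacement templates. This is a sketch in Python's `re` module, not UltraEdit itself, and it assumes the line ends in CRLF. The helper name `replace_all` is made up for the illustration.

```python
import re

PATTERN = r"(\w*)\s(\w*)"
line = "Joe Goodman\r\n"  # assumed CRLF line ending


def replace_all(template):
    """Apply the find expression to the sample line with the given template."""
    return re.sub(PATTERN, template, line)


# The same three matches occur every time; only the literal text differs.
results = {t: replace_all(t) for t in (r"\2 \1", r"x\2x x\1x", r"(\2) (\1)")}
for template, result in results.items():
    print(f"{template!r:>14} -> {result!r}")
```

Note that with the plain `\2 \1` template the two extra matches collapse to two trailing spaces, which are easy to overlook on screen.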

Re: Perl regex replace picky about surrounding characters?

Postby Jane » Wed Nov 21, 2007 7:50 pm

One thing that seems to have been overlooked is that (\w*)\s(\w*) essentially says: match a whitespace character with optional leading and/or trailing \w characters. In other words, it will match a naked space, whatever characters happen to surround it, as well as a space flanked by word characters.
For example:

Joe Goodman & @

Search for (\w*)\s(\w*)
Replace with "\2" "\1"

Replace all gives:
"Goodman" "Joe""" ""&"" ""@

The first search selects Joe Goodman
and does the expected replace to give

"Goodman" "Joe" & @
The regex next matches " &
- Note neither of the adjacent characters is a \w but that doesn't matter, they are optional.
The replace continues through the string until it runs out of spaces.

Or try simply the string & @ and it will match,
or even a single space on a line and again it will match.
However, if you use (\w+)\s(\w+) these bad matches no longer occur because you insist on at least one \w leading and following.
I think the \s matching \r\n is a bit of a red herring.
The thing to watch out for is that * as a quantifier means zero or more, so matching nothing at all is perfectly OK.
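The individual matches can be listed to show where the empty groups come from. This is a sketch using Python's `re` module as a stand-in for the UltraEdit engine; the sample string has no line ending, so every extra match here is caused by a plain space.

```python
import re

text = "Joe Goodman & @"
pattern = r"(\w*)\s(\w*)"

# Record the span, the matched text and the captured groups of every match.
matches = [(m.span(), m.group(0), m.groups()) for m in re.finditer(pattern, text)]
for span, matched, groups in matches:
    print(span, repr(matched), groups)

# Replace All, as in the example above.
replaced = re.sub(pattern, r'"\2" "\1"', text)
print(replaced)
```

The second and third matches consist of a single space with both groups empty, which is where the stray `"" ""` pairs around & come from.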

Cheers,
Jane

Re: Perl regex replace picky about surrounding characters?

Postby mjcarman » Wed Nov 21, 2007 8:13 pm

"Goodman" "Joe" & @
The regex next matches " &

With apologies for pedantry, you mean to say that the next match occurs at the space character between the " and & characters. The characters themselves are not included in the match.

I agree with you that the real problem is the use of the * (zero-or-more) quantifier, not that \s matches newlines. The behavior of \s just exposed the error in the find expression.

Re: Perl regex replace picky about surrounding characters?

Postby Jane » Wed Nov 21, 2007 9:31 pm

Well spotted (Mr. Pedantic :wink: ). Actually, I had previewed the post and gone back to change
"The regex next matches" to "The regex next matches at",
as well as a couple of other things, but when I went to submit I had timed out (it took me a while to pound that in and make sure all the quotes were right), and my system was hanging, so I left it.
One further thought on \s: you could just use [ \t], which is sometimes called "horizontal whitespace" (GNU extension), rather than \s, which means [ \t\r\n\f].
I think [ \t] is also equivalent to [[:blank:]] as a character class in the UltraEdit Perl regex implementation.
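A quick check of the [ \t] suggestion, again sketched with Python's `re` module rather than UltraEdit. Python's `re` does not support POSIX bracket classes such as [[:blank:]], so the sketch spells out [ \t] directly. The CRLF ending on the first sample is an assumption.

```python
import re

template = r'"\2" "\1"'
horizontal = r"(\w*)[ \t](\w*)"  # space or tab only, never CR or LF

# The line ending is no longer matched, so it passes through unchanged.
crlf_result = re.sub(horizontal, template, "Joe Goodman\r\n")

# A bare space between non-word characters still matches, because * allows
# both groups to be empty.
symbols_result = re.sub(horizontal, template, "Joe Goodman & @")

print(repr(crlf_result))
print(repr(symbols_result))
```

Restricting the whitespace class hides the problem for the CRLF case only; the zero-or-more quantifier still produces empty matches on the second sample.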

Cheers,
Jane

