Replace fails when using lookahead and lookbehind


Postby petrodubloseven » Wed Aug 22, 2007 9:32 pm

I am using UltraEdit 13.10+1 and attempting to use a Perl-style regex to find the whitespace between a word and a number and replace it with a delimiter. In the example file below, I would expect to match the whitespace between the word preceding the date and the date on each line.

# is a place holder for a tab character.

Preliminary Project Documents Created#6/22/07
Preliminary Project Documents Sent to Client#6/29/07
Conduct Technical Planning Meeting#5/18/07
Customer Orders Hardware#6/12/07

This regex finds and highlights the correct whitespace on each line.

(?<=\w)[\s](?=\d)

When I try to replace the whitespace with "XX" nothing happens. The text is found, I press replace, and the find goes to the next instance. The XX is not written to the file.
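
For comparison, here is a minimal sketch of the result the replace is expected to produce. It uses Python's `re` module rather than UltraEdit (an assumption made purely for illustration; the helper name `delimit` is hypothetical), with a real tab in place of the `#` placeholder.

```python
import re

# Sample lines from the post; "\t" stands in for the "#" tab placeholder.
lines = [
    "Preliminary Project Documents Created\t6/22/07",
    "Preliminary Project Documents Sent to Client\t6/29/07",
    "Conduct Technical Planning Meeting\t5/18/07",
    "Customer Orders Hardware\t6/12/07",
]

# One whitespace character preceded by a word character and followed by a digit.
pattern = re.compile(r"(?<=\w)\s(?=\d)")


def delimit(line, delimiter="XX"):
    """Replace the whitespace between the last word and the date."""
    return pattern.sub(delimiter, line)


result = [delimit(line) for line in lines]
for line in result:
    print(line)
```

Only the tab before each date is replaced; the ordinary spaces between words are followed by letters, so the lookahead rejects them.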

Any ideas?

Thanks,

Pete.

Postby Bego » Thu Aug 23, 2007 5:37 am

Hi Pete

try this:
replace
([A-Za-z])\s*([0-9])

with
\1XX\2

rds Bego

Postby pietzcker » Thu Aug 23, 2007 5:52 am

Hi Pete,

This is a known bug in UE's Perl regex engine. Positive lookaround is broken: searches work, replaces don't. (Funnily enough, the replace dialog tells you that it did perform n replaces and also marks the file as changed, but it doesn't actually do anything.) Negative lookaround works fine, by the way.

I have written to IDM support several times about this; they have confirmed the problem each time and said they'd have their technicians look into it. Maybe it'll get boosted on the list of priorities if you send them a mail at support@idmcomp.com - I'd really appreciate it.

As a workaround, and since negative lookaround does work, the following regex works on your sample data; make sure, though, that it won't produce unwanted matches with your actual data:

(?<!\W) (?!\D)
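
To illustrate the caveat about unwanted matches, here is a small sketch in Python's `re` module (not UltraEdit; the helper name `delimit` and the extra test strings are made up for the example). The pattern is written with the negative lookbehind `(?<!\W)` and a literal space, so it assumes space-separated data.

```python
import re

# Negative-lookaround form: no non-word character before the space,
# and no non-digit character after it.
workaround = re.compile(r"(?<!\W) (?!\D)")


def delimit(line, delimiter="XX"):
    return workaround.sub(delimiter, line)


# The intended case: only the space before the date is replaced.
wanted = delimit("Customer Orders Hardware 6/12/07")

# Unwanted case 1: a leading space has no character before it at all,
# so the negative lookbehind is satisfied.
leading = delimit(" 5 items")

# Unwanted case 2: a trailing space has no character after it,
# so the negative lookahead is satisfied.
trailing = delimit("Hardware ")

print(repr(wanted), repr(leading), repr(trailing))
```

Negative lookarounds succeed when there is nothing to look at, which is why line starts and line ends behave differently from the positive form.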


HTH,
Tim

edit: Hi Bego, you were faster than me; your regex will work too (but slower), and the * should probably be replaced by a + or else it will also replace "B2B" by "BXX2B"...

Postby Bego » Thu Aug 23, 2007 7:32 am

Hi Tim,

correct, so the "easy" non-lookaround string looks better like this:
([A-Za-z])\s+([0-9])
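
A quick sketch of the difference between the two quantifiers, using Python's `re` module as a stand-in for UltraEdit's Perl engine (an assumption; the variable names are illustrative only):

```python
import re

star = re.compile(r"([A-Za-z])\s*([0-9])")  # whitespace optional
plus = re.compile(r"([A-Za-z])\s+([0-9])")  # at least one whitespace character
replacement = r"\1XX\2"

# With "*", a letter directly followed by a digit also matches.
star_b2b = star.sub(replacement, "B2B")

# With "+", "B2B" is left alone because there is no whitespace to match.
plus_b2b = plus.sub(replacement, "B2B")

# The intended case still works with "+".
plus_line = plus.sub(replacement, "Customer Orders Hardware\t6/12/07")

print(star_b2b, plus_b2b, plus_line)
```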


rds Bego

Postby pietzcker » Thu Aug 23, 2007 9:24 am

You mean \s+ :)

And (if you're using Perl regexes) the replacement string should be \1XX\2 (I don't know the UE/Unix styles).

Postby Bego » Thu Aug 23, 2007 1:10 pm

Oh boy, I shouldn't do 2 things at one time... only women can do this (they say) ;-)

corrected it above.

Postby petrodubloseven » Thu Aug 23, 2007 2:46 pm

Thanks for the replies and the alternatives. Sometimes I get focused on a solution that doesn't work when I should look for alternatives.

Thanks to the mod who fixed my spelling as well.

Pete.

Postby pietzcker » Tue Apr 29, 2008 2:06 am

Good news: In 14.00a+2, positive lookaround has been fixed. This version isn't yet available for download (April 29th) but surely will be soon. That's a great leap forward for Perl regular expressions and will speed up complex regex operations a lot. Great work, IDM! So keep checking for new hotfixes :)

Postby alanb » Tue May 27, 2008 9:44 am

Are you sure that this has been fixed?

In UEdit 14.00b, with the following text snippet (newline after the "---"):

---
avast! Antivirus: Inbound message clean.


the Perl regexp:

(avast! Antivirus)(?<!---\r\n)

succeeds, but:

(avast! Antivirus)(?<=---\r\n)

fails.

Does that not mean that lookbehind is still broken?


Alan

Postby pietzcker » Tue May 27, 2008 2:41 pm

Wait a second, your regex is wrong - the lookbehind should be at the beginning of the regex. But even with the correct regex, UE doesn't match correctly.
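
Why the position of the lookbehind matters can be sketched in an engine that searches the whole buffer. The example below uses Python's `re` module, not UltraEdit, and the variable names are illustrative:

```python
import re

text = "---\r\navast! Antivirus: Inbound message clean.\r\n"

# Lookbehind placed after the group: it inspects the characters just before
# the END of "avast! Antivirus", which are "virus", never "---\r\n".
trailing_negative = re.search(r"(avast! Antivirus)(?<!---\r\n)", text)
trailing_positive = re.search(r"(avast! Antivirus)(?<=---\r\n)", text)

# Lookbehind placed before the group: it inspects the characters just before
# the START of "avast! Antivirus", which are "---\r\n".
leading_positive = re.search(r"(?<=---\r\n)(avast! Antivirus)", text)

print(trailing_negative is not None)
print(trailing_positive is not None)
print(leading_positive is not None)
```

With the lookbehind at the end, the negative form succeeds trivially and the positive form can never succeed, whatever the engine.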

That's a more general problem, though: UE's regex engine is line-based. This leads to lookbehind not working beyond line breaks, and to greedy quantifiers losing their greediness if a match is possible on the current line (but the correct match would be beyond a linebreak). So in most daily use cases, lookaround works, but there are some limitations. I had been hoping for better regex support for a long time (not only for search/replace, but for syntax highlighting, code folding etc.), but have found that most users don't seem to care enough about this for IDM to put this high on their to-do list. If you need really good regex support, try EditPadPro.
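
The effect of line-based matching on lookbehind can be modelled roughly as follows. This is only a simplified sketch in Python, assuming that "line-based" means each line is searched in isolation; it is not a description of how UE's engine is actually implemented, and both function names are hypothetical.

```python
import re

pattern = re.compile(r"(?<=---\r\n)avast! Antivirus")
text = "---\r\navast! Antivirus: Inbound message clean.\r\n"


def search_whole_buffer(pattern, text):
    """Search the complete text; lookbehind can see previous lines."""
    return pattern.search(text) is not None


def search_line_by_line(pattern, text):
    """Hypothetical line-based model: each line is searched on its own,
    so a lookbehind cannot see past the start of the current line."""
    lines = text.splitlines(keepends=True)
    return any(pattern.search(line) is not None for line in lines)


print(search_whole_buffer(pattern, text))
print(search_line_by_line(pattern, text))
```

Under this model the same pattern matches when the buffer is searched as a whole and fails when every line is searched separately.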

Postby alanb » Wed May 28, 2008 8:40 am

Hi Tim,
my bad copy-and-paste; the lookbehind does come first in the original macro (the snippet is part of a large macro to clean up mbox format emails).

I'm sorry to hear the confirmation that lookaround is still broken. I recently finished a long dialogue with IDM support (just prior to the release of 14.00a) on the performance of UltraEdit Perl RegExps and I thought my problems were over.

Oh well, I'll just have to email Troy again...

Thanks for the input.

Alan

Postby pietzcker » Wed May 28, 2008 9:06 am

Well, it's not exactly lookaround that's broken. Since the entire regex engine is line-based, regexes that involve multiple lines can get risky. Mostly these are corner cases, but every now and then you get unexpected or incorrect results. That was my primary reason for switching to EPP; most other users don't seem to mind...