Why can't find text between HTML tags?

Post by fredtheman » Fri Aug 25, 2006 6:03 pm

Hello

I've read the online help and archives here, but I don't understand why UE can't run the following regex on the following text (I want to remove all text between the two tags):

</STRUCTURE>
BLABLA
BLABLA
<GROUP>

Find = </STRUCTURE>.+?<GROUP>
Replace =

I've tried the Unix style, and the PCRE style, with no difference. It doesn't seem like <, /, and > are forbidden characters. Any idea?

Thank you.
fredtheman
Basic User
Posts: 18
Joined: Sun Sep 05, 2004 11:00 pm

Re: Why can't find text between HTML tags?

Post by Bego » Sat Aug 26, 2006 12:10 pm

Hi,

If everything were on ONE line, this would work:
Replace </STRUCTURE>.*<GROUP>
with </STRUCTURE><GROUP>

Perl regexp, UE 12.10a

Regards, Bego
Bego
Master
Posts: 358
Joined: Wed Nov 24, 2004 12:00 am
Location: Germany

Re: Why can't find text between HTML tags?

Post by Mofi » Sat Aug 26, 2006 12:29 pm

Why it does not work across multiple lines is explained in the regular expressions article of the UE help:

+ ... Matches the preceding character one or more times. Does not match repeated newlines.

. ... Matches any single character except a newline character. Does not match repeated newlines.
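To see the effect of that rule outside UE, here is a small sketch in Python. Python's re module is only a stand-in here (an assumption on my part, it is not the engine UE uses), but its "." also refuses to match a newline by default, so the original pattern fails on the multi-line text in the same way.

```python
import re

# The text from the first post, with real line breaks between the tags.
text = "</STRUCTURE>\nBLABLA\nBLABLA\n<GROUP>\n"
pattern = r"</STRUCTURE>.+?<GROUP>"

# "." does not match "\n" by default, so the search finds nothing.
multi_line_match = re.search(pattern, text)

# The same pattern succeeds when everything sits on one line.
one_line = "</STRUCTURE>BLABLA<GROUP>"
single_line_match = re.search(pattern, one_line)

print(multi_line_match)            # None
print(single_line_match.group(0))  # </STRUCTURE>BLABLA<GROUP>
```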

The following Unix style regular expression also works across multiple lines. Be very careful with it, though: the [^<]* expression matches line termination characters as well, so it can sometimes select MUCH more than you expect.

Find: </STRUCTURE>[^<]*<GROUP>
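A rough illustration of how the negated class behaves, again using Python's re module as a stand-in for UE (an assumption). The replacement string </STRUCTURE><GROUP> is borrowed from Bego's post, and the <NOTE> tag is purely hypothetical. The class happily runs across as many lines as it likes until it meets the next "<", and any other tag in between stops the match completely.

```python
import re

pattern = r"</STRUCTURE>[^<]*<GROUP>"
replacement = "</STRUCTURE><GROUP>"

# [^<] matches newlines too, so the block is found across lines.
text = "</STRUCTURE>\nBLABLA\nBLABLA\n<GROUP>\n"
cleaned = re.sub(pattern, replacement, text)

# A hypothetical <NOTE> tag between the two tags contains "<",
# so [^<]* cannot cross it and nothing is replaced.
with_tag = "</STRUCTURE>\n<NOTE>x</NOTE>\n<GROUP>\n"
unchanged = re.sub(pattern, replacement, with_tag)

print(repr(cleaned))          # '</STRUCTURE><GROUP>\n'
print(unchanged == with_tag)  # True
```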

The best approach would be a macro that runs the commands Find, Find Select and Delete in a loop. The Macro forum has plenty of macros that show this method of block deletion.
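The logic of such a loop can be sketched in Python. This is not UE macro syntax, and the function name delete_between is made up for the illustration. It only mirrors the idea: find the start tag, extend the selection to the next end tag, delete what lies in between, and repeat until no start tag is left.

```python
def delete_between(text, start_tag, end_tag):
    """Delete everything between each start_tag and the next end_tag.

    Both tags are kept; only the text between them is removed.
    """
    pieces = []
    pos = 0
    while True:
        start = text.find(start_tag, pos)        # like "Find"
        if start == -1:
            break
        content_start = start + len(start_tag)
        end = text.find(end_tag, content_start)  # like "Find Select" up to the end tag
        if end == -1:
            break
        pieces.append(text[pos:content_start])   # keep text up to and including the start tag
        pos = end                                # like "Delete": skip the selected block
    pieces.append(text[pos:])                    # keep whatever follows the last block
    return "".join(pieces)


sample = "A</STRUCTURE>\nBLABLA\n<GROUP>B</STRUCTURE>x<GROUP>C"
result = delete_between(sample, "</STRUCTURE>", "<GROUP>")
print(result)  # A</STRUCTURE><GROUP>B</STRUCTURE><GROUP>C
```

Because the search uses plain strings rather than a regex, there is no risk of a wildcard running past the intended end tag.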
Mofi
Grand Master
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Why can't find text between HTML tags?

Post by fredtheman » Wed Jul 28, 2010 11:37 am

Thanks Mofi for the explanations.

Re: Why can't find text between HTML tags?

Post by fan.of.dilbert » Thu Aug 05, 2010 10:24 pm

Find: (?s)</STRUCTURE>.+?<GROUP>
Replace: </STRUCTURE><GROUP>

or

Find: (?si)(</structure>)[^<].*?(<group>)
Replace: $1$2

The above Perl style regexps work across multiple lines.

The inline modifier (?s) makes the search treat the text as one complete string, so that . matches any character, including a newline.

The second example adds the i modifier (ignore case), and the parentheses around </structure> and <group> store the matched text in the variables $1 and $2 used in the replacement. The other addition is the character class [^<], i.e. not a "<" character, which prevents the regexp from matching </STRUCTURE><GROUP></STRUCTURE><GROUP> once all the replacements have been made.
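Both expressions can be tried out in Python as a rough check. Python's re module is a stand-in for UE's Perl engine here (an assumption), and it writes the back-references in the replacement as \1\2 instead of $1$2. The last part runs both patterns a second time on text where the tags already sit next to each other, to show what the [^<] guard is for.

```python
import re

text = "</STRUCTURE>\nBLABLA\nBLABLA\n<GROUP>\n"

# (?s) lets "." match newlines, so the lazy .+? can span the lines.
first = re.sub(r"(?s)</STRUCTURE>.+?<GROUP>", "</STRUCTURE><GROUP>", text)

# (?si) adds case-insensitive matching; the groups keep the original tags.
second = re.sub(r"(?si)(</structure>)[^<].*?(<group>)", r"\1\2", text)

print(repr(first))   # '</STRUCTURE><GROUP>\n'
print(repr(second))  # '</STRUCTURE><GROUP>\n'

# Second run on already cleaned text: two adjacent tag pairs.
already_clean = "</STRUCTURE><GROUP></STRUCTURE><GROUP>"

# Without the guard, .+? swallows "<GROUP></STRUCTURE>" and a pair is lost.
without_guard = re.sub(r"(?s)</STRUCTURE>.+?<GROUP>",
                       "</STRUCTURE><GROUP>", already_clean)

# With [^<], a "<" directly after </STRUCTURE> blocks the match.
with_guard = re.sub(r"(?si)(</structure>)[^<].*?(<group>)",
                    r"\1\2", already_clean)

print(without_guard)                # </STRUCTURE><GROUP>
print(with_guard == already_clean)  # True
```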
fan.of.dilbert
Newbie
Posts: 4
Joined: Sat Aug 08, 2009 2:28 pm
Location: USA