The problem is that there is no way to make a regex count nested tags. You can't find the matching </div> tag if there is an unspecified number of <div>/</div> tags in-between, unless you know some other way to distinguish the tags (like the level of indentation). If the opening and closing tag are indented the same way, then the following regex could work:
(?s)^([ \t]*)<div>.*?^\1</div>
(?s) switches the regex engine to "dot matches all" mode.
^([ \t]*) looks for any whitespace before the first <div> tag it encounters (which must be the first non-whitespace on the line) and remembers it in backreference no. 1.
.*?^\1 then matches as much as it has to until the next occurrence of a line that contains a </div> tag at the same indentation level (exact same sequence of spaces and/or tabs!) as before.
(In UE 14.00b, the .*? can be written as .* (which makes the regex faster) because of a "bug" in the regex engine. Since IDM might correct that bug some day, I wouldn't do so unless performance is an issue.)
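To see the indentation trick in action outside UE, here is a minimal sketch in Python. It assumes Python's re module behaves like UE's Perl-style engine for this pattern; the one adjustment is the added (?m) flag, which Python needs so that ^ matches at the start of every line. The sample text and the name INDENTED_DIV are made up for illustration.

```python
import re

# Same pattern as above. (?m) is added because Python's ^ only matches
# at line starts in multiline mode (an adjustment for Python, not UE).
INDENTED_DIV = re.compile(r"(?sm)^([ \t]*)<div>.*?^\1</div>")

# Hypothetical sample: an outer <div> indented by two spaces,
# with a nested <div> indented by four.
sample = (
    "<body>\n"
    "  <div>\n"
    "    <div>\n"
    "      inner\n"
    "    </div>\n"
    "  </div>\n"
    "</body>\n"
)

match = INDENTED_DIV.search(sample)
matched_text = match.group(0)   # the whole outer <div>...</div> block
indentation = match.group(1)    # the whitespace remembered in backreference 1
print(matched_text)
```

The nested </div> is skipped because it is preceded by four spaces rather than the two stored in backreference no. 1, so the match only ends at the outer closing tag.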
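This regex will malfunction under certain conditions. E.g., it will match the following in its entirety because the first line contains a <div>/</div> pair on the same line, so that </div> is not at the start of a line and the regex runs on to the next one that is: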
<div>remove</div>
don't remove!
<div>
remove
</div>
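The over-match can be reproduced with a short Python sketch. As an assumption, Python's re module stands in for UE's engine here, with (?m) added so that ^ matches at every line start; the variable names are illustrative only.

```python
import re

# Indentation-based pattern, with (?m) added for Python.
INDENTED_DIV = re.compile(r"(?sm)^([ \t]*)<div>.*?^\1</div>")

# The problem case: the first pair opens and closes on one line.
sample = "<div>remove</div>\ndon't remove!\n<div>\nremove\n</div>\n"

match = INDENTED_DIV.search(sample)
overmatched = match.group(0)

# The first </div> is not at the start of a line, so the lazy .*?
# keeps going until the </div> on the last line.
print("don't remove!" in overmatched)
```

The match runs from the first <div> to the last </div> and swallows the line that should have been kept.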
If you can't be sure of your indentation levels, you could use UE's reindentation feature before applying this regex.
This solution - if it were applicable - would certainly be the easiest. However, I wouldn't bet my life on it always matching corresponding tags (see above).
A safer solution (like you proposed in your previous post) would be:
(?s)<div>(?:(?!<div>).)*?</div>
This will match any <div>/</div> pair that doesn't contain a <div> within, regardless of whether there are line breaks in between. Of course, you will have to apply this regex over and over again until UE won't find any more matches.
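The "apply it over and over" step can be sketched as a loop in Python. This assumes the goal is to delete the matched blocks (replace with nothing) and that Python's re module handles the pattern the same way UE does; remove_all_divs and the sample string are hypothetical names for illustration.

```python
import re

# Matches a <div>...</div> pair that contains no further <div> inside.
INNERMOST_DIV = re.compile(r"(?s)<div>(?:(?!<div>).)*?</div>")


def remove_all_divs(text: str) -> str:
    """Delete innermost <div>...</div> pairs repeatedly until none remain."""
    while True:
        text, count = INNERMOST_DIV.subn("", text)
        if count == 0:
            # No more matches: equivalent to UE reporting nothing found.
            return text


sample = "keep <div>a <div>b</div> c</div> this <div>d</div>"
result = remove_all_divs(sample)
print(repr(result))
```

Each pass peels off one level of nesting: the first pass removes the innermost pairs, which turns their parents into innermost pairs for the next pass.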