Replacing xml codes within a block of text

Post by Nancy26 » Tue Oct 23, 2012 9:09 am

How do I select a block of text and run replaces on the XML only within that block?
Within <contrib-group> I need to delete <name> and </name>, and replace <surname> with <SN>; </surname> with </SN>; <given-names> with <FN>; </given-names> with </FN>; <degrees> with <DEG>; </degrees> with </DEG>.
These tags also appear elsewhere in my file, but there I need to rename them to something else. I'm really new to this stuff and any help would be much appreciated.

Example of my data

<title-group>
<article-title>Neoadjuvant Accelerated Concomitant Boost Radiotherapy and Multidrug Chemotherapy in Locally Advanced Rectal Cancer</article-title>
<subtitle>A Dose-Escalation Study</subtitle>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Caravatta</surname><given-names>Luciana</given-names>
</name><degrees>MD</degrees>
<xref ref-type="aff" rid="aff1">&#x002A;</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Picardi</surname><given-names>Vincenzo</given-names>
</name><degrees>MD</degrees>
<xref ref-type="aff" rid="aff1">&#x002A;</xref>
</contrib>
</contrib-group>
<aff id="aff1">Departments of <label>&#x002A;</label>Radiation Oncology</aff>
<aff id="aff2"><label>&#x2020;</label>Palliative Therapies</aff>

Re: Replacing xml codes within a block of text

Post by Mofi » Tue Oct 23, 2012 12:19 pm

There are several ways to do the replaces you need.

The first one does not select anything in order to restrict the replaces to selected text. Instead it uses a tagged regular expression with the UltraEdit engine to simply search for blocks like

Code: Select all
<contrib contrib-type="author">
<name><surname>Caravatta</surname><given-names>Luciana</given-names>
</name><degrees>MD</degrees>

and reformats such blocks to

Code: Select all
<contrib contrib-type="author">
<SN>Caravatta</SN><FN>Luciana</FN>
<DEG>MD</DEG>

The macro is:

Code: Select all
InsertMode
ColumnModeOff
HexOff
Top
UltraEditReOn
Find MatchCase RegExp "^(<contrib contrib-type=*^p^)<name><surname>^(*^)</surname><given-names>^(*^)</given-names>*^p</name><degrees>^(*^)</degrees>"
Replace All "^1<SN>^2</SN><FN>^3</FN>^p<DEG>^4</DEG>"

This is the fastest method, but it works only if all blocks in your XML file look exactly like your example, including the whitespace characters: no spaces/tabs at the beginning of lines, no trailing spaces/tabs, and DOS line terminators as posted here. If small variations exist, the regular expression can be modified to match all of them.
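
For readers who want to try the idea outside UltraEdit, here is a minimal Python sketch of the same single-pass replace, loosened so that small whitespace variations still match. It is only an illustration, not the macro above: the names `AUTHOR_BLOCK` and `reformat_authors` are made up for this example, and the replacement always writes a plain `\n` between the two output lines.

```python
import re

# Hypothetical, whitespace-tolerant variant of the first macro's pattern.
# \s* allows spaces, tabs and either DOS or Unix line breaks between tags.
AUTHOR_BLOCK = re.compile(
    r'(<contrib contrib-type="[^"]*">\s*)'
    r'<name>\s*<surname>(.*?)</surname>\s*'
    r'<given-names>(.*?)</given-names>\s*</name>\s*'
    r'<degrees>(.*?)</degrees>'
)


def reformat_authors(xml_text: str) -> str:
    """Rewrite every matching author block in one pass over the whole text."""
    # \1 keeps the <contrib ...> line (and its trailing whitespace) unchanged.
    return AUTHOR_BLOCK.sub(r'\1<SN>\2</SN><FN>\3</FN>\n<DEG>\4</DEG>', xml_text)
```

Like the macro, this does not limit itself to <contrib-group>; it relies on the `<contrib contrib-type=...>` line to identify the right blocks.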



The second method runs a loop that always selects everything from <contrib-group> to </contrib-group> using the Find Select feature (done manually by holding the Shift key while clicking the Find Next button). A shorter UltraEdit tagged regular expression Replace All than the one above then reformats the tags within the selection. The result for your example is the same, but UltraEdit takes longer to finish.

The macro is:

Code: Select all
InsertMode
ColumnModeOff
HexOff
Top
UltraEditReOn
Loop 0
Find MatchCase "<contrib-group>"
IfNotFound
ExitLoop
EndIf
StartSelect
Find MatchCase Select "</contrib-group>"
Find MatchCase RegExp SelectText "<name><surname>^(*^)</surname><given-names>^(*^)</given-names>*^p</name><degrees>^(*^)</degrees>"
Replace All "<SN>^1</SN><FN>^2</FN>^p<DEG>^3</DEG>"
EndSelect
Key HOME
EndLoop
Top
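
The logic of this loop (find each <contrib-group> block, then replace only inside it) can be sketched in Python as follows. This is an illustration under assumptions, not an UltraEdit feature: `CONTRIB_GROUP`, `NAME_LINES` and `replace_in_groups` are hypothetical names, and the inner pattern is a rough translation of the UltraEdit expression used in the macro.

```python
import re

# One <contrib-group> ... </contrib-group> block; DOTALL lets "." cross lines.
CONTRIB_GROUP = re.compile(r'<contrib-group>.*?</contrib-group>', re.DOTALL)

# Rough equivalent of the macro's shorter tagged expression.
NAME_LINES = re.compile(
    r'<name><surname>(.*?)</surname><given-names>(.*?)</given-names>'
    r'[^\r\n]*\r?\n</name><degrees>(.*?)</degrees>'
)


def replace_in_groups(xml_text: str) -> str:
    """Reformat name/degree tags, but only inside <contrib-group> blocks."""
    def fix_group(match: re.Match) -> str:
        # The inner replace sees only the text of the current block.
        return NAME_LINES.sub(r'<SN>\1</SN><FN>\2</FN>\n<DEG>\3</DEG>',
                              match.group(0))

    return CONTRIB_GROUP.sub(fix_group, xml_text)
```

Identical tags outside a <contrib-group> block are left alone, so they can be renamed to something else in a separate step.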



The third method also runs a loop that always selects everything from <contrib-group> to </contrib-group>, but it uses the Perl regular expression engine with the (?s) option so that the dot also matches newline characters. Four Perl tagged regular expression Replace All commands are executed on every selection to reformat the 2 lines to the wanted output.

This method does not depend on the XML structure inside the block. But because the selection is made just once, although every replace modifies the selection (by deleting characters), it could fail to make all replaces correctly. I have seen in the past that UltraEdit could not always re-apply the selection correctly after a Replace All in the selection; in such cases the block had to be reselected with the appropriate commands after every replace. For your example, however, it worked with just 1 find for selecting and 4 replaces within the selection for reformatting.

The macro is:

Code: Select all
InsertMode
ColumnModeOff
HexOff
Top
PerlReOn
Loop 0
Find MatchCase RegExp "(?s)<contrib-group>.*?</contrib-group>"
IfNotFound
ExitLoop
EndIf
Find MatchCase RegExp SelectText "</*name>"
Replace All ""
Find MatchCase RegExp SelectText "(</*)surname>"
Replace All "\1SN>"
Find MatchCase RegExp SelectText "(</*)given-names>"
Replace All "\1FN>"
Find MatchCase RegExp SelectText "(</*)degrees>"
Replace All "\1DEG>"
CancelSelect
EndLoop
Top
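
As a cross-check outside UltraEdit, the same tag-by-tag renaming can be sketched in Python. Note the differences: this sketch handles all four tags in one pass instead of four separate replaces, and it works on a copy of each block's text, so the reselection problem described above does not arise. The names `GROUP_BLOCK`, `TAG`, `RENAMES`, `rename_tag` and `rename_in_groups` are hypothetical.

```python
import re

# One <contrib-group> ... </contrib-group> block; DOTALL lets "." cross lines.
GROUP_BLOCK = re.compile(r'<contrib-group>.*?</contrib-group>', re.DOTALL)

# Opening or closing form of the four tags handled by the macro.
TAG = re.compile(r'<(/?)(name|surname|given-names|degrees)>')

# <name> has no entry here because it is deleted rather than renamed.
RENAMES = {'surname': 'SN', 'given-names': 'FN', 'degrees': 'DEG'}


def rename_tag(match: re.Match) -> str:
    new_name = RENAMES.get(match.group(2))
    if new_name is None:
        return ''  # <name> and </name> are removed
    return f'<{match.group(1)}{new_name}>'


def rename_in_groups(xml_text: str) -> str:
    """Rename or delete the tags, but only inside <contrib-group> blocks."""
    return GROUP_BLOCK.sub(lambda g: TAG.sub(rename_tag, g.group(0)), xml_text)
```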

Re: Replacing xml codes within a block of text

Post by Nancy26 » Wed Oct 24, 2012 6:36 am

Mofi,
Thank you so much :P
I used the second method and it worked.