Delete part of a line's duplicate content

Postby mightymax » Thu Nov 01, 2007 7:15 pm

Hi all,

I've been trying to remove duplicate data from a file. What I need is for each value on the left side to appear only once, listed below all of the right-side data it originally matched. My before and after examples probably make more sense. I tried modifying DelDupLineInfo, but I just keep hitting a wall.

Any help is appreciated.
Thanks, Max


Here is what my data looks like:

Code:
G10002-XXX-01785-REV-IR.xml    <?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
G10003-XXX-01785-REV-IR.xml    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
G10003-XXX-01785-REV-IR.xml    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10003-XXX-01785-REV-IR.xml    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10008-XXX-01785-REV-IR.xml    <?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
G10009-XXX-01785-REV-IR.xml    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
G10004-XXX-01785-REV-IR.xml    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10004-XXX-01785-REV-IR.xml    <?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
G10004-XXX-01785-REV-IR.xml    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
G10001-XXX-01785-REV-IR.xml    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10001-XXX-01785-REV-IR.xml    <?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
G10001-XXX-01785-REV-IR.xml    <?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>


This is what I'm trying to get:
Code:
<?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
G10002-XXX-01785-REV-IR.xml   
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10003-XXX-01785-REV-IR.xml   
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10008-XXX-01785-REV-IR.xml   
<?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
G10009-XXX-01785-REV-IR.xml   
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
<?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
G10004-XXX-01785-REV-IR.xml   
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
<?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
<?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>
G10001-XXX-01785-REV-IR.xml

Re: Delete part of a lines duplicate content

Postby mightymax » Thu Nov 01, 2007 8:12 pm

I've made a little more progress. Below is the macro I wrote to check for a duplicate and, if one is found, insert DUPLICATE at the beginning of the line. I figure this will at least give me a marker to delete on. But I'm still having trouble with my loop: currently it only works once.

Code:
InsertMode
ColumnModeOff
HexOff
UnixReOff
Loop
Find RegExp "%[A-Z]"
StartSelect
Find Select ".xml"
Copy
EndSelect
Key HOME
Key DOWN ARROW
Find MatchCase "^c"
IfFound
Key HOME
"DUPLICATE"
IfNotFound
ExitLoop
EndIf

Re: Delete part of a lines duplicate content

Postby Mofi » Fri Nov 02, 2007 7:44 am

The macro below produces the following result:

Code:
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
<?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
<?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>
G10001-XXX-01785-REV-IR.xml
<?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
G10002-XXX-01785-REV-IR.xml
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
G10003-XXX-01785-REV-IR.xml
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
<?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
<?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
G10004-XXX-01785-REV-IR.xml
<?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
G10008-XXX-01785-REV-IR.xml
<?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
G10009-XXX-01785-REV-IR.xml


As you can see, it is nearly what you want. The difference is that the "G*.xml" lines are sorted before reformatting, which the macro below requires, so the output is also sorted by XML file name. Completely identical lines are removed by the same sort before the content is reformatted.
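
For anyone who wants to cross-check the expected result outside UltraEdit, the sort, de-duplicate, and regroup steps described above can be sketched in Python. This is only an illustration under assumptions, not a translation of the macro: the function name `regroup`, the `sort_first` flag, and the line pattern are hypothetical, and lines that do not look like "G....xml <...>" are simply skipped here, which the macro does not do.

```python
import re

# Assumed line layout: file name starting with "G" and ending in ".xml",
# then spaces or tabs, then the "<?FRAME ...>" part.
LINE_RE = re.compile(r"^(G.*?\.xml)[ \t]+(<.*)$")

def regroup(text, sort_first=True):
    """List each file's unique FRAME lines, followed by the file name."""
    # Drop blank lines and trailing whitespace (like TrimTrailingSpaces).
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if sort_first:
        # Sorting whole lines orders the output by file name, then by frame.
        lines = sorted(lines)
    groups = {}  # file name -> frames; a dict keeps insertion order
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # simplification: ignore lines that do not fit the layout
        name, frame = match.groups()
        frames = groups.setdefault(name, [])
        if frame not in frames:  # remove duplicate frames per file
            frames.append(frame)
    out = []
    for name, frames in groups.items():
        out.extend(frames)
        out.append(name)
    return "\n".join(out)
```

With `sort_first=True` the output is ordered by file name, as in the result above. With `sort_first=False` the file names stay in the order in which they first appear in the input, which is closer to the ordering in the original request.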

The macro property "Continue if a Find with Replace not found" or "Continue if search string not found" must be checked for this macro.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNumGt 1
InsertLine
EndIf
Top
TrimTrailingSpaces
SortAsc RemoveDup 1 -1 0 0 0 0 0 0
Find RegExp "%^(G*.xml^)[ ^t]++^(<*^)$"
Replace All "^2#|#^1"
Loop
Find RegExp "#|#*$"
IfNotFound
ExitLoop
EndIf
Cut
Find "^c"
Replace All ""
Find "#|#"
IfFound
Key HOME
Else
Bottom
EndIf
Paste
"
"
Key UP ARROW
Delete
Delete
Delete
EndLoop
Top

Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default (see the search configuration). The macro command UnixReOff sets the regular expression option to UltraEdit style.
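
To make the UltraEdit style expression in the macro easier to read, here is a rough Perl/Python-style equivalent of the first Find/Replace step, the one that moves the file name behind the FRAME part with "#|#" as a temporary marker. The translation is my reading of the UltraEdit syntax (% = start of line, ^( ^) = tagged group, * = any characters up to the end of the line, ++ = zero or more of the preceding set, ^t = tab), and the name `swap_columns` is hypothetical.

```python
import re

# UltraEdit style (from the macro): %^(G*.xml^)[ ^t]++^(<*^)$
# Assumed Perl-style equivalent of that expression:
PERL_STYLE = r"^(G.*\.xml)[ \t]*(<.*)$"

def swap_columns(line):
    """Rewrite 'name.xml   <frame>' as '<frame>#|#name.xml'."""
    # \2 and \1 play the role of ^2 and ^1 in the UltraEdit replace string.
    return re.sub(PERL_STYLE, r"\2#|#\1", line)
```

For example, `swap_columns("G10002-XXX-01785-REV-IR.xml    <?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>")` returns the FRAME part, then `#|#`, then the file name. A line that does not match the pattern is returned unchanged.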