Delete data from one file that exists in another
Here's the scenario. File A contains a list of strings, one per line. File B is a larger "master" list. You need to quickly delete all lines from the master list in File B that also exist in File A. You know this has to be possible with UltraEdit, but what's the best way of getting there?
This is a request that our support team receives frequently. Let's take a look at how we can do this with a quick macro.
Step 1: Break it down logically into simple components
Any time you're writing a macro, it's best to start by breaking the task down to the most basic level possible. Writing a macro to accomplish something complex may seem like an overwhelming and time-consuming task, but once it is broken down into smaller parts, it often turns out to be quite simple!
So, without worrying about specific macro commands just yet, let's think about how this should work. Assuming File A will be the active file at the time of macro execution, this is what needs to get done:
- Select line in File A
- Copy selection in File A
- Switch to File B
- Search for copied text in File B
- If copied text is found, delete found line
- Go back to File A
That really doesn't look too overwhelming, does it?
Step 2: Begin writing macro commands
Now that we know what we need to do, it's time to start writing specific macro commands. Since we want to check every line in File A against File B, we want to start at the very top of File A. Keeping that in mind, here are our simple components transposed into working macro commands:
Replace All ""
(The macro commands are exhaustively documented in the Help documentation.)
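As a minimal sketch of how the components listed in Step 1 might map onto macro commands: the listing below assumes File B is the next document in the file tab order and that "^c" in a Find string stands for the current clipboard contents. The first three commands only put the editor into a known state. Lines starting with "//" are comments and must be removed before pasting into the Edit/Create Macro dialog.

```
InsertMode
ColumnModeOff
HexOff
// Start at the very top of File A
Top
// Select the current line, including its line terminator, and copy it
SelectLine
Copy
// Switch to File B and delete every occurrence of the copied text
NextWindow
Find "^c"
Replace All ""
// Go back to File A
PreviousWindow
```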
This is a good start, but the macro runs only once, so it handles just the first line of File A. There are still a couple of things we need to do to make this macro truly great:
- Loop for every line in File A, and
- Always start each Find/Replace at the top of File B, and
- Exit/end the loop and macro when the end of File A is reached.
Step 3: Properly loop the macro
When you're looping a macro, you want to identify the following components of the loop:
- What commands should be looped
- What condition should be met to end the loop
Identifying the above makes it obvious where to place your (starting) "Loop 0" and (ending) "ExitLoop" commands within the macro. It will also reveal whether or not additional commands are needed to check the "ExitLoop" condition. We want to loop everything except the "Top" command for File A, and we want to exit the loop only when we reach the end of File A. So we'll implement our loop logic in the following manner:
Replace All ""
*Note: Because "SelectLine" includes the selected line's terminator (new line character) as part of the selection, this causes the caret to reposition to the beginning of the next line. Therefore, there is no need to use a "Key DOWN ARROW" command to go to the next line in the file. However, if you were using some other method of selection instead of "SelectLine", and this method did not include the line terminator, you would need to use "Key DOWN ARROW" to avoid an infinite loop.
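The loop logic described in this step can be sketched as follows. This is an illustration under assumptions: it places the end-of-file check at the start of each pass so that an empty selection is never searched for, and it again assumes File B is the next document in the tab order.

```
InsertMode
ColumnModeOff
HexOff
// "Top" for File A stays outside the loop
Top
Loop 0
// Exit condition: the caret has reached the end of File A
IfEof
ExitLoop
EndIf
SelectLine
Copy
NextWindow
// Always start the Find/Replace at the top of File B
Top
Find "^c"
Replace All ""
PreviousWindow
EndLoop
```

One caveat with this simple approach: a blank line in the middle of File A would presumably be copied as a bare line terminator, and replacing that throughout File B would join its lines, so it is worth removing empty lines from File A first.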
Step 4: Accommodating the last line in File A
You may find that this macro doesn't properly handle the last line of File A. That's because the macro expects File A to end with a blank, empty line. In other words, the last line with real data in File A must itself end with a line terminator. To accommodate this, we check whether the last line in File A is empty and, if it isn't, add an empty line.
Replace All ""
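A minimal sketch of the complete macro with the last-line check added might look like this. It assumes that "IfColNumGt 1" at the bottom of the file is a reasonable test for a final line without a terminator, and that "InsertLine" appends the missing line break. It does not handle the auto-indent case, where whitespace may be carried onto the new line.

```
InsertMode
ColumnModeOff
HexOff
// Make sure the last line of File A ends with a line terminator
Bottom
IfColNumGt 1
InsertLine
EndIf
// Compare every line of File A against File B
Top
Loop 0
IfEof
ExitLoop
EndIf
SelectLine
Copy
NextWindow
Top
Find "^c"
Replace All ""
PreviousWindow
EndLoop
```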
That's it! We now have a macro which accomplishes exactly what we want it to do: delete from File B all strings which exist in File A. It is important to note that when playing this macro, File A must be the active file, while File B should be the very next file in the file tab order.
Can you think of any ways to make this macro even better? Please share in the comments!
Update: Feedback from a power user
One of our users, Mofi (who has probably helped some of you in the forum), has sent us another macro for this task which takes into account nonstandard configurations and file contents. He has also kindly commented his macro with explanations of the commands. All lines starting with "//" are comments and must be removed before the macro code can be copied into the Edit/Create Macro dialog.
Here is Mofi's much improved macro:
// Copy content of File A to clipboard 9. A macro should never destroy
// the content of the Windows clipboard, which is most often used by users.
// Disable selection mode and move to top of file to discard the selection
// in active File A. That is not really necessary, but looks better.
// Switch to other document and check last line for line termination.
// If the last line does not have one, but has leading whitespace and
// the auto-indent feature is enabled, UltraEdit also inserts that
// whitespace after the new line termination, so the last byte(s)
// are again not the line ending character(s). Therefore make an
// extra check for such whitespace after inserting the line
// termination and delete it.
// Go to top of file and paste there the list from File A.
// Check now if last line of list has a line termination and insert
// a line to mark end of list in File B. The marker string must be
// a string which surely does not exist ever in one of the 2 files.
// Back at top of file use a regular expression search to insert
// at beginning of every line a special "start of line string".
// It would be also possible to do this with ColumnInsert command
// in column mode, but that requires 3 commands and is slower.
Find RegExp "%"
Replace All "#!#"
// Replace the marker line by a single character different to first
// character inserted on every line with the replace above. This is
// a single replace, but a Replace All is used to keep position at
// top of the file and avoid usually two display updates.
Find MatchCase RegExp "%#!#EnD_Of_LiSt"
Replace All "!"
// Now it is time to run the loop which searches in File B for the
// lines listed in File A to delete them in File B. The loop is exited
// when the line with the exclamation mark is reached which marks end of list.
// Command SelectLine is usually used to select an entire line. But that
// command selects just the displayed line which can be just a part of a
// real line if soft word-wrap is enabled for File B. Therefore a regular
// expression find is used to select the line. The expression as is works
// for DOS, UNIX and MAC/UNIX files temporarily converted to DOS, but not
// for MAC files not converted to DOS. The selected line from File A and
// all occurrences of that line in File B are next removed with a replace
// all command. The inserted special string at start of every line avoids
// deleting just a substring of a line. The macro should not convert a
// line sequence with the 3 words
// when just the line with "like" should be deleted from File B.
Find RegExp "%*^r++^n"
Replace All ""
// The macro is nearly finished. Now only the line with the exclamation
// mark and the inserted strings at beginning of every line must be deleted.
Find MatchCase RegExp "%#!#"
Replace All ""
// Some users use the setting "Automatically copy to clipboard when
// selection is made" and therefore it was good to keep clipboard 9
// all the time active while running this macro. Now it is time to
// clear the content in this clipboard to free memory and switch
// back to the usually used Windows clipboard.
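The clipboard handling that these comments describe can be sketched as a wrapper around the rest of the macro. This is only an illustration of the pattern, not Mofi's actual listing: it assumes "Clipboard 0" selects the Windows clipboard, and the middle of the macro is indicated by a comment only.

```
// Switch to user clipboard 9 so the Windows clipboard is left untouched
Clipboard 9
SelectAll
Copy
// ... the remaining commands of the macro run here with clipboard 9 active ...
// Free the memory used by clipboard 9, then return to the Windows clipboard
ClearClipboard
Clipboard 0
```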