Extract RegExp Matches from Doc to Clipboard

Post by bmatsoukas » Wed Aug 03, 2011 3:53 pm

This script performs a regular expression (regexp) search of the active document, writing unique matches to the Windows clipboard.

    This script was written and tested using UEStudio version 10.20.0.1001 running on Windows Server 2008 R2 Enterprise.

    RegExp is enabled before the search is executed, activating your default regexp engine (I use the UE engine). You can force the script to use the UE, Perl, or Unix engine by uncommenting/commenting a couple of lines near the search execution command.

    Each regexp match is placed on the Windows clipboard sorted (ascending, case-insensitive) and with duplicate matches removed. The duplicate removal can be changed by commenting/uncommenting lines in this script's sortDoc() function. You can also modify any of the sorting option lines to suit your needs.

Remember that this script is designed to extract unique RegExp matches to a list. If you enter a search term whose matches are all the same text, you will get a list of one item. For example, if you searched this script for the term
Code: Select all
index

the result would be one line on the clipboard:
Code: Select all
index

Although there are more than a few occurrences of "index" in this script, duplicates are removed before the list is written to the clipboard. By default the sort is case-insensitive; ergo a single line is left when (case-insensitive) duplicates are removed.
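
As a rough illustration of what "sorted, case-insensitive, duplicates removed" means for the resulting list, here is a plain JavaScript sketch that does not use the UltraEdit sort command at all. The function name uniqueSortedIgnoreCase is hypothetical, and keeping the first spelling encountered is an assumption; UltraEdit's own sort may keep a different one.

```javascript
// Hypothetical helper, not part of the script below: sorts an array of
// strings ascending, ignoring case, and drops case-insensitive duplicates.
function uniqueSortedIgnoreCase(lines) {
    var seen = {};      // lower-cased text (with a prefix) -> true
    var result = [];
    for (var i = 0; i < lines.length; i++) {
        // The "k" prefix avoids clashes with built-in property names.
        var key = "k" + lines[i].toLowerCase();
        if (!seen.hasOwnProperty(key)) {
            seen[key] = true;
            result.push(lines[i]);  // assumption: keep the first spelling seen
        }
    }
    result.sort(function (a, b) {
        var la = a.toLowerCase();
        var lb = b.toLowerCase();
        return la < lb ? -1 : (la > lb ? 1 : 0);
    });
    return result;
}

// Every match of "index" collapses to a single line.
var unique = uniqueSortedIgnoreCase(["index", "Index", "index", "INDEX"]);
```

With the four sample matches above, unique ends up holding one entry.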

Ok, so how is this script useful?

I use UE extensively to write PowerShell scripts. I often create XML configuration files for automated product installation. These installations usually require a number of Windows domain user accounts which must be embedded in these XML files. In addition, these accounts are sometimes coded in more than one place.

As part of installation setup I must create each user account. To this end I use a list of user accounts which I run against a batch file. I use this script to create the account list by running the script against the populated XML configuration file using this regexp:
Code: Select all
domainName\\[a-zA-Z][a-zA-Z0-9]++

The result is a list of unique user account names in the form I use in my account creation process.

    Note: This particular regexp may not work for you as Windows allows characters not specified here and there is a length limit to account names...I just don't need those details for my environment.
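
To make the shape of that expression easier to see, here is a sketch using JavaScript's built-in RegExp on a string instead of the UE search. It assumes the expression is meant as "domainName, one backslash, a letter, then any number of letters or digits"; the sample XML and the account names in it are made up for illustration.

```javascript
// Hypothetical sample of a populated configuration file.
// In a JavaScript string literal, "\\" is a single backslash.
var xml =
    '<account name="domainName\\svcInstall"/>\n' +
    '<account name="domainName\\svcSql01"/>\n' +
    '<runAs user="domainName\\svcInstall"/>\n';

// JavaScript RegExp written to match the same kind of text as the
// search expression above (an assumption about its intent).
var accountRe = /domainName\\[a-zA-Z][a-zA-Z0-9]*/g;

// All matches in document order, duplicates still included.
var found = xml.match(accountRe) || [];
```

Here found still lists svcInstall twice; removing duplicates is what the sort step of the script is for.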

I have subsequently used this script to help me document scripts and other source code. For example, if you run the following regexp against this script file:
Code: Select all
var [a-zA-Z][a-zA-Z0-9\\-_]++

you get a list of formally defined variables:
Code: Select all
var callerDocIdx
var callerIdx
var frMatchCase
var frMatchWord
var frMode
var frRegExp
var frSearchAscii
var frSearchDown
var frSearchInColumn
var homeDocIdx
var i
var outputDocIdx
var searchUserName
var tabindex


Here is the complete script:
Code: Select all
////////////////////////////////////////////////////////
/// extractRegExpResults.js
///
/// Searches a document and saves results to the
/// Windows clipboard.
///
/// o The search is a regular expression. RegExp is enabled before the search is executed, so it
///   uses the default UE RegExp engine. You can use the Perl or Unix engine by uncommenting
///   lines in this script (near the search execution).
///
/// o Each RegExp match is placed on the Windows clipboard: Sorted (ascending) and duplicates
///   removed. The duplicate removal can be changed by uncommenting line(s) in the sortDoc()
///   function below. You can also modify any of the sorting option lines to suit your needs.
///
/////////////////////////////////////////////////////////////////////////////////////////////////

////////////////////////////////////////////////////
// main function - called by last line in this file
////////////////////////////////////////////////////
function main() {

    var searchUserName = UltraEdit.getString("Enter a search expression.",1);   // Get the search expression.
   
    var homeDocIdx = getActiveDocumentIndex();      // Remember the target document index
    var outputDocIdx = UltraEdit.document.length;   // Remember temp document index
    UltraEdit.newFile();                            // Create a temp document (becomes the active document)
    UltraEdit.document[homeDocIdx].setActive();     // Reset source document to active after newFile()
    UltraEdit.activeDocument.top();                 // Start the search at the beginning of the document
   
    // Save current UE search settings
    var frMode           = UltraEdit.activeDocument.findReplace.mode;
    var frMatchCase      = UltraEdit.activeDocument.findReplace.matchCase;
    var frMatchWord      = UltraEdit.activeDocument.findReplace.matchWord;
    var frRegExp         = UltraEdit.activeDocument.findReplace.regExp;
    var frSearchAscii    = UltraEdit.activeDocument.findReplace.searchAscii;
    var frSearchDown     = UltraEdit.activeDocument.findReplace.searchDown;
    var frSearchInColumn = UltraEdit.activeDocument.findReplace.searchInColumn;
   
    UltraEdit.activeDocument.findReplace.mode=0;                // Set search options
    UltraEdit.activeDocument.findReplace.matchCase=false;
    UltraEdit.activeDocument.findReplace.matchWord=false;
    UltraEdit.activeDocument.findReplace.regExp=true;
    UltraEdit.activeDocument.findReplace.searchAscii=false;
    UltraEdit.activeDocument.findReplace.searchDown=true;
    UltraEdit.activeDocument.findReplace.searchInColumn=false;
   
    // Uncomment the appropriate line for the regExp engine you want to use:
    //UltraEdit.perlReOn();   // Perl
    UltraEdit.ueReOn();       // UltraEdit
    //UltraEdit.unixReOn();   // Unix
   
    // Find all regExp matches and write them to the temporary document
    while(UltraEdit.activeDocument.findReplace.find(searchUserName)) {
       
        UltraEdit.document[outputDocIdx].write(UltraEdit.activeDocument.selection + "\r\n");   
    }
   
    sortDoc(outputDocIdx);                                              // Sort the results in the temporary document
    copyDocToClipboard(outputDocIdx, 0);                                // Copy to clipboard - Using Windows clipboard.
    UltraEdit.closeFile(UltraEdit.document[outputDocIdx].path,2);       // Dispose of the temporary document
   
    // Restore the original settings for UE search
    UltraEdit.activeDocument.findReplace.mode           = frMode;
    UltraEdit.activeDocument.findReplace.matchCase      = frMatchCase;
    UltraEdit.activeDocument.findReplace.matchWord      = frMatchWord;
    UltraEdit.activeDocument.findReplace.regExp         = frRegExp;
    UltraEdit.activeDocument.findReplace.searchAscii    = frSearchAscii;
    UltraEdit.activeDocument.findReplace.searchDown     = frSearchDown;
    UltraEdit.activeDocument.findReplace.searchInColumn = frSearchInColumn;
   
    UltraEdit.activeDocument.top();
   
    return  // Ends main() - effectively exits the script.
}

// ////////////////////////////////////////////////////////
// // sub functions
// ////////////////////////////////////////////////////////

// //////////////////////////////////////
// sortDoc()
// //////////////////////////////////////
// Sorts specified document by line and ascending;
// removes duplicate lines. Target document is
// referenced by index number.
function sortDoc(targetIdx) {
   
    var callerIdx = getActiveDocumentIndex();           // Remember the caller's active document
    UltraEdit.document[targetIdx].setActive();          // Set the sort target
       
    UltraEdit.activeDocument.sort.ascending=true;       // Set sort options
    UltraEdit.activeDocument.sort.col1Start=1;
    UltraEdit.activeDocument.sort.col1End=-1;
    UltraEdit.activeDocument.sort.col2Start=0;
    UltraEdit.activeDocument.sort.col2End=0;
    UltraEdit.activeDocument.sort.col3Start=0;
    UltraEdit.activeDocument.sort.col3End=0;
    UltraEdit.activeDocument.sort.col4Start=0;
    UltraEdit.activeDocument.sort.col4End=0;
    UltraEdit.activeDocument.sort.ignoreCase=true;
    UltraEdit.activeDocument.sort.removeDuplicates=2;
    UltraEdit.activeDocument.sort.remKey1=true;
    UltraEdit.activeDocument.sort.remKey2=true;
    UltraEdit.activeDocument.sort.remKey3=true;
    UltraEdit.activeDocument.sort.remKey4=true;
    UltraEdit.activeDocument.sort.type=0;
   
    UltraEdit.activeDocument.sort.sort();               // Sort the target document
    UltraEdit.document[callerIdx].setActive();          // Restore the caller's active document
   
    return // done.
}

// //////////////////////////////////
// copyDocToClipboard()
// //////////////////////////////////
// Copy entire document (specified by index number)
// to clipboard (also specified by index number)
function copyDocToClipboard(documentIdx, clipIdx) {
   
    var callerDocIdx = getActiveDocumentIndex();                    // Remember the current active document
    try {
        UltraEdit.document[documentIdx].setActive()                 // Activate copy source document
    }
    catch (e) {
        throw "copyDocToClipboard: Invalid document index.";
    }
         
    if (clipIdx < 0) { throw "Clipboard index out of range."; }     // Check clipboard index range.
    if (clipIdx > 9) { throw "Clipboard index out of range."; }
   
    UltraEdit.selectClipboard(clipIdx);                             // Select and clear the active clipboard
    UltraEdit.clearClipboard();
    UltraEdit.activeDocument.selectAll();                           // Select the text, copy to active clipboard
    UltraEdit.clipboardContent = UltraEdit.activeDocument.selection;
   
    UltraEdit.document[callerDocIdx].setActive();                   // Restore the caller's active document   
   
    return // done.   
}

// Get the index for the active document.
// I lifted this from the UE user forum. Thank
// you jorrasdk,
function getActiveDocumentIndex() {
   var tabindex = -1; /* start value */

   for (var i = 0; i < UltraEdit.document.length; i++)
   {
      if (UltraEdit.activeDocument.path==UltraEdit.document[i].path) {
         tabindex = i;
         break;
      }
   }
   return tabindex;
}

main()
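
The seven save/restore assignments in main() could also be folded into a pair of small helpers. This is only a sketch: saveSettings and restoreSettings are hypothetical names, and a plain object stands in for UltraEdit.activeDocument.findReplace so the snippet can run outside UltraEdit.

```javascript
// Hypothetical helpers: copy the named properties out of an object and
// write them back later.
function saveSettings(target, names) {
    var saved = {};
    for (var i = 0; i < names.length; i++) {
        saved[names[i]] = target[names[i]];
    }
    return saved;
}

function restoreSettings(target, saved) {
    for (var name in saved) {
        if (saved.hasOwnProperty(name)) {
            target[name] = saved[name];
        }
    }
}

// The same seven settings that main() saves and restores.
var FR_NAMES = ["mode", "matchCase", "matchWord", "regExp",
                "searchAscii", "searchDown", "searchInColumn"];

// Stand-in for UltraEdit.activeDocument.findReplace (assumption for the demo).
var findReplace = {
    mode: 1, matchCase: true, matchWord: false, regExp: false,
    searchAscii: false, searchDown: true, searchInColumn: false
};

var savedSettings = saveSettings(findReplace, FR_NAMES);
findReplace.mode = 0;            // settings used for the extraction search
findReplace.matchCase = false;
findReplace.regExp = true;
// ... the find loop would run here ...
restoreSettings(findReplace, savedSettings);
```

Inside the script, the stand-in object would be replaced by UltraEdit.activeDocument.findReplace, assuming its properties can be read and assigned by name in the same way the script already does one by one.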

Re: Extract RegExp Matches from Doc to Clipboard

Post by UltraFanatic » Thu Dec 15, 2011 2:16 am

Thank you very much for this script. :) It was almost what I was looking for.

I commented out a few lines so the temporary document isn't closed and focus isn't returned to the original document, because I'd like to see the found matches after the script has been run.

Instead of only searching the active document I would have liked to search all files matching a pattern, for example "C:\dummy\*.txt". My scriptwriting ability is not that good, and so far I've only added one line:

var searchFiles = UltraEdit.getString("Enter files to search.",1); // Search all files matching this pattern

Would it be hard to alter the rest of the script? Or is there perhaps an easier way to achieve what I need?
To avoid confusion, maybe I should point out that I only want one resulting list of unique matches, not one list per file.
Meanwhile, I'll see if I can find any more hints in the forum to guide me.

Best Regards
UltraFanatic

Re: Extract RegExp Matches from Doc to Clipboard

Post by Mofi » Thu Dec 15, 2011 8:11 am

What bmatsoukas surely needs to know: Are the files matching the entered pattern already opened in UltraEdit or must the script search for files matching the pattern and open them?

Another question:

Why not use the Find in Files command and then run some regular expression replaces to remove the information you don't want (file names, line numbers, how often the search string was found, etc.) from the results written to the output window or, better in this case, to a new document window?

Using Find in Files is much quicker for your task than a script solution. Of course you could code a script that runs Find in Files with an entered file name pattern and an entered search string, and that removes the unwanted information automatically in the new file.

Re: Extract RegExp Matches from Doc to Clipboard

Post by UltraFanatic » Thu Dec 15, 2011 12:51 pm

Thanks for your reply Mofi. :)

My intention was that the script would find and open files too. However, your suggestion of using Find in Files seems to work well too.

First I perform the regular expression search on all files, writing the result to an edit window, and then run bmatsoukas' script with the same regular expression to extract the unique strings from the result window. Is that how you meant?

I've set the Find in Files parameter "Found Line" to $S. I can't see a $ variable that writes only the found text to the search result instead of the full line, but I may have missed it. Otherwise I could have just used File - Sort and remove duplicates from the search result.

Take care
UltraFanatic

Re: Extract RegExp Matches from Doc to Clipboard

Post by Mofi » Thu Dec 15, 2011 1:35 pm

UltraFanatic wrote:Is that how you meant?

Yes, that is one method. Of course you can also modify the script written by bmatsoukas and insert, at the beginning, the command that executes Find in Files with results written to an edit window, directly from within the script. Just move the block

Code: Select all
    // Uncomment the appropriate line for the regExp engine you want to use:
    //UltraEdit.perlReOn();   // Perl
    UltraEdit.ueReOn();       // UltraEdit
    //UltraEdit.unixReOn();   // Unix

to the top of function main(), so that the selected regular expression engine also applies to the Find in Files command, which may be executed first. Then insert below the line

Code: Select all
var searchUserName = UltraEdit.getString("Enter a search expression.",1);   // Get the search expression.

the following block:

Code: Select all
    var sFilePattern = UltraEdit.getString("Enter file search pattern:",1);     // Get the file pattern.
    if (sFilePattern != "") {
       UltraEdit.frInFiles.searchSubs=false;
       UltraEdit.frInFiles.directoryStart="";
       UltraEdit.frInFiles.searchInFilesTypes=sFilePattern;
       UltraEdit.frInFiles.filesToSearch=0;
       UltraEdit.frInFiles.matchCase=false;
       UltraEdit.frInFiles.matchWord=false;
       UltraEdit.frInFiles.regExp=true;
       UltraEdit.frInFiles.unicodeSearch=false;
       UltraEdit.frInFiles.reverseSearch=false;
       UltraEdit.frInFiles.useOutputWindow=false;
       if (typeof(UltraEdit.frInFiles.openMatchingFiles) == "boolean")
           UltraEdit.frInFiles.openMatchingFiles=false;
       UltraEdit.frInFiles.find(searchUserName);
    }

If you enter nothing for the file search pattern, the script simply runs on the active file as designed by bmatsoukas. Otherwise the Find in Files command is executed first, with results written to an edit window.

It would perhaps be better to clean up the results list, at the end of the IF block posted above, before the rest of the script runs on the results. That is not needed for real regular expression searches, but if you just search for a simple word, the results usually also contain lines that quote the search string itself, and then the original script will always find the searched word even when it was not found in any file.

Well, it looks like you have disabled all options to get as little information as possible in the results file. In that case you don't need to run regular expression replaces to remove unwanted lines, file names, and line numbers from the results of the Find in Files command.

Because Find in Files always returns lines containing a found string, there is no variable for just the found string.

Re: Extract RegExp Matches from Doc to Clipboard

Post by UltraFanatic » Sun Dec 18, 2011 8:42 am

Mofi wrote:Because Find in Files always returns lines containing a found string there is no variable for just the found string.

Thanks for confirming that and for your addition to the script.
After the line with "var homeDocIdx" I remove duplicates from the file when a file pattern has been given, to minimize the data the second part of the script has to process. In one case the number of lines dropped from 11783 to 3149.

Code: Select all
if (sFilePattern != "") {
  // Sort the search results and remove duplicates
  sortDoc(homeDocIdx);
}

I saw in another thread that you recommended storing values in a string before writing them to a document, so I altered that part too.
Code: Select all
var sFoundLines = "";
// Find all regExp matches and add them to the string followed by a linebreak
while(UltraEdit.activeDocument.findReplace.find(searchUserName)) {
  sFoundLines += UltraEdit.activeDocument.selection + "\r\n";
}

// Write found matches to the temporary document
UltraEdit.document[outputDocIdx].write(sFoundLines);

It would have been handy if the output format could be set temporarily in the script, so I wouldn't have to go to Advanced - Configuration - Set Find Output Format to uncheck the header and summaries and change Found Line to $S, and then go back and reset them after running the script. But I'm not complaining; it just would have been convenient, since I mostly prefer using the default values.
I guess I could keep the default setting and add code to the script that cleans up the search result as you suggest, but then I might lose more time on the script running than it takes me to set the output format manually. ;)

Re: Extract RegExp Matches from Doc to Clipboard

Post by Mofi » Mon Dec 19, 2011 5:51 am

Why do you change the find output format before running the script? It would be much easier to let the script delete the lines that are not needed, as I suggested. For example, for the default English find output format, the lines of no interest for this script could be removed with the following commands, inserted below the line UltraEdit.frInFiles.find(searchUserName); provided the UltraEdit regular expression engine is selected at the start of the script.

Code: Select all
       UltraEdit.activeDocument.findReplace.mode=0;
       UltraEdit.activeDocument.findReplace.matchCase=true;
       UltraEdit.activeDocument.findReplace.matchWord=false;
       UltraEdit.activeDocument.findReplace.regExp=false;
       UltraEdit.activeDocument.findReplace.searchDown=false;
       if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
           UltraEdit.activeDocument.findReplace.searchInColumn=false;
       // Delete find summary and everything below.
       if (UltraEdit.activeDocument.findReplace.find("Search complete, found")) {
          UltraEdit.activeDocument.selectToBottom();
          UltraEdit.activeDocument.deleteText();
       }
       UltraEdit.activeDocument.top();
       UltraEdit.activeDocument.findReplace.regExp=true;
       UltraEdit.activeDocument.findReplace.searchDown=true;
       UltraEdit.activeDocument.findReplace.mode=0;
       UltraEdit.activeDocument.findReplace.preserveCase=false;
       UltraEdit.activeDocument.findReplace.replaceAll=true;
       UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
       // Delete headers and file summaries.
       UltraEdit.activeDocument.findReplace.replace("%---*^p", "");
       UltraEdit.activeDocument.findReplace.replace("%F[iou]+nd*^p", "");
       // Delete the file names at start of every found line.
       UltraEdit.activeDocument.findReplace.replace("%*([0-9]+): ", "");
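
For comparison, the same clean-up can be sketched on a plain string with JavaScript's own RegExp, outside the editor. The sample text is only an assumed shape of the default English Find in Files output, inferred from the replace patterns above, and cleanFindResults is a hypothetical name. The prefix pattern here is non-greedy, which may differ from how the UltraEdit expression behaves on lines that contain several "(number): " sequences.

```javascript
// Hypothetical helper: removes headers, per-file summaries and the final
// summary from Find in Files output and strips the "file(line): " prefix.
function cleanFindResults(text) {
    var lines = text.split(/\r?\n/);
    var kept = [];
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];
        // The find summary and everything below it is dropped.
        if (line.indexOf("Search complete, found") === 0) break;
        if (/^---/.test(line)) continue;        // header separator lines
        if (/^F[iou]+nd/.test(line)) continue;  // "Find ..." and "Found ..." lines
        if (line === "") continue;
        // Strip the file name and line number at the start of a found line.
        kept.push(line.replace(/^.*?\(\d+\): /, ""));
    }
    return kept;
}

// Assumed sample output; file name, line numbers and content are made up.
var sample = [
    "----------------------------------------",
    "Find 'svc[a-z]+' in 'C:\\dummy\\a.txt' :",
    "C:\\dummy\\a.txt(12): <user>svcinstall</user>",
    "C:\\dummy\\a.txt(40): <user>svcsql</user>",
    "Found 'svc[a-z]+' 2 time(s).",
    "Search complete, found 'svc[a-z]+' 2 time(s). (1 file(s))."
].join("\r\n");

var cleaned = cleanFindResults(sample);
```

Only the two found lines remain in cleaned, without file name and line number.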