Find all lines in active file in all files within a directory

Post by zhelev81 » Thu Feb 09, 2012 2:50 pm

Hello everyone, I am new here, so I came to ask about something I could not find in this nice software.

What I basically need is to compare a file against multiple other files in another directory. Let's say we have a file test.txt which contains:

lksdfksdkfjklsdjkfjlsdkj
mnbmvmbmvnb
mmosaatyewtyetw
xbvcvbcbv
polqw536749


and I want to check whether any of the, let's say, 10 text files in dir/x contains the same lines.

Is it possible to do that with this software?

Post by Mofi » Tue Feb 14, 2012 1:48 am

Yes, it is possible, but it requires coding a script or macro. If you need this only once and there are not many strings in the base file, I suggest using the Find in Files command from the Search menu. Select a string to search for and then click on the command in the menu. A dialog opens in which the selected string is already set as the string to search for. Define the other parameters like file type (*.*) and the directory and run the search. The output window shows in which files the string was found, if it was found at all.

For many lines, here is a script which runs a Find in Files for every line of the document that is active on script start. The results of all Find in Files runs are written to an edit window. In the script you have to change the directory path C:\\Temp\\ (it must end with a backslash, and every backslash must be escaped with an additional backslash) and probably also the file type *.*

The format of the results can be changed either with additional script code running one or more regular expression replaces (best method) or by modifying the options at Advanced - Configuration - Search - Set Find Output Format. I don't know which UltraEdit you use (especially which language) and how the results file should look. Therefore I have not added any code to reformat the results file.

Code:
if (UltraEdit.document.length > 0)
{
   // Define the environment for the script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.activeDocument.hexOff();

   // Select all and load the file contents into an array of lines.
   UltraEdit.activeDocument.selectAll();
   if (UltraEdit.activeDocument.isSel())
   {
      // The following command works only for files with DOS line terminators!
      var asLines = UltraEdit.activeDocument.selection.split("\r\n");
      UltraEdit.activeDocument.top();

      // Define parameters for the Find in Files executed below in a loop for every line.
      UltraEdit.frInFiles.filesToSearch=0;               // Search in a directory.
      UltraEdit.frInFiles.directoryStart="C:\\Temp\\";   // This is the directory.
      UltraEdit.frInFiles.searchInFilesTypes="*.*";      // Search in these files.
      UltraEdit.frInFiles.useEncoding=false;             // Run an ANSI search.
      UltraEdit.frInFiles.ignoreHiddenSubs=true;         // Ignore hidden subdirectories.
      UltraEdit.frInFiles.matchCase=true;                // Run a case sensitive search.
      UltraEdit.frInFiles.reverseSearch=false;           // Do not find files not containing searched string.
      UltraEdit.frInFiles.matchWord=false;               // Search for strings and not entire words.
      UltraEdit.frInFiles.openMatchingFiles=false;       // Do not open files with string found.
      UltraEdit.frInFiles.displayLinesDoNotMatch=false;  // Do not find lines not containing search string.
      UltraEdit.frInFiles.useOutputWindow=false;         // Output find result to edit window.
      UltraEdit.frInFiles.searchSubs=false;              // Do not search in subdirectories.
      UltraEdit.frInFiles.regExp=false;                  // Run a non regular expression search.

      // Run a Find in Files for all lines in active document. This find does
      // not make sure that the found string is really an entire line in the
      // searched files. So lines can also be found which contain the searched
      // string plus additional characters left and/or right of it.
      for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++)
      {
         if (!asLines[nLineNum].length) continue;  // Ignore empty lines.
         UltraEdit.frInFiles.find(asLines[nLineNum]);
      }
      // The results file is the active file now. Move caret to top
      // of this file and convert the file from Unicode to ASCII/ANSI.
      UltraEdit.activeDocument.top();
      UltraEdit.activeDocument.unicodeToASCII();
   }
}
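
The script above splits the selection only on DOS line terminators and, as its comments note, Find in Files also reports lines which merely contain the searched string. As a rough sketch in plain JavaScript, without any UltraEdit API calls, terminator-independent splitting and a comparison of entire lines could look like this. The function names splitLines and findWholeLineMatches are made up for this illustration and are not part of the posted script.

```javascript
// Split text into lines regardless of DOS (\r\n), Unix (\n) or Mac (\r) terminators.
function splitLines(text) {
   return text.split(/\r\n|\r|\n/);
}

// Return every line of fileText which is exactly equal to one of the
// non-empty search lines, together with its 1-based line number.
function findWholeLineMatches(searchLines, fileText) {
   var wanted = {};
   for (var i = 0; i < searchLines.length; i++) {
      // The "k:" prefix avoids clashes with built-in object property names.
      if (searchLines[i].length) wanted["k:" + searchLines[i]] = true;
   }
   var fileLines = splitLines(fileText);
   var matches = [];
   for (var n = 0; n < fileLines.length; n++) {
      if (wanted["k:" + fileLines[n]] === true) {
         matches.push({ lineNumber: n + 1, text: fileLines[n] });
      }
   }
   return matches;
}
```

In a real UltraEdit script the file contents would still have to come from the editor, so this only shows the comparison logic, not a replacement for Find in Files.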

Post by zhelev81 » Wed Feb 15, 2012 7:12 pm

I created a text file named find.js on my desktop and pasted in what you wrote above. Then I opened Scripting - Scripts and added it. Next I opened Search - Find in Files, but when I pasted a few lines which exist in a few files in a directory, it did not find them.

Can you please tell me what I am missing? Do I need any settings?

I made the following modifications in the script because I need to search only in text files:

Code:
          UltraEdit.frInFiles.directoryStart="C:\\Temp\\";   // This is the directory.
          UltraEdit.frInFiles.searchInFilesTypes=".txt";      // Search in these files.

Please describe the steps so I can test.

Thank you for spending time on this. I did not find anything online that does this kind of search.

Post by Mofi » Thu Feb 16, 2012 1:28 am

  1. Create a new file in UltraEdit by pressing Ctrl+N if a new file is not already displayed after starting UltraEdit.
  2. Make sure the new file is an ASCII file with DOS line terminators. If the third box of the status bar at the bottom of the UltraEdit window shows just DOS, the new file is an ASCII file with DOS line terminators. Otherwise use the commands in submenu File - Conversions to convert the file to ASCII with DOS line terminators.
  3. Select the script code in your browser window and press Ctrl+C.
  4. Switch back to UltraEdit and press Ctrl+V to paste the code into the new file.
  5. Go to the lines with the folder and file type specification.
  6. First, change the file type specification to *.txt. The asterisk is important!
  7. Second, change the path to the folder containing the *.txt files if you have not moved the files into folder C:\Temp\. If you modify the path, you must enter 2 backslashes for every backslash in the path, and the path must end with 2 backslashes.
  8. Press F12 to open the Save As dialog and save the script file to any folder you want. A good place is usually the Scripts folder in the UltraEdit program files directory if your account has write access to this folder. But you can also use any other folder.
  9. Open Scripting - Script List and add the just saved script to this list.
  10. Open the file containing the lines you want to find in the other files or, for a first test of the script, create a new file and enter some lines which exist in the *.txt files in the specified folder.
  11. Open menu Scripting and click on the name of the script file.
  12. The script is now executed and you should see a new file with the results of the Find in Files commands executed by the script for every line of the file that was active on script start.
  13. If the output window is not open, open it with Window - Output Window and check that it shows Script succeeded and not an error message.

That's it. As I already wrote, you can either change the format of the results with regular expression replaces from within the script, or define the format of the results at Advanced - Configuration - Search - Set Find Output Format before running the script on the file with the lines to find. I can help you with the regular expression replaces, but to code them I would need an example of how the results file looks after script execution and how it should look.
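
The backslash rule from step 7 can be sketched in plain JavaScript as follows. The helper names withTrailingBackslash and toScriptLiteral as well as the example path C:\Data\Lists are hypothetical and only serve to show how a path copied from Windows Explorer relates to the text that has to be typed between the double quotes in the script file.

```javascript
// Make sure a directory path ends with exactly one backslash.
function withTrailingBackslash(path) {
   return path.replace(/\\+$/, "") + "\\";
}

// Produce the text to type between the double quotes in the script source:
// a trailing backslash is ensured and every backslash is doubled.
function toScriptLiteral(path) {
   return withTrailingBackslash(path).replace(/\\/g, "\\\\");
}

// Hypothetical example: the path C:\Data\Lists as copied from Windows Explorer
// becomes C:\\Data\\Lists\\ for use in the script file.
var literalText = toScriptLiteral("C:\\Data\\Lists");
```

The doubling is needed only in the script source. At run time the string assigned to directoryStart contains single backslashes again.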

Post by zhelev81 » Sun Feb 19, 2012 4:46 am

It doesn't do the job. Can you have a look remotely?

If you can, please send me your Skype, MSN, Yahoo or other chat contact by PM. Thanks.

Post by Mofi » Sun Feb 19, 2012 2:15 pm

I can't help you remotely, and I don't use Skype or any other chat tool.

I have packed a slightly modified version of the script into a ZIP file and attached it to this post.

With this modification you can open the script file in UltraEdit together with the text file containing the lines to search for in the other files.

No file other than this script file and the list file should be open in UltraEdit.

Edit the directory path on line 23 if necessary and save the script file.

Use the command Run Active Script from menu Scripting. You should briefly see a switch to the list file with the lines to search for, and then a results document window should appear listing the results.
Attachment: FindLinesInTextFiles.zip (1.24 KiB)
Script file to use for searching lines in *.txt files in an entered directory.

Post by zhelev81 » Sun Feb 19, 2012 2:33 pm

Now it is working fine.

But is it possible to make it search within a directory I select? Or do I have to copy the files into C:\Temp every time?

Is it possible to code the script, so that I can select the folder to search in?

For example, I open the file I want to check, and all other files which I want to check against are in dir/whatever, so I just set that dir?

Thanks for the help.

Post by Mofi » Sun Feb 19, 2012 3:03 pm

The script can ask you for the full path of the directory to search in. But you have to type the full directory path manually, or paste the full path copied from the address bar of Windows Explorer into the edit field. There is no scripting command which opens a "browse for directory" dialog and returns the selected directory path as a string to the script. UltraEdit scripts are primarily meant to automate regularly needed (file modification) actions without user interaction. There are lots of programming and scripting languages which are designed for coding applications with user interaction. I replaced the ZIP file above with a new version which asks you for the directory path on execution.
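
How such a prompt could be handled is sketched below. This is not the code from the attached ZIP file. The function name askForDirectory is made up, and the prompt function is passed in as a parameter so that the cleanup logic can be tried outside UltraEdit. Inside UltraEdit the parameter could presumably be function (msg) { return UltraEdit.getString(msg, 1); }, on the assumption that getString with second argument 1 returns the entered text instead of inserting it into the active file. Please check the scripting help of your version for that.

```javascript
// ask is any function which shows a prompt and returns the entered text.
// Returns the directory path with a trailing backslash, or null if nothing usable was entered.
function askForDirectory(ask) {
   var answer = ask("Directory to search in:");
   if (typeof answer !== "string") return null;
   // Remove surrounding white space and double quotes, which are often pasted along with a path.
   answer = answer.replace(/^[\s"]+|[\s"]+$/g, "");
   if (!answer.length) return null;
   // A path copied from Windows Explorer usually has no trailing backslash; append one if missing.
   if (answer.charAt(answer.length - 1) !== "\\") answer += "\\";
   return answer;
}
```

The returned string could then be assigned to UltraEdit.frInFiles.directoryStart. A script should stop with a message when null is returned rather than search an unintended directory.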

It would also be possible for the script to use the path of the list file as the directory path. But then the list file with the lines to search for should either not be in the same directory as the *.txt files or have a different file extension. Otherwise all lines of a list file with file extension TXT are certainly found in a *.txt file in the directory of the list file, namely in the list file itself, which makes the results less useful.
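
A script could guard against searching the list file itself with a check like the following sketch. It uses a simplified wildcard model (* for any characters, ? for one character, case-insensitive) which is only an assumption and may differ from how UltraEdit evaluates the file type specification. The function names are hypothetical.

```javascript
// Convert a simple wildcard pattern with * and ? into a regular expression.
function wildcardToRegExp(pattern) {
   var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
   escaped = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
   return new RegExp("^" + escaped + "$", "i");
}

// True if the list file is located in the searched directory and its name
// matches the file type specification, so its own lines would be found.
function listFileWouldBeSearched(listFilePath, directory, fileType) {
   var cut = listFilePath.lastIndexOf("\\") + 1;
   var listDir = listFilePath.substring(0, cut);
   var listName = listFilePath.substring(cut);
   return listDir.toLowerCase() === directory.toLowerCase() &&
          wildcardToRegExp(fileType).test(listName);
}
```

With C:\Temp\ and *.txt, a list file C:\Temp\Test.lst passes this check while C:\Temp\Test.txt does not. Under the same simplified model, a specification of .txt without the asterisk matches only a file literally named .txt.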

Post by zhelev81 » Tue Mar 13, 2012 9:14 am

Hello again, it works now. But does it also check the lines against each other in the same file?

Example:

1234567890
hop6709984
1234567890


Will it detect that line 1 is the same as line 3?

Thank you again for this great plugin.

Post by Mofi » Tue Mar 13, 2012 11:24 am

The script as is does not eliminate duplicate entries in the source file before searching for the lines in all files of a directory. It also does not remove lines found several times in one of the files.

If you want to remove duplicate lines in the source file before searching for the lines in the files, it is best to run a sort of entire lines with removal of duplicates on the source file from within the script. I don't know how the output currently looks and how it should look, and therefore can't suggest a method to remove duplicates in the output file.

There are several macros and also some scripts posted which demonstrate how to remove duplicate lines without sorting. This is slower than simply sorting with removal of duplicates, but sometimes necessary because the order of the lines should be kept.
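
For lines already loaded into an array, as the script above does with asLines, removing duplicates while keeping the original order could be sketched like this. This is a generic plain JavaScript approach and not one of the posted macros or scripts. The function name removeDuplicatesKeepOrder is made up.

```javascript
// Return a new array containing only the first occurrence of every line,
// in the same order as in the input array.
function removeDuplicatesKeepOrder(lines) {
   var seen = {};
   var unique = [];
   for (var i = 0; i < lines.length; i++) {
      // The "k:" prefix avoids clashes with built-in object property names.
      var key = "k:" + lines[i];
      if (seen[key] === true) continue;
      seen[key] = true;
      unique.push(lines[i]);
   }
   return unique;
}
```

Running the Find in Files loop over the returned array instead of the original one would search for every distinct line only once.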


If I should adapt the script to find and ignore duplicate lines in the source file before running the Find in Files, or to report lines found more than once in one of the searched files, I need many more details.

What exactly should the script do? And on which file should the new code work: the source file with the lines to search for, or the generated output file?

Please post a block of lines as it looks before running the new script code and as it should look after running the new script code. Enclose both blocks in Code BBCode tags by selecting each block and clicking on the Code button above the edit area.

Post by zhelev81 » Thu Mar 15, 2012 3:31 am

All the script does now is check all lines in the source file against all files in C:\Temp.

I need the script to check also the lines in the source file for duplicates.

Source file example:
Code:
11111-22222-33333-44444-55555
aaaaa-bbbbb-ccccc-ddddd-eeeee
11111-22222-33333-44444-55555

As you can see, lines 1 and 3 are duplicates. The script should alert me about this, tell me how many duplicates there are, and whether there are other duplicates.

But will it not check if:
Code:
1111122222333334444455555
aaaaabbbbbcccccdddddeeeee
1111122222333334444455555

No dashes?

Or will it not check if it's:
Code:
11112222333344445555
aaaabbbbccccddddeeee
11112222333344445555

Fewer characters per line?

I need it NOT to do that (to skip a line if it does not have the same character count or has no dashes).

You can make report to a different file and the file contains for example:

11112222333344445555 was found in this and this file, or in the source file and this and this file. If it was found nowhere, just report it as found 0 times in 0 files + 0 times in source file.

All these changes should be made to the existing script and not in a new one. I would like to do everything with just one click as it is now.

Thank you again for the good support.

Post by Mofi » Sat Mar 17, 2012 10:49 am

I'm now really confused and no longer understand what you want. Here is an example of what I consider a detailed description.



The file open on script start is C:\Temp\Test.lst, containing the following lines:

Code:
11111-22222-33333-44444-55555
11112222333344445555
aaaaa-bbbbb-ccccc-ddddd-eeeee
11111-22222-33333-44444-55555
1111122222333334444455555
11112222333344445555
11111-22222-33333-44444-55555

The script is executed on C:\Temp\ as entered by me. This directory contains 3 *.txt files.

C:\Temp\Test1.txt contains the lines:

Code:
1111122222333334444455555
aaaaabbbbbcccccdddddeeeee
1111122222333334444455555

C:\Temp\Test2.txt contains the lines:

Code:
11111-22222-33333-44444-55555
aaaaa-bbbbb-ccccc-ddddd-eeeee
1111122222333334444455555

C:\Temp\Test3.txt contains just the line:

Code:
88811-22222-33333-44444-55555


For these files the script currently produces:

Code:
----------------------------------------
Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
Found '11111-22222-33333-44444-55555' 1 time(s).
Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).
Search complete, found '11112222333344445555' 0 time(s). (0 file(s)).
----------------------------------------
Find 'aaaaa-bbbbb-ccccc-ddddd-eeeee' in 'C:\Temp\Test2.txt':
C:\Temp\Test2.txt(2): aaaaa-bbbbb-ccccc-ddddd-eeeee
Found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s).
Search complete, found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s). (1 file(s)).
----------------------------------------
Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
Found '11111-22222-33333-44444-55555' 1 time(s).
Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).
----------------------------------------
Find '1111122222333334444455555' in 'C:\Temp\Test1.txt':
C:\Temp\Test1.txt(1): 1111122222333334444455555
C:\Temp\Test1.txt(3): 1111122222333334444455555
Found '1111122222333334444455555' 2 time(s).
----------------------------------------
Find '1111122222333334444455555' in 'C:\Temp\Test2.txt':
C:\Temp\Test2.txt(3): 1111122222333334444455555
Found '1111122222333334444455555' 1 time(s).
Search complete, found '1111122222333334444455555' 3 time(s). (2 file(s)).
Search complete, found '11112222333344445555' 0 time(s). (0 file(s)).
----------------------------------------
Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
Found '11111-22222-33333-44444-55555' 1 time(s).
Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).

And the output window contains the lines:

Code:
Running script: C:\Program Files\IDM Computer Solutions\UltraEdit\scripts\FindLinesInTextFiles.js
========================================================================================================
Script succeeded.


The script should first detect duplicate lines in C:\Temp\Test.lst. If there are no duplicate lines in the file active on script start, the output window is simply not modified.

But if any duplicate line is found during script execution, the output window should automatically be made visible and list the duplicate lines as follows:

Found 3 duplicate lines in input list file. The duplicate lines are:

C:\Temp\Test.lst(1): 11111-22222-33333-44444-55555
C:\Temp\Test.lst(4): 11111-22222-33333-44444-55555
C:\Temp\Test.lst(7): 11111-22222-33333-44444-55555

C:\Temp\Test.lst(2): 11112222333344445555
C:\Temp\Test.lst(6): 11112222333344445555
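
The duplicate report described above could be produced from an array of lines roughly as follows. This is only a sketch of the logic in plain JavaScript with made-up function names. The number in the headline is assumed to count the additional occurrences (3 for the example, because one line occurs three times and another one twice).

```javascript
// Collect the 1-based line numbers of every non-empty line occurring more than once.
function findDuplicateLines(lines) {
   var firstSeen = {};   // "k:" + line text -> index into groups
   var groups = [];
   for (var i = 0; i < lines.length; i++) {
      if (!lines[i].length) continue;   // Ignore empty lines.
      var key = "k:" + lines[i];
      if (firstSeen[key] === undefined) {
         firstSeen[key] = groups.length;
         groups.push({ text: lines[i], lineNumbers: [i + 1] });
      } else {
         groups[firstSeen[key]].lineNumbers.push(i + 1);
      }
   }
   var duplicates = [];
   var extra = 0;
   for (var g = 0; g < groups.length; g++) {
      if (groups[g].lineNumbers.length > 1) {
         duplicates.push(groups[g]);
         extra += groups[g].lineNumbers.length - 1;
      }
   }
   return { groups: duplicates, duplicateCount: extra };
}

// Build the report lines in the "file(line): text" style; empty array if no duplicates exist.
function formatDuplicateReport(fileName, result) {
   var out = [];
   if (!result.groups.length) return out;
   out.push("Found " + result.duplicateCount +
            " duplicate lines in input list file. The duplicate lines are:");
   for (var g = 0; g < result.groups.length; g++) {
      out.push("");
      for (var n = 0; n < result.groups[g].lineNumbers.length; n++) {
         out.push(fileName + "(" + result.groups[g].lineNumbers[n] + "): " + result.groups[g].text);
      }
   }
   return out;
}
```

In UltraEdit the returned lines would still have to be written to the output window by the script, and only when the array is not empty.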


The script should ignore the duplicate lines in the input file and reformat the list of found lines to finally show the following:

Code:
----------------------------------------

C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555

Found '11111-22222-33333-44444-55555' 1 time(s) in 1 file(s).

----------------------------------------

Found '11112222333344445555' 0 time(s) in 0 file(s).

----------------------------------------

C:\Temp\Test2.txt(2): aaaaa-bbbbb-ccccc-ddddd-eeeee

Found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s) in 1 file(s).

----------------------------------------

C:\Temp\Test1.txt(1): 1111122222333334444455555
C:\Temp\Test1.txt(3): 1111122222333334444455555

C:\Temp\Test2.txt(3): 1111122222333334444455555

Found '1111122222333334444455555' 3 time(s) in 2 file(s).



Something like that would make it clear to a script developer what the script should do and how to test it.
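
As an illustration of the summary lines in the desired output above, the following plain JavaScript sketch builds one such line from a list of hits. The function name formatSummary and the shape of the hit objects are assumptions for this example. How the hits are collected from the Find in Files results is not shown.

```javascript
// hits is an array of objects like { file: "C:\\Temp\\Test1.txt", lineNumber: 1 }
// describing where the searched line was found.
function formatSummary(searchedLine, hits) {
   var files = {};
   var fileCount = 0;
   for (var i = 0; i < hits.length; i++) {
      // File names are compared case-insensitively as usual on Windows.
      var key = "k:" + hits[i].file.toLowerCase();
      if (files[key] !== true) {
         files[key] = true;
         fileCount++;
      }
   }
   return "Found '" + searchedLine + "' " + hits.length +
          " time(s) in " + fileCount + " file(s).";
}
```

For the three hits of 1111122222333334444455555 in Test1.txt and Test2.txt this returns the line shown at the end of the example output.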