Split up large file based on line number count

Post by srhydderch » Tue Mar 27, 2012 5:46 am

Hi there!

I'm trying to split a massive text (xyz) file into usable chunks so I can process them into survey software.

I am trying to copy by row number (from-to) as I know the size of each chunk that I can deal with.

I want to copy rows 0-1048576 to start, then rows 1048576-2097152 as the 2nd set, and so on.

Is this possible in UE?

Thanks, sam

Re: Split up large file based on line number count

Post by Mofi » Tue Mar 27, 2012 9:44 am

There is already a macro solution for this task; see Splitting Big Files. Nowadays the task could be done better with an UltraEdit script, but recoding the macro as a script would only make sense if you need to do this regularly and not just once.

Re: Split up large file based on line number count

Post by srhydderch » Tue Mar 27, 2012 10:04 am

Thanks Mofi - much appreciated. I'm completely new to this.

Do I have to copy the code, save as a .mac file and run it?

Do I need to define how many chunks to split the file into, in other words the number of output files?

Cheers

Re: Split up large file based on line number count

Post by Mofi » Tue Mar 27, 2012 12:40 pm

Okay, the macro solution is not easy to set up for a beginner, so I decided to code a script for that task. At first I wanted to code it for general usage by every UltraEdit / UEStudio user who needs to split up a file based on the number of lines. But I stopped developing the general script a few minutes after starting, because for general usage lots of things must be taken into account: file names with no file extension, splitting up the contents of a new file not yet saved, Unicode and ASCII/ANSI files, DOS/UNIX/MAC terminated lines, the version of UltraEdit, ...

So I developed a quick solution for you that works only for ASCII/ANSI files. The file to split must be the first file opened in UltraEdit, which is the leftmost file on the open file tabs bar.

Copy the following code into a new file and save it, for example with the file name SplitFile.js. Then run the script by clicking the menu item Run Active Script in the Scripting menu. The script copies 1,048,576 lines at a time into a new file and saves each new file in the same directory as the first opened file, with the same file name plus an underscore and an incrementing number before the file extension.

Code:
// First file (leftmost on file tabs bar) must be the file to split.

if (UltraEdit.document.length > 0) {  // Is any file open?

   var nLinesPerFile = 1048576;
   var nNextLineNum = nLinesPerFile + 1;
   var nFileCount = 0;

   // Define the environment for the script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.document[0].hexOff();
   // Move caret to top of the file.
   UltraEdit.document[0].top();

   // Quick and dirty solution to get file name without extension
   // and the file extension. Does not work for all file names.
   var nLastPoint = UltraEdit.document[0].path.lastIndexOf('.');
   if (nLastPoint < 0) nLastPoint = UltraEdit.document[0].path.length;
   var sFileName = UltraEdit.document[0].path.substr(0,nLastPoint) + '_';
   var sFileExt = UltraEdit.document[0].path.substr(nLastPoint);

   while (1) {
      UltraEdit.document[0].gotoLineSelect(nNextLineNum,1);
      if (!UltraEdit.document[0].isSel()) break;
      UltraEdit.newFile();
      UltraEdit.activeDocument.write(UltraEdit.document[0].selection);
      nFileCount++;
      UltraEdit.saveAs(sFileName + nFileCount + sFileExt);
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      nNextLineNum += nLinesPerFile;
      UltraEdit.document[0].cancelSelect();
   }
   UltraEdit.document[0].top();
   UltraEdit.messageBox(nFileCount + " files created.");
}

As written, the script requires UltraEdit v17.20 or UEStudio v11.20 or later because it uses the function cancelSelect(), and it does not work for Unicode files.
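For anyone who wants to check the arithmetic before running the script on a huge file, here is a minimal sketch in plain JavaScript of which lines end up in which output file and how the output names are built. The helper names splitPath, chunkFileName and chunkRanges are hypothetical and not part of the UltraEdit scripting API. The extension handling is a slightly more careful variant of the quick-and-dirty lastIndexOf('.') approach above, assuming Windows or UNIX style path separators; it is only an illustration, not a replacement for the script.

```javascript
// Hypothetical helpers for illustration only; none of these functions
// belong to the UltraEdit scripting API.

// Split a full path into the part before the extension and the extension.
// A dot counts as the start of an extension only if it comes after the last
// path separator and is not the first character of the file name.
function splitPath(sPath) {
   var nLastSep = Math.max(sPath.lastIndexOf('\\'), sPath.lastIndexOf('/'));
   var nLastPoint = sPath.lastIndexOf('.');
   if (nLastPoint <= nLastSep + 1) nLastPoint = sPath.length;  // No extension.
   return {
      base: sPath.substring(0, nLastPoint),
      ext: sPath.substring(nLastPoint)
   };
}

// Build the name of the n-th output file: name + '_' + number + extension.
function chunkFileName(sPath, nFileNumber) {
   var oParts = splitPath(sPath);
   return oParts.base + '_' + nFileNumber + oParts.ext;
}

// Return the 1-based, inclusive line ranges for each output file.
// The last range is shorter if the line count is not an exact multiple.
function chunkRanges(nTotalLines, nLinesPerFile) {
   var aRanges = [];
   for (var nFirst = 1; nFirst <= nTotalLines; nFirst += nLinesPerFile) {
      var nLast = Math.min(nFirst + nLinesPerFile - 1, nTotalLines);
      aRanges.push({ first: nFirst, last: nLast });
   }
   return aRanges;
}
```

With 1,048,576 lines per file, a file of 2,500,000 lines gives three ranges: lines 1 to 1048576, 1048577 to 2097152 and 2097153 to 2500000. The ranges do not overlap, so the second file starts at line 1048577 and not at line 1048576.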

Re: Split up large file based on line number count

Post by srhydderch » Wed Mar 28, 2012 4:24 am

Mofi - I really appreciate this - many many thanks

I'll let you know how I get on.

cheers dude
sam