Splitting Big Files

Postby Craig_UE » Tue Mar 22, 2005 5:17 pm

Is there an easy way to split files in UltraEdit?

I have read the posts below that show how to use macros to select and copy every 7 lines and create new files but I wondered if there was another way?

I have a csv file of 100MB or so with over 1 million lines and I would like to break it up small enough to load it into Excel in chunks.

Any advice gratefully accepted.

thanks,
Craig

Re: Splitting Big Files

Postby Mofi » Wed Mar 23, 2005 5:47 am

First, make a copy of your big CSV file. Open the copy in UltraEdit without using a temp file. Run the following macro as often as needed. Each run saves the first 65535 lines to a new file, and each time you have to enter the file name for the new file.

InsertMode
ColumnModeOff
HexOff
UnixReOff
GotoLine 65536
SelectToTop
Cut
NewFile
Paste
SaveAs ""
CloseFile
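
For comparison, the same chunking idea can be sketched outside UltraEdit. The following is a minimal Python sketch, not part of the original macro: the function name split_by_lines, the output name pattern part_0001.csv and the automatic numbering are assumptions made for illustration (the macro above asks for each file name instead). It reads the file in binary mode so that bytes and line endings are copied unchanged.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, lines_per_file=65535, pattern="part_{:04d}.csv"):
    """Write consecutive blocks of lines_per_file lines to numbered files.

    The numbered files are created next to the source file.
    Returns the list of paths that were written.
    """
    source = Path(source)
    written = []
    with source.open("rb") as src:
        index = 1
        while True:
            # Read at most lines_per_file lines without loading the whole file.
            block = list(islice(src, lines_per_file))
            if not block:
                break  # nothing left to copy
            target = source.with_name(pattern.format(index))
            with target.open("wb") as dst:
                dst.writelines(block)
            written.append(target)
            index += 1
    return written
```

Called on a hypothetical file as split_by_lines("big.csv"), it would write part_0001.csv, part_0002.csv and so on; the last file simply holds whatever lines remain.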

If the byte count of each line is constant, you could also use the split file feature of Total Commander from http://www.ghisler.com/

Re: Splitting Big Files

Postby Craig_UE » Wed Mar 23, 2005 4:19 pm

Mofi,

thanks for your reply. I was thinking it might be a bit inefficient to run a macro to select and paste such a big range but I was surprised how well the macro ran.

Eventually I'll stop being so lazy and write something to handle it for myself in Java, but in the meantime, thank you for taking the time to reply. I appreciate your help.

regards,
Craig

Re: Splitting Big Files

Postby khoelsch » Thu Mar 29, 2007 6:48 pm

OK, Mofi, I have searched the existing threads but still don't see the full answer. I have the split part up to taking the 1st 25k lines, copy to new file and save, but it leaves the "P" mark for the cut 25K lines. I wanted to delete these lines completely so that I could then select the first 25K lines again, copy to a new file, save, and so on. Then, some way to stop the process when there are no more rows with data. My file is currently about 175K lines.

My original post:

I am trying to write a macro that does the following:

1) Selects the 1st 25,000 lines of the file (required by the target system)
2) Opens a new file (must be blank) and pastes the 25000 lines in
3) Save and close the new file
4) Takes the next 25,000 lines in the original file and repeats the process with a second new file name
5) Ends when the bottom of the original file is reached

I managed to figure out how to cut/paste to the new file but am unable to do the following:

a) How do I delete the special end of line character in the original file? The idea being I cut/paste, the next 25K lines move up to the top and then I run some type of loop until reaching a completely blank row.

b) What is the best way to loop this process including new file names?

c) Do I need to clear the newly created files each time or is there a way to replace what is already in the file?

Any suggestions would be greatly appreciated.

Re: Splitting Big Files

Postby Mofi » Fri Mar 30, 2007 7:35 am

I took the second macro from Spliting text file and adapted it, hopefully correctly, to your needs.

To save the new files automatically with an increasing number, you must FIRST create my universal CountUp macro. The source code with a description can be found at counter.

THEN create the following macro. You have to adapt the file name with path, C:\Temp\Test_0000.tmp (highlighted in red in the original post). If you think you will not produce more than 999 files, you can change every 0000 to 000 or fewer zeros (a single 0 should also be enough for you).

Whether the column number 1 at the end of GotoLineSelect 25001 1 (highlighted in blue in the original post) is needed depends on your version of UE. The column number is required since UE v12.20 and must be removed for earlier versions.

Make sure that your source file is the only open file, or that it is the rightmost one in the file tab order, because the setting Move to nearest left tab after current tab is closed could otherwise cause problems.

InsertMode
ColumnModeOff
HexOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
"0000"
SelectToTop
Clipboard 8
Cut
Clipboard 9
Loop
GotoLineSelect 25001 1
IfSel
Cut
EndSelect
NewFile
Paste
Top
"C:\Temp\Test_0000.tmp
"
Key UP ARROW
Find "0000"
Clipboard 8
PlayMacro 1 "CountUp"
Key HOME
StartSelect
Key END
Clipboard 9
Copy
EndSelect
DeleteLine
SaveAs "^c"
CloseFile
Else
ExitLoop
EndIf
EndLoop
CloseFile NoSave
Clipboard 9
ClearClipboard
Clipboard 8
ClearClipboard
Clipboard 0

Re: Splitting Big Files

Postby khoelsch » Mon Apr 02, 2007 4:56 pm

Thanks Mofi! This works great except that it never ends. My file is about 175,000 rows, so it should end after 8 files are created. It keeps going. Should your Ifs already be taking care of it?

Mofi, also, the exact row count for now is 171,019, so I would expect 6 full files (25,000 rows each, which happens) and then one partial file with the last 21,019 rows as the 7th file. Instead, the 7th file and all the files created afterwards, until I cancel the macro, show only 88 rows and then a partial 89th row. Any ideas?

I use version 13. I am on a temp copy for now.

Re: Splitting Big Files

Postby Mofi » Tue Apr 03, 2007 1:49 pm

Well, it worked perfectly on my computer. I tested it with UE v13.00+4 using a small file and a smaller line number, and the last file contained fewer lines. GotoLineSelect with a line number that is too high should select everything to the end of the file, so the last part of the file is cut into the last file. After that the source file is completely empty and GotoLineSelect cannot select anything anymore, which should activate the Else branch of IfSel, which means ExitLoop.

I have now also created a file with 171,019 lines, and the macro really does not work with UE v13.00+4. It works for the first 6 files for lines 1 - 150,000, but the remaining 21,019 lines are not saved correctly into the last file. This is definitely a bug in UE.

It looks like there is a synchronization problem with the last GotoLineSelect 25001 1 when fewer than 25,000 lines remain. In this situation the macro continues before the cursor has been moved in selection mode to the bottom of the file. So only some lines (on my computer a few hundred to a few thousand) are selected before the macro continues, which results in more files than expected (10 files too many on my computer).

I could not find any workaround for this synchronization problem. I will send a bug report email to IDM support with my test file and the test macro.

The only solution I have for you is to run the loop only a specified number of times, 6 for your source file, and then save the remaining part of the file as the last file. I know this is not really good, because the loop number must be edited in the macro to the correct number (line count / 25000, rounded down) before the macro is executed. But currently I have no better idea how to handle this UE bug, and I have tried a lot.
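
The loop number can be worked out from the line count before editing the macro. A minimal Python sketch of that arithmetic follows; the function name loop_plan is hypothetical, and the figures are the ones from this thread.

```python
def loop_plan(total_lines, lines_per_file=25000):
    """Return (number of full files, lines left over for the last file)."""
    return divmod(total_lines, lines_per_file)


full_files, remainder = loop_plan(171019)
print(full_files, remainder)  # prints: 6 21019
```

The first value is the number to put after Loop, and a non-zero second value means one more, shorter file is saved after the loop.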

InsertMode
ColumnModeOff
HexOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
"0000"
SelectToTop
Clipboard 8
Cut
Clipboard 9
Loop 6
GotoLineSelect 25001 1
IfSel
Cut
EndSelect
NewFile
Paste
Top
"C:\Temp\Test_0000.tmp
"
Key UP ARROW
Find "0000"
Clipboard 8
PlayMacro 1 "CountUp"
Key HOME
StartSelect
Key END
Clipboard 9
Copy
EndSelect
DeleteLine
SaveAs "^c"
CloseFile
Else
ExitLoop
EndIf
EndLoop
IfEof
CloseFile NoSave
Else
"C:\Temp\Test_0000.tmp
"
Key UP ARROW
Find "0000"
Clipboard 8
PlayMacro 1 "CountUp"
Key HOME
StartSelect
Key END
Clipboard 9
Copy
EndSelect
DeleteLine
SaveAs "^c"
CloseFile
EndIf

Clipboard 9
ClearClipboard
Clipboard 8
ClearClipboard
Clipboard 0

Re: Splitting Big Files

Postby khoelsch » Tue Apr 03, 2007 2:27 pm

Thanks. Nice to know I am not completely crazy. Now which version were you using before when it worked? The client I am at is using version 12 so it might actually work for them. I will try it. If not, I will use the limited macro you sent. Thanks again for all your help.

Ended up going with the modified Loop6 macro as they are moving to Version 13 too.

Only issue is I saved and emailed the macro file to the person who will maintain it. When we opened the 171K source file on his PC and ran my 1st macro, half the file mods were off by 1 column. When I went to change the column # in the macro, running the macro no longer worked. The first step of that macro replaces the " marks with nothing. It appears not to recognize any of the " marks. If I open the same file in my UltraEdit, the macro is fine. Is there some setting in his UE that is different? I have to hand over this whole process to him to maintain. We are both on Version 13. Thanks for your help.

Re: Splitting Big Files

Postby Mofi » Wed Apr 04, 2007 6:52 am

I used UE v13.00+4 when I wrote the first version, which worked. But it worked only because I used a very small source file and GotoLineSelect 6 1, not a large file of several MB and line number 25001 for this command. I did not expect a difference, but we and IDM now know that there is one because of a synchronization problem. IDM support could reproduce this and forwarded it to the developers.

For the macro loading/editing problem, see Selecting a block (range) to the end of file in macro.

In the meantime I have found a workaround. It is MUCH slower, but it really works independently of the number of lines in the source file. In my first posted version of the macro, replace GotoLineSelect 25001 1 with a call to a submacro, named for example "Down 25k lines", using PlayMacro 1 "Down 25k lines". The submacro must be created before editing the main macro and must contain the following commands:

Loop 25000
Key DOWN ARROW
IfEof
ExitLoop
EndIf
EndLoop
SelectToTop


Key PGDN would be faster, but how many lines a page has depends on the current window size, so it is not really good practice to use it here.

Edit: The problem with macro continuation before cursor reaches the bottom of the file when the line number is much greater than the number of lines was fixed with UE v13.10 and UES v6.30.

Re: Splitting Big Files

Postby debontehond » Thu Apr 15, 2010 9:19 am

Hello,


I have very large (3GB) pipe (|) delimited .CSV files containing millions of records. I would like to split each file into chunks based on the values in the first column. The first line contains the headers, and I would like to keep it at the top of each split file. Each distinct name in the first column should get its own split file, so in the end I would like one split file for each name that occurs in the first column, each starting with the same header as the source file. Each split file should be named after the first-column name used to split the file.
The data looks something like this:

Vendor|Period|Inventory Number|Owner User Name|Owner EMP ID
ATTU1|JAN10|123657898446551|NAME1|NLMMCD1
ATTU1|JAN10|123657898446552|NAME2|NLMMCD2
ATTU1|JAN10|123657898446553|NAME3|NLMMCD3
ATTU2|JAN10|123657898446554|NAME4|NLMMCD4
ATTU2|JAN10|123657898446555|NAME5|NLMMCD5
ATTU3|JAN10|123657898446556|NAME6|NLMMCD6
etc.

I would like to split using the names in the Vendor column and this example should give me three files named ATTU1 to ATTU3 and each file should contain the same header.
Is this possible using UE8.0? Is there a macro available to do this?

Re: Splitting Big Files

Postby Mofi » Thu Apr 29, 2010 12:46 am

You have perhaps done this already manually or with a different tool, but it should be possible. The reason I did not reply earlier with a possible solution is that I no longer have UE v8.00. So the macros below, which worked on your example, were tested with UE v11.20b, and I can only hope that they also work with the extremely old version 8.00.

Both macros must have the property Continue if a Find with Replace not found checked and the property Show Cancel Dialog for this macro unchecked.

The macro you must create first is named FindVendorLines. Don't change the name. The macro with this case-sensitive name is called by the second macro, which must be created next. The code for this macro is:

Loop
Clipboard 8
Find RegExp "%^c|*^p"
IfFound
Clipboard 9
CopyAppend
Else
ExitLoop
EndIf
EndLoop

The second macro can have any name you want. The code for this macro is:

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
SelectLine
Clipboard 7
Copy
EndSelect
Key HOME
Loop
IfEof
ExitLoop
EndIf
SelectWord
Clipboard 8
Copy
SelectLine
Clipboard 9
Copy
PlayMacro 1 "FindVendorLines"
EndSelect
Key HOME
Key DOWN ARROW
NewFile
Clipboard 7
Paste
Clipboard 9
Paste
Top
Clipboard 8
Paste
".csv"
SelectToTop
Cut
SaveAs "^c"
CloseFile
EndLoop
ClearClipboard
Clipboard 9
ClearClipboard
Clipboard 7
ClearClipboard
Clipboard 0

This second macro must be executed on the huge CSV file. It is important that no vendor string contains any UltraEdit regular expression character, see the list of UltraEdit regular expression characters in the help of UltraEdit. Further it is important that no vendor string contains a character not allowed in a file name because the vendor string is used for saving the files.
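
For files of this size, the same grouping can also be sketched outside UltraEdit. The following minimal Python sketch is not part of the macros above: the function name split_by_first_field, the UTF-8 encoding and the output directory argument are assumptions for illustration. It reads the source once, writes the header line to each new vendor file and keeps one output file open per vendor, so it assumes a modest number of distinct vendors. As with the macro, each vendor string must be usable as a file name.

```python
from contextlib import ExitStack
from pathlib import Path


def split_by_first_field(source, out_dir, delimiter="|", suffix=".csv"):
    """Write one file per distinct first-field value, each starting with the header.

    Returns the sorted list of first-field values that were found.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    with ExitStack() as stack, open(source, "r", encoding="utf-8", newline="") as src:
        header = src.readline()  # first line holds the column names
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            key = line.split(delimiter, 1)[0]
            if key not in outputs:
                # First record for this vendor: create its file and write the header.
                handle = stack.enter_context(
                    open(out_dir / (key + suffix), "w", encoding="utf-8", newline="")
                )
                handle.write(header)
                outputs[key] = handle
            outputs[key].write(line)
    return sorted(outputs)
```

Unlike the macro, this sketch does not use a regular expression search, so the vendor string may contain regular expression characters, and the records of one vendor do not have to be adjacent in the source file.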

Re: Splitting Big Files

Postby wieland.korn » Mon Oct 25, 2010 4:36 am

Hi, Mofi.
I have a good idea for your second macro.
The key point is a dynamic name for "Save As...".
When you split a big file into several smaller files, I think you can write a macro that lets the file name change automatically.
I am a beginner and don't know how to implement this functionality, but I think you can try.

Re: Splitting Big Files

Postby Mofi » Tue Oct 26, 2010 10:29 am

I have already supplied this functionality. The explanation for the macro in my second post is mainly explaining how saving the new files with an auto-increasing number in the file name is done and most of the code of the macro in my second post is just for this functionality. However, if you want a stand alone macro for saving a file with an auto-increasing number, here it is.

Macro FileNameNumber

InsertMode
ColumnModeOff
Top
Clipboard 8
Paste
"|"
Find RegExp Up "[0-9]"
EndSelect
Key LEFT ARROW
OverStrikeMode
Loop 0
IfCharIs "0"
"1"
ExitLoop
EndIf
IfCharIs "1"
"2"
ExitLoop
EndIf
IfCharIs "2"
"3"
ExitLoop
EndIf
IfCharIs "3"
"4"
ExitLoop
EndIf
IfCharIs "4"
"5"
ExitLoop
EndIf
IfCharIs "5"
"6"
ExitLoop
EndIf
IfCharIs "6"
"7"
ExitLoop
EndIf
IfCharIs "7"
"8"
ExitLoop
EndIf
IfCharIs "8"
"9"
ExitLoop
EndIf
IfCharIs "9"
"0"
Key LEFT ARROW
IfColNum 1
InsertMode
"1"
ExitLoop
EndIf
Key LEFT ARROW
IfCharIs "0123456789"
Else
Key RIGHT ARROW
InsertMode
"1"
ExitLoop
EndIf
EndIf
EndLoop
InsertMode
Top
StartSelect
Find Select "|"
Key LEFT ARROW
Cut
EndSelect
Delete

This macro consists mainly of the code from macro CountUp. Just the code at top is slightly changed and the code at bottom is simplified for the purpose of this macro.

The final file name is stored in user clipboard 8 after playing this macro with the command

PlayMacro 1 "FileNameNumber"

and therefore just the command SaveAs "^c" must be used to save the current file with the file name with the auto-increasing number inside.

Please note that first the macro FileNameNumber must be created before any other macro stored in the same macro file playing this macro can be created.

Further take into account that clipboard 8 is used as string variable buffer for the file name. So make sure to use another clipboard in the main macro playing macro FileNameNumber to increase the number in the file name. In other words after saving a new file with command SaveAs "^c" the next command before using any clipboard again in code execution sequence should be Clipboard x with x is 0 to 9 except 8.

Finally, the macro can't be used as is without initializing the file name in clipboard 8. The main macro must contain code that copies a valid file name with or without path into user clipboard 8. I suggest always initializing clipboard 8 with a file name with full path, because a new file saved with a file name without path is saved in the current working directory of UE/UES. That could be the program directory of UE/UES, which is often write-protected, so that saving the files fails. Here is a code example for initializing clipboard 8 with a file name.

Top
"C:\Temp\Temp_00.txt"
SelectToTop
Clipboard 8
Cut
Clipboard 0


Of course the file name string could be also manually copied into clipboard 8 before running any macro.

It is important that the file name contains a number. It does not matter whether this number starts with 0 or is, for example, 2395. The number of leading zeros also does not matter. But it is advisable to use the right number of zeros in the initial file name string according to the expected number of files, so that the files are numbered 01, 02, 03, ..., 08, 09, 10, etc. instead of 1, 2, 3, ..., 8, 9, 10, etc.
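
The counting rule can be illustrated outside the macro language. The following minimal Python sketch is an illustration only, not the macro's implementation: the function name next_file_name is hypothetical. Like the macro, it increases the last number found in the file name, keeps the zero padding, and lets the number grow by one digit when all digits are 9.

```python
import re


def next_file_name(name):
    """Increase the last run of digits in name by one, keeping zero padding."""
    matches = list(re.finditer(r"[0-9]+", name))
    if not matches:
        raise ValueError("file name contains no number: %r" % name)
    last = matches[-1]
    digits = last.group()
    # zfill keeps the original width; a carry past the width adds a digit.
    bumped = str(int(digits) + 1).zfill(len(digits))
    return name[:last.start()] + bumped + name[last.end():]


print(next_file_name(r"C:\Temp\Temp_00.txt"))  # C:\Temp\Temp_01.txt
print(next_file_name("Temp_99.txt"))           # Temp_100.txt
```

Because the last number in the string is the one that changes, a digit in the file extension would be counted up instead of the intended number, so the number to increase should be the last one in the name.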

As an example for usage of macro FileNameNumber let us assume that a macro is needed to create 10 files with a file name entered by the user and the content for the 10 files should be the current content of the Windows clipboard.

InsertMode
ColumnModeOff
HexOff
UnixReOff
NewFile
GetString "Please insert the file name with path."
Find Up "."
IfNotFound
"_00.txt"
Else
EndSelect
Key LEFT ARROW
"_00"
Key END
EndIf
SelectToTop
Clipboard 8
Cut
CloseFile NoSave
Loop 10
Clipboard 0
NewFile
Paste
PlayMacro 1 "FileNameNumber"
SaveAs "^c"
CloseFile NoSave
EndLoop
ClearClipboard
Clipboard 0

Re: Splitting Big Files

Postby wieland.korn » Fri Oct 29, 2010 12:48 am

Excellent!
Thank you Mofi!
I think the code segment "Save As '^c'" is a miracle.
The string "^c" may have more meanings.
Actually, I don't know what it means.

By the way, suppose there is a huge file, maybe bigger than 100MB.
There are two types of information in it: one is ERROR, the other is INFOR.
I don't know whether a macro can do this or not:
I hope the macro can pick out the ERROR information into one file (e.g. error.txt) and at the same time pick out the INFOR information into a different file (e.g. infor.txt).
That means the macro should have concurrent processing ability.

Re: Splitting Big Files

Postby Mofi » Fri Oct 29, 2010 5:26 am

The SaveAs command is explained on the UltraEdit help page Edit Macro command, and there you can read what ^s and ^c mean. During execution, ^s is replaced with the currently selected text in the active file and ^c is replaced by the content of the active clipboard.

If you want to copy all lines containing ERROR into a new file and you need to do this only once, it is better not to use a macro; do it manually.

  • Go to top of file with pressing Ctrl+Home.
  • Press Ctrl+F to open the Find dialog and enter ERROR as search string. Uncheck all other standard settings.
  • Press button Advanced if the advanced options are not already visible.
  • Enable the option List Lines Containing String.
  • Execute the find with pressing button Next.
  • A dialog opens showing all lines containing the word ERROR, press button Clipboard and close the dialog.
  • Press Ctrl+N to open a new file and Ctrl+V to paste the copied lines into the new file. That's it.

I have posted a macro which does the same as above, see Search string and copy all found lines to clipboard.

The macro below is adapted to your needs. In the original post, small modifications were highlighted in red and one line was shown in gray because it is not needed when an UltraEdit regular expression is used to find entire, DOS terminated lines containing the word ERROR. These modifications are not strictly necessary, but they make the macro faster.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Clipboard 9
ClearClipboard
Loop
Find MatchCase RegExp "%*ERROR*^p"
IfFound
SelectLine
CopyAppend
Else
ExitLoop
EndIf
EndLoop
NewFile
Paste
ClearClipboard
Clipboard 0
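
The macro above collects only the ERROR lines. For sorting ERROR and INFOR lines into two files in one pass over a huge file, a minimal Python sketch outside UltraEdit could look as follows. It is an illustration under assumptions: the function name split_by_keyword and the UTF-8 encoding are not from this thread, and the search is case-sensitive like Find MatchCase.

```python
from contextlib import ExitStack


def split_by_keyword(source, targets, encoding="utf-8"):
    """Copy each line to the file of the first keyword it contains.

    targets maps a keyword such as "ERROR" to an output path.
    Returns the number of lines written per keyword.
    """
    counts = {keyword: 0 for keyword in targets}
    with ExitStack() as stack:
        src = stack.enter_context(open(source, "r", encoding=encoding, newline=""))
        outputs = {
            keyword: stack.enter_context(open(path, "w", encoding=encoding, newline=""))
            for keyword, path in targets.items()
        }
        for line in src:
            for keyword, out in outputs.items():
                if keyword in line:  # case-sensitive substring test
                    out.write(line)
                    counts[keyword] += 1
                    break  # a line goes to at most one file
    return counts
```

A call such as split_by_keyword("big.log", {"ERROR": "error.txt", "INFOR": "infor.txt"}), with a hypothetical big.log, reads the source line by line, so no concurrent processing is needed and the whole file never has to fit in memory.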