Macro for large file - convert to individual chapters

Postby TheChipstar » Sat Sep 08, 2007 11:54 pm

Hi. I need help with a macro please.

The file looks like this (a scaled-down sample of what I want to process):

Input:
Psalms

1 Happy is the man that has not walked in the counsel of the wicked ones,
And in the way of sinners has not stood,
And in the seat of ridiculers has not sat.

 2 But his delight is in the law of Jehovah,
And in his law he reads in an undertone day and night.

 3 And he will certainly become like a tree planted by streams of water,
That gives its own fruit in its season
And the foliage of which does not wither,
And everything he does will succeed.

 4 The wicked are not like that,
But are like the chaff that the wind drives away.

 5 That is why the wicked ones will not stand up in the judgment,
Nor sinners in the assembly of righteous ones.

 6 For Jehovah is taking knowledge of the way of righteous ones,
But the very way of wicked ones will perish.

2 Why have the nations been in tumult
And the national groups themselves kept muttering an empty thing?

 2 The kings of earth take their stand
And high officials themselves have massed together as one
Against Jehovah and against his anointed one,
 3 [Saying:] “Let us tear their bands apart
And cast their cords away from us!”
TheChipstar
Newbie
 
Posts: 8
Joined: Wed Dec 13, 2006 12:00 am

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Tue Sep 18, 2007 8:55 am

Thanks heaps Mofi for the macro. (And the point of .PNG file format for images.)

Here are the results... both the input html source code and the output, once your macro has been run (attached archive file already deleted). It is almost there... any help to get it to finish off the macro would be much appreciated.
I see that chapter 2 is in the same output file as chapter one as well as others (see examples).

Thanks again Mofi. You're the man!

Re: Macro for large file - convert to individual chapters

Postby Mofi » Tue Sep 18, 2007 1:45 pm

We could have saved a lot of time if you had attached the HTML source file first. I have deleted all posts except the first one and your post with the real source.

I have completely rewritten the macros. There are now 2 macros, which you can combine into 1 if you want. The macro property "Continue if a Find with Replace not found" (or "Continue if search string not found") must be checked for both macros.

The first macro converts the HTML file to a text file. So that the chapters can be found later, the conversion inserts a page break (^b) immediately before every chapter number, which is what you wrote in your first post that you wanted to do manually.

Note: The space character inside the Replace All command below Find MatchCase "&nbsp;" is not a normal space. It is the non-breaking space (decimal 160, hex A0). Before copying the macro code into the Edit Macro dialog, check this with Search - Character Properties with the cursor on the left side of the non-breaking space.

After this conversion the first macro selects the whole file and exits. Why?

Well, the output after deleting all the HTML elements is not very pleasant to read. So it is a good idea to reformat all paragraphs with the command Format - Reformat Paragraph, using appropriate settings which you can specify at Format - Paragraph Formatting - Paragraph Setup/Formatting. Reformatting the selected paragraphs is not possible via macro or script; it must be done manually. The paragraph settings are saved in uedit32.ini, so you only have to specify them once.

If you don't want or need the paragraph reformatting, you can delete the last command of the first macro and the first 4 commands of the second macro and combine the 2 macros into 1.

Macro WordHtml2Text

InsertMode
ColumnModeOff
HexOff
Top
Find "<meta name=Generator content="Microsoft Word"
IfNotFound
ExitMacro
EndIf
Top
UnixReOff
StartSelect
Find RegExp Select "<body*>"
Delete
EndSelect
Find MatchCase "&nbsp;"
Replace All " "
Find MatchCase RegExp "<b><span style='font-size:9.0pt;font-family:Arial'>^([0-9]+^)</span></b>"
Replace All "^b^1"
Find RegExp "<[~>]+>"
Replace All ""
StartSelect
Find RegExp Select "[~ ^t^p]"
Key LEFT ARROW
Delete
EndSelect
Bottom
StartSelect
Find RegExp Up Select "[~ ^t^p]"
Key RIGHT ARROW
Key RIGHT ARROW
Delete
EndSelect
SelectAll
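
For readers who want to follow the conversion outside UltraEdit, the steps of WordHtml2Text can be sketched roughly in Python. This is only an illustration under assumptions, not the macro itself: the function name word_html_to_text is made up here, and it assumes the chapter numbers sit in exactly the bold 9pt Arial span that the macro searches for. The page break ^b corresponds to the form feed character.

```python
import re

FORM_FEED = "\f"  # the page break that the macro writes as ^b


def word_html_to_text(html):
    """Rough Python analogue of the WordHtml2Text steps (hypothetical helper)."""
    # Drop everything up to and including the opening <body ...> tag.
    body = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if body:
        html = html[body.end():]
    # Turn the HTML entity into a real non-breaking space (decimal 160).
    html = html.replace("&nbsp;", "\u00a0")
    # Put a form feed immediately before every chapter number.
    html = re.sub(
        r"<b><span style='font-size:9\.0pt;font-family:Arial'>(\d+)</span></b>",
        FORM_FEED + r"\1",
        html,
    )
    # Remove all remaining HTML tags.
    text = re.sub(r"<[^>]+>", "", html)
    # Trim spaces, tabs and line breaks at both ends of the text.
    return text.strip(" \t\r\n")
```

Unlike the macro, this sketch does not check for the Microsoft Word generator tag first and does not reformat paragraphs.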

The second macro does the job you originally wanted: it splits the now clearly marked chapters into several files, each containing 1 chapter and given an appropriate name, in the same folder as the original HTML file, or in a default folder if the HTML source has never been saved.

Macro Split2Chapters

InsertMode
ColumnModeOff
HexOff
UnixReOff
Top
Clipboard 9
Find RegExp "[a-z]*$"
Copy
EndSelect
Key END
Clipboard 8
CopyFilePath
NewFile
Paste
Find Up "\"
Replace "\"
IfFound
DeleteToEndofLine
Else
"C:\"
EndIf
Clipboard 9
Paste
TrimTrailingSpaces
Bottom
"_.txt"
SelectAll
Copy
CloseFile NoSave
Clipboard 8
Loop
Find "^b"
IfNotFound
ExitLoop
EndIf
Key RIGHT ARROW
Key LEFT ARROW
StartSelect
Find Select "^b"
IfSel
Key LEFT ARROW
Copy
EndSelect
Key RIGHT ARROW
Key LEFT ARROW
Else
EndSelect
Key RIGHT ARROW
Key LEFT ARROW
SelectToBottom
Copy
EndSelect
Key UP ARROW
Bottom
EndIf
NewFile
Paste
" "
StartSelect
Find RegExp Up Select "[~ ^t^p]"
Key RIGHT ARROW
Key RIGHT ARROW
Delete
EndSelect
Top
Find RegExp "[0-9]+"
Cut
"("
Paste
")"
Top
Clipboard 9
Paste
Key LEFT ARROW
Key LEFT ARROW
Key LEFT ARROW
Key LEFT ARROW
Clipboard 8
Paste
Top
Find RegExp "%*.txt"
Cut
SaveAs "^c"
CloseFile
EndLoop
ClearClipboard
Clipboard 9
ClearClipboard
Clipboard 0
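
The splitting step can also be sketched in Python for comparison. Again this is a sketch under assumptions rather than a translation of the macro: split_chapters and its out_dir parameter are hypothetical names, the book title is assumed to be the first line of the text, and every chapter is assumed to start with a form feed followed by its number.

```python
import re
from pathlib import Path


def split_chapters(text, out_dir):
    """Write one file per chapter, named '<title>_<number>.txt' (hypothetical helper)."""
    # The first line of the converted text is taken as the book title.
    title = text.split("\n", 1)[0].strip()
    written = []
    # Everything before the first form feed is the title block, so skip it.
    for chunk in text.split("\f")[1:]:
        match = re.match(r"\d+", chunk)
        if match is None:
            continue  # no chapter number directly after the page break
        number = match.group()
        # Like the macro, put the chapter number in parentheses at the top.
        body = f"({number})" + chunk[match.end():].rstrip() + "\n"
        target = Path(out_dir) / f"{title}_{number}.txt"
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written
```

Calling split_chapters(converted_text, "some/folder") returns the list of files written, in chapter order.
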
Mofi
Grand Master
 
Posts: 4055
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Tue Sep 18, 2007 8:14 pm

You are the MAN Mofi!!!
Thanks so much for that!

I didn't even need to reformat it... it was fine as it is!

Sorry again for mucking you around. I have learnt a lot about how to post my problems, and I know that if there's a next time... I will make it a lot simpler for you.


Thanks again Mofi!
The macro king!

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Thu Oct 04, 2007 5:27 am

Hey Mofi.

Is there any chance of changing the naming part of the macro when it saves it?
From:
"Genesis_1"
To:
"Genesis_001"

So...
"Genesis_2" becomes "Genesis_002"
"Genesis_3" becomes "Genesis_003"
.......................
"Genesis_50" becomes "Genesis_050"
and so on.

Thanks in advance.

Re: Macro for large file - convert to individual chapters

Postby Mofi » Thu Oct 04, 2007 12:10 pm

Yes, this can be done. Normally I would do such a file renaming with Total Commander's Multi-Rename Tool, which is incredibly powerful yet extremely easy to use: it renames thousands of files with a few mouse clicks within 20 seconds.
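
If no rename tool is at hand, renaming files that already exist can be sketched in a few lines of Python. This is merely an illustration and has nothing to do with Total Commander or the macros: the function name pad_chapter_numbers is made up, and it assumes the files are named like Genesis_1.txt, with the chapter number after the last underscore.

```python
import re
from pathlib import Path


def pad_chapter_numbers(folder, digits=3):
    """Rename files like Genesis_1.txt to Genesis_001.txt (hypothetical helper)."""
    renamed = []
    # sorted() collects the file list first, so renaming cannot disturb the loop.
    for path in sorted(Path(folder).glob("*_*.txt")):
        match = re.fullmatch(r"(.+)_(\d+)\.txt", path.name)
        if match is None:
            continue  # not a '<title>_<number>.txt' file
        new_name = f"{match.group(1)}_{match.group(2).zfill(digits)}.txt"
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return renamed
```

Files whose number already has the requested number of digits are left alone.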

However, there is no reason why the macro should not save the files with the preferred naming scheme when that is possible. So here is the solution.

First you have to create a new macro named SaveChapterFile.

Attention: The name is case-sensitive.

This macro must be saved in the same macro file as the other 2 macros or the merged macro. It is important that you create this sub macro first.

An additional macro is required because nesting of loops (a loop inside another loop) is not possible in the macro environment. An inner loop is necessary to insert the correct number of leading zeros into the file name of the current chapter, based on the number of digits of the last (= highest) chapter number.

As its name indicates, the new sub macro creates the file name for the current file and saves it.

Macro SaveChapterFile

Top
Clipboard 9
Paste
Paste
Key UP ARROW
Find RegExp "0+$"
Replace ""
Clipboard 8
Paste
Loop
Key UP ARROW
IfCharIs "0"
Key DOWN ARROW
Find Up "_"
Replace "_0"
Key END
Else
ExitLoop
EndIf
EndLoop
DeleteLine
Key END
".txt"
SelectToTop
Cut
EndSelect
Delete
SaveAs "^c"
CloseFile

Okay, after creating this macro, the code of the existing macro Split2Chapters must be completely replaced with the following code:

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
Find Up "^b"
EndSelect
Key LEFT ARROW
Key RIGHT ARROW
SelectWord
Clipboard 7
Copy
EndSelect
Top
Clipboard 9
Find RegExp "[a-z]*$"
Copy
EndSelect
Key END
Clipboard 8
CopyFilePath
NewFile
Clipboard 7
Paste
SelectToTop
Find RegExp "[0-9]"
Replace All SelectText "0"
Cut
EndSelect
Clipboard 8
Paste
Find Up "\"
Replace "\"
IfFound
DeleteToEndofLine
Else
"C:\"
EndIf
Clipboard 9
Paste
TrimTrailingSpaces
Bottom
"_"
Clipboard 7
Paste
ClearClipboard
Clipboard 9
InsertLine
SelectAll
Copy
CloseFile NoSave
Clipboard 8
Loop
Find "^b"
IfNotFound
ExitLoop
EndIf
Key RIGHT ARROW
Key LEFT ARROW
StartSelect
Find Select "^b"
IfSel
Key LEFT ARROW
Copy
EndSelect
Key RIGHT ARROW
Key LEFT ARROW
Else
EndSelect
Key RIGHT ARROW
Key LEFT ARROW
SelectToBottom
Copy
EndSelect
Key UP ARROW
Bottom
EndIf
NewFile
Paste
" "
StartSelect
Find RegExp Up Select "[~ ^t^p]"
Key RIGHT ARROW
Key RIGHT ARROW
Delete
EndSelect
Top
Find RegExp "[0-9]+"
Cut
"("
Paste
")"
PlayMacro 1 "SaveChapterFile"
EndLoop
ClearClipboard
Clipboard 9
ClearClipboard
Clipboard 0

In the main loop, only the part that creates the file name and saves the file is now replaced by the command PlayMacro, because sub macro SaveChapterFile does this. The main changes are at the top of this macro: the last chapter number is searched for, all its digits are replaced with zeros to get the correct number of digits, and those zeros are appended to the file name after the underscore. The file extension is no longer appended in this macro; that is done later in sub macro SaveChapterFile. And the string in clipboard 9 with "path\book title_000" is now a real line with a line termination instead of only a string.
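
The naming rule described above, where the number of digits of the last chapter number decides how many leading zeros every file name gets, can be written compactly in Python. This is a sketch of the rule only, not of the clipboard technique the macros use, and chapter_file_name is a hypothetical name.

```python
def chapter_file_name(title, chapter, last_chapter):
    """Pad the chapter number to the width of the last (highest) chapter number."""
    # The number of digits of the highest chapter decides the padding width.
    width = len(str(last_chapter))
    return f"{title}_{chapter:0{width}d}.txt"
```

With 10 as the last chapter, chapter 1 becomes Genesis_01.txt; with 176 as the last chapter, it becomes Genesis_001.txt.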

Macro WordHtml2Text is not modified. If you want to merge macro WordHtml2Text with Split2Chapters again, you again have to delete the last command (line) of WordHtml2Text and the first 4 commands (lines) of Split2Chapters.

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Fri Oct 05, 2007 3:54 am

It didn't work Mofi. I get the same output file name "Genesis_1".
Don't worry about it though.

I will take a look at that multi-rename tool too.

Cheers.

Re: Macro for large file - convert to individual chapters

Postby Mofi » Fri Oct 05, 2007 6:26 am

You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt because 10 is the last chapter number in the HTML file.

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Fri Oct 05, 2007 8:00 am

Mofi wrote: You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt...


Fair enough, it has done that on mine too (now). But my post above didn't ask for Genesis_01.txt; it asked for Genesis_001.txt through to Genesis_050.txt (and then on to Genesis_176.txt, which would be the highest chapter number).
With your above macro... it creates Genesis_01.txt, Genesis_02.txt, Genesis_03.txt, Genesis_04.txt...... Genesis_10.txt,
and then after this...
Genesis_11.txt, Genesis_12.txt, Genesis_13.txt, and so on.
It's missing a zero basically.

Not really any need to worry about it if the macro can't do it.
Thanks tonnes anyway.

Re: Macro for large file - convert to individual chapters

Postby Mofi » Fri Oct 05, 2007 8:49 am

I have tested the macros again with your test file, where I have changed the last chapter number from 10 to 176. And the macro creates Genesis_001.txt, ..., Genesis_009.txt, Genesis_176.txt. So it works perfectly.

Hopefully your HTML source contains all 176 chapters and chapter 176 is the last one in the file.
Attachments
chapters.zip
Archive contains Chapters.mac - the macro file with the macros SaveChapterFile, Split2Chapters, WordHtml2Text and CreateChapters (WordHtml2Text and Split2Chapters merged).

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Mon Oct 08, 2007 5:05 am

Mofi wrote: You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt...


Mofi wrote: I have tested the macros again with your test file where I have changed last chapter number 10 to 176. And the macro creates Genesis_001.txt, ..., Genesis_009.txt, Genesis_176.txt. So it works perfectly.


Yes Mofi, we are getting the same results. But these results are slightly wrong.
Let me explain.
At the moment, if I have chapters ranging from Genesis 1 to Genesis 10 (i.e. 1 digit to 2 digits) then my output is like this:
Genesis_01.txt
Genesis_02.txt
Genesis_03.txt
Genesis_04.txt
Genesis_05.txt
Genesis_06.txt
Genesis_07.txt
Genesis_08.txt
Genesis_09.txt
Genesis_10.txt

But if I have chapters ranging from Genesis 1 to Genesis 176 (or just change the number 10 to 176) then the output is like this:
Genesis_001.txt
Genesis_002.txt
Genesis_003.txt
Genesis_004.txt
Genesis_005.txt
Genesis_006.txt
Genesis_007.txt
Genesis_008.txt
Genesis_009.txt
Genesis_176.txt


To me (and correct me if I'm wrong) it seems that the number of digits in the last chapter number determines how many digits the saved txt file name has. For example:
If I have Genesis chapter 1 through to chapter 9 then the output is like this:
Genesis_1.txt
Genesis_2.txt
Genesis_3.txt
Genesis_4.txt
Genesis_5.txt
Genesis_6.txt
Genesis_7.txt
Genesis_8.txt
Genesis_9.txt

If I have Genesis chapter 1 through to chapter 10 then the output is like this:
Genesis_01.txt
Genesis_02.txt
Genesis_03.txt
Genesis_04.txt
Genesis_05.txt
Genesis_06.txt
Genesis_07.txt
Genesis_08.txt
Genesis_09.txt
Genesis_10.txt

And if I have Genesis chapter 1 through to chapter 100 then the output is like this:
Genesis_001.txt
Genesis_002.txt
Genesis_003.txt
Genesis_004.txt
Genesis_005.txt
Genesis_006.txt
Genesis_007.txt
Genesis_008.txt
Genesis_009.txt
..........................
Genesis_100.txt

Is this correct?

But this is not what I want...
As I originally asked (I'm not rubbing it in, just trying to cover my back, because I know I mucked you around earlier in this post) I want three numbers always, no matter if there is only one chapter or 176 chapters.

Basically:
If the chapter range is between 1 and 9 then.....
between 1 and 9 they will have two zeros preceding them. (i.e. 001, 002, 003, ..., 009)
If the chapter range is between 1 and 99 then.....
between 1 and 9 they will have two zeros preceding them THEN between 10 and 99 they will have one zero preceding them. (i.e. 001, 002, 003, ..., 009, 010, 011, 012, ..., 099)
If the chapter range is between 1 and 999 then.....
between 1 and 9 they will have two zeros preceding them THEN between 10 and 99 they will have one zero preceding them THEN between 100 and 999 they will have no zeros preceding them. (i.e. 001, 002, 003, ..., 009, 010, 011, 012, ..., 099, 100, 101, 102, ..., 999)


Sorry if you don't really understand that.
Genesis has 50 chapters total, whereas Psalms has 176; but I want both to be saved with 3 numbers (ie: same file name length).
In English: I want both a minimum and maximum of three digits ALWAYS.

Re: Macro for large file - convert to individual chapters

Postby Mofi » Mon Oct 08, 2007 6:50 am

Yes, the macros are written to dynamically use the number of digits of the highest (last) chapter number, so that all files are stored with the same number of digits. That is what I assumed you wanted and what I wrote in the explanation for the rewritten macro Split2Chapters.

Now you tell me for the first time that you want the chapter number in the file name always with exactly 3 digits.

Okay, no problem. That makes macro Split2Chapters simpler. You should really try to understand the macros so that you can adapt them to your needs yourself when necessary. The macros are not so difficult to understand, as you know the input and the output.

Here is the upper part of the macro Split2Chapters, up to the command Loop, which now prepares the file name with a fixed 3 digits for the chapter number.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Top
Clipboard 9
Find RegExp "[a-z]*$"
Copy
EndSelect
Key END
Clipboard 8
CopyFilePath
NewFile
Paste
Find Up "\"
Replace "\"
IfFound
DeleteToEndofLine
Else
"C:\"
EndIf
Clipboard 9
Paste
TrimTrailingSpaces
Bottom
"_000
"
SelectAll
Copy
CloseFile NoSave
Clipboard 8
Loop

In my previous post I have uploaded an updated ZIP archive containing the macro file ChaptersFixed.mac, in which this modification is done in Split2Chapters and CreateChapters.

Re: Macro for large file - convert to individual chapters

Postby TheChipstar » Mon Oct 08, 2007 7:33 am

TheChipstar wrote: "Genesis_2" becomes "Genesis_002"
"Genesis_3" becomes "Genesis_003"
.......................
"Genesis_50" becomes "Genesis_050"


Yes... this is what I originally asked for, sorry for the confusion.
I just think you like the challenge and so went for the harder option. Haha.

And yes, I am slowly learning the language. I can recognize similar functions to VBA coding, so I'm getting there.

Thanks once again, you didn't even have to do any of this at all! So I appreciate it!
Thanks Mofi.