To compare text files paragraph by paragraph, the line breaks inside the paragraphs must be removed first, so that each paragraph is a paragraph not only visually for a human reader, but also for word processing, text editing and text compare applications. Text files with line breaks inside paragraphs are often produced by copying text from a word processor (MS Word, OpenOffice Writer) or a PDF reader into a text file, or by saving a *.doc, *.pdf, etc. as a text file, so that the paragraphs in the text file look like the soft-wrapped text in the original document.
That can be done quite easily in UltraEdit. Open both files (or a copy of both files) and first execute Format - Trim Trailing Spaces to get rid of whitespace at the end of lines and in visually empty lines. Next run Format - Convert CR/LFs to Wrap. Additionally, you might want to remove multiple spaces/tabs inside the paragraphs, which can be done with a regular expression replace as posted at "Remove double or extra spaces". Finally, the single space left at the end of the file should be replaced by a line terminator so that the last paragraph is terminated as well: press Ctrl+End, Shift+LEFT ARROW and RETURN.
I quickly wrote a UE/UES macro for that job with some additional commands. The macro property "Continue if search string not found" must be checked for this macro, which converts a badly formatted text file with line breaks inside paragraphs into a well-formatted text file.
InsertMode
ColumnModeOff
HexOff
UnixReOff
Top
TrimTrailingSpaces
ReturnToWrap
TabsToSpaces
Find RegExp " +"
Replace All " "
Find RegExp "% +"
Replace All ""
Bottom
IfColNumGt 1
Key LEFT ARROW
IfCharIs 32
Delete
EndIf
InsertLine
EndIf
Top
Of course, this UltraEdit/UEStudio macro is useful only for files containing real text, not for files containing program/script code or lists.
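For anyone who wants to prepare the files outside the editor, the same clean-up can be roughly sketched in Python. This is only an approximation of the macro and not a replacement for it. It assumes that paragraphs are separated by at least one blank line. Unlike the macro, it reduces several consecutive blank lines to a single one. The function name unwrap_paragraphs is made up for this example.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that every paragraph becomes one line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalize line terminators, then trim trailing whitespace so that
    # visually empty lines become really empty lines.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in normalized.split("\n")]

    # Collect the non-empty lines of each paragraph.
    paragraphs = []
    current = []
    for line in lines:
        if line:
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)

    cleaned = []
    for para in paragraphs:
        # Replace each line break inside the paragraph by one space.
        joined = " ".join(para).replace("\t", " ")
        # Collapse runs of spaces and drop leading/trailing spaces.
        joined = re.sub(r" {2,}", " ", joined).strip()
        cleaned.append(joined)

    if not cleaned:
        return ""
    # One blank line between paragraphs and a line terminator after the
    # last paragraph.
    return "\n\n".join(cleaned) + "\n"
```

Run both files through the function and save the results as copies. Then compare the copies paragraph by paragraph as described above.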