To compare text files paragraph by paragraph, the line breaks inside the paragraphs must be removed, so that each paragraph is a paragraph not only visually for humans but also for word processing, text editing and text compare applications. Text files with line breaks inside paragraphs are often produced by copying text from word processing applications (MS Word, OpenOffice Writer) or from PDF readers into a text file, or by saving *.doc, *.pdf, etc. as a text file, so that the paragraphs in the text file look like the soft-wrapped text in the original document.
That can be done in UltraEdit quite easily. Open both files (or a copy of both files) and first execute Format - Trim Trailing Spaces to get rid of whitespace at the end of lines and in visually empty lines. Next run Format - Convert CR/LFs to Wrap. Additionally, you might want to remove multiple spaces/tabs inside the paragraphs, which you can do with a regular expression replace as posted at Remove double or extra spaces. Finally, the single space left at the end of the file should be replaced by a line terminator so that the last paragraph is terminated as well: press Ctrl+End, Shift+LEFT ARROW and RETURN.
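For readers who want to do the same conversion outside UltraEdit, the steps above can be sketched roughly in Python. This is only an illustration, not what UltraEdit does internally: the function name `unwrap_paragraphs` is made up for this example, and it assumes that paragraphs are separated by at least one empty (or whitespace-only) line.

```python
def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalize line endings, then trim trailing spaces/tabs on every line.
    # This also turns visually empty lines into really empty lines.
    lines = [line.rstrip(" \t") for line in text.replace("\r\n", "\n").split("\n")]

    paragraphs = []
    current = []
    for line in lines:
        if line:
            # Still inside a paragraph: collect the line.
            current.append(line)
        elif current:
            # An empty line ends the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Last paragraph without a trailing empty line.
        paragraphs.append(" ".join(current))

    if not paragraphs:
        return ""
    # One empty line between paragraphs; the last paragraph is terminated too.
    return "\n\n".join(paragraphs) + "\n"
```

Running both files through such a function before comparing them should give a text compare tool one line per paragraph to work with.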
I quickly wrote a UE/UES macro for that job with some additional commands. The macro property Continue if search string not found must be checked for this macro, which converts a badly formatted text file with line breaks inside paragraphs into a well-formatted text file.
Code:
Find RegExp " +"
Replace All " "
Find RegExp "% +"
Replace All ""
Key LEFT ARROW
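As a rough guide to what the two regular expression replaces in the macro do, here is a Python equivalent. It assumes the macro runs with the UltraEdit regular expression engine, where % matches the start of a line and + means one or more of the preceding character. With the Unix or Perl engine the second search string would mean something different. The function name `collapse_spaces` is only chosen for this example.

```python
import re


def collapse_spaces(text: str) -> str:
    # Like: Find RegExp " +" / Replace All " "
    # Every run of one or more spaces becomes a single space.
    text = re.sub(r" +", " ", text)
    # Like: Find RegExp "% +" / Replace All ""
    # Spaces at the start of a line are removed (UltraEdit % = line start).
    return re.sub(r"^ +", "", text, flags=re.MULTILINE)
```

For example, `collapse_spaces("  a   b\n   c  d")` returns `"a b\nc d"`.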
Of course, this UltraEdit/UEStudio macro is useful only for files containing real text, not for files containing program/script code or lists.