Is comparing text files with completely ignoring line terminators possible?

Two- and three-way text compare and merge issues.

Postby YSLGuru » Fri Dec 10, 2010 2:02 pm

Why do the File Ignore Options (IGNORE BLANK LINES, IGNORE BLANK SPACE, IGNORE CASE & IGNORE LINE TERMINATORS) not work when comparing 2 files of the same type (text in this case) where the only differences between them are the number of spaces and/or the case and/or the line breaks?

For example if file 1 has this:

Code:
See the brown dog run over the Red Moon

And File 2 has this:

Code:
  SEE   the BROWN DOG
  run
over THE    RED
Moon

If I have all 4 File Ignore Options checked, then to me these 2 files should logically be the same, since the only differences between them are 1 or more of the 4 things that I told it to ignore. And yet UltraCompare does not see these 2 files as identical even when all 4 File Ignore Options are checked.

Does anyone know why?

Thanks
YSLGuru
Basic User
Posts: 17
Joined: Thu Mar 13, 2008 11:35 am
Location: Texas

Re: Is comparing text files with completely ignoring line terminators possible?

Postby Mofi » Sat Dec 11, 2010 5:48 am

That can be answered by reading the first sentence on the help page Ignore Line Terminators (Options menu):

This item may be selected to allow the active compare to ignore line terminator differences (DOS/UNIX/MAC) when comparing files for differences.

A text compare in UltraCompare is always a line by line comparison. It is not possible to compare text with ignoring the line terminators completely and read/compare the entire file as single line text. The option Ignore Line Terminators only makes it possible to compare files with different types of line terminators, for example a DOS text file using carriage return + line-feed as line terminator with a UNIX text file using just line-feed as line terminator.
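
A minimal Python sketch of what such an option amounts to, assuming it simply normalizes DOS/UNIX/MAC terminators before the line by line comparison. This is an illustration only, not UltraCompare's actual implementation, and the function names are hypothetical:

```python
def split_lines(text):
    """Split text into lines, treating CRLF, CR and LF as the same terminator."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")


def same_ignoring_terminator_type(a, b):
    """Line by line comparison that ignores only the *kind* of line terminator."""
    return split_lines(a) == split_lines(b)


dos_text = "See the brown dog\r\nrun over the Red Moon\r\n"
unix_text = "See the brown dog\nrun over the Red Moon\n"
rewrapped = "See the brown\ndog run over the Red Moon\n"

# Same lines, different terminator type: treated as identical.
print(same_ignoring_terminator_type(dos_text, unix_text))
# Same words, but the line break sits elsewhere: still reported as different.
print(same_ignoring_terminator_type(dos_text, rewrapped))
```

The second call shows the limitation discussed in this topic: moving a line break changes the lines themselves, so a line based compare sees a difference.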
Mofi
Grand Master
Posts: 4049
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Is comparing text files with completely ignoring line terminators possible?

Postby YSLGuru » Tue Jan 11, 2011 11:27 am

Mofi,

"It is not possible to compare text with ignoring the line terminators completely and read/compare the entire file as single line text."

UC might not be able to do this, but that is by design and therefore intentional, not because it is something that simply can't be done, period. I'm not disputing the problems with comparing 2 different file types, but when you have 2 files of identical type with simply different text, you should be able to compare them and ignore the line terminators, especially in a program you pay for as opposed to free or shareware.

While asking about this limitation in UC may seem like a dumb inquiry, it is not, and I doubt I am the only person to ever wonder why UC cannot do this. One would think an application for comparison would be able to do this kind of thing.

Thank you for your time
YSLGuru
Basic User
Posts: 17
Joined: Thu Mar 13, 2008 11:35 am
Location: Texas

Re: Is comparing text files with completely ignoring line terminators possible?

Postby bulgrien » Tue Jan 11, 2011 10:31 pm

YSLGuru wrote:Why do the File Ignore Options (IGNORE BLANK LINES IGNORE BLANK SPACE, IGNORE CASE & IGNORE LINE TERMINATORS ) not work...

Actually, I can tell you why. Most compare tools, including UltraCompare, do a line-by-line comparison and will not recognize a single line on one side being spread over multiple lines on the other side. The "IGNORE LINE TERMINATORS" option doesn't actually mean to ignore the line terminators (pretend that they are not there). It means, "Ignore different kinds of line terminators". In other words, if one file has DOS line terminators (CRLF) and another file has Unix line terminators (LF without the CR), then UltraCompare will not consider the files to be different simply because of the different line terminators.
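
For contrast, the kind of comparison the original question expected can be sketched in Python by comparing words instead of lines. This is only a sketch outside of UltraCompare, with a hypothetical helper name, using the two sample files from the first post:

```python
def words(text):
    """Return the words of text, ignoring case, spacing and line breaks."""
    # casefold() removes case differences; split() without arguments splits on
    # any run of whitespace, including spaces, tabs and every line terminator.
    return text.casefold().split()


file1 = "See the brown dog run over the Red Moon\n"
file2 = "  SEE   the BROWN DOG\n  run\nover THE    RED\nMoon\n"

# Both files contain the same sequence of words, so this prints True.
print(words(file1) == words(file2))
```

Such a word based compare can say whether two texts match, but it no longer has line numbers to report differences against, which is one reason a line-by-line tool does not work this way.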
bulgrien
Master
Posts: 92
Joined: Fri Dec 11, 2009 1:02 am
Location: Pennsylvania, USA

Re: Is comparing text files with completely ignoring line terminators possible?

Postby stjudeb » Wed Jan 12, 2011 10:15 am

So what you are saying is that the same document, where word wrapping takes place at different positions in the lines, will always have differences on every line. To get a better compare, each paragraph should be unwrapped. Is that the only solution?
stjudeb
Newbie
 
Posts: 1
Joined: Thu Jan 06, 2011 3:50 pm

Re: Is comparing text files with completely ignoring line terminators possible?

Postby Mofi » Wed Jan 19, 2011 3:23 am

To compare text files paragraph by paragraph, it is necessary to remove the line breaks inside the paragraphs, so that each paragraph is a paragraph not only visually for humans, but also for word processing, text editing and text compare applications. Text files with line breaks inside paragraphs are often produced by copying text from word processing applications (MS Word, OpenOffice Writer) or PDF readers into a text file, or by saving *.doc, *.pdf, etc. as a text file, so that the paragraphs in the text file look like the soft-wrapped text in the *.doc, *.pdf, etc.

That can be done in UltraEdit quite easily. Open both files (or a copy of both files), first execute Format - Trim Trailing Spaces to get rid of whitespace at the end of lines and on visually empty lines, then run Format - Convert CR/LFs to Wrap. Additionally, you might want to remove multiple spaces/tabs in the paragraphs, which you can do with a regular expression replace as posted at Remove double or extra spaces. Finally, the single space at the end of the file should be replaced by a line terminator, so that the last paragraph is terminated too, by pressing Ctrl+End, Shift+LEFT ARROW and RETURN.

I quickly wrote a UE/UES macro for that job with some additional commands. The macro property Continue if search string not found must be checked for this macro to convert a badly formatted text file with line breaks inside paragraphs to a well formatted text file.

Code:
InsertMode
ColumnModeOff
HexOff
UnixReOff
Top
TrimTrailingSpaces
ReturnToWrap
TabsToSpaces
Find RegExp "  +"
Replace All " "
Find RegExp "% +"
Replace All ""
Bottom
IfColNumGt 1
Key LEFT ARROW
IfCharIs 32
Delete
EndIf
InsertLine
EndIf
Top

Of course, that UltraEdit/UEStudio macro is useful only for files containing real text and not for files containing program/script code or lists.
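
For readers without UltraEdit, the same idea can be sketched in Python. This is a rough approximation and not a translation of the macro: it assumes that paragraphs are separated by at least one blank line, writes one paragraph per line, and the function name is hypothetical:

```python
import re


def unwrap_paragraphs(text):
    """Join the lines of each paragraph and collapse runs of spaces/tabs."""
    # Treat DOS (CRLF), MAC (CR) and UNIX (LF) terminators alike.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: a paragraph ends at one or more blank (whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # split() + join() removes the line breaks inside a paragraph and reduces
    # multiple spaces/tabs to a single space.
    unwrapped = [" ".join(p.split()) for p in paragraphs if p.strip()]
    return "".join(line + "\n" for line in unwrapped)


wrapped_a = "See the brown dog\nrun over the\nRed Moon.\n\nSecond   paragraph\nhere.\n"
wrapped_b = "See the brown dog run\nover the Red Moon.\r\n\r\nSecond paragraph here.\r\n"

# After unwrapping, both texts have identical lines (one per paragraph),
# so a line by line compare of the results reports no differences.
print(unwrap_paragraphs(wrapped_a) == unwrap_paragraphs(wrapped_b))
```

The unwrapped results could then be saved and compared in UltraCompare as usual. As with the macro, this only makes sense for real text, not for code or lists.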
Mofi
Grand Master
Posts: 4049
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Is comparing text files with completely ignoring line terminators possible?

Postby YSLGuru » Tue Dec 06, 2011 3:15 pm

Thanks for the suggestion, Mofi. I realize I'm very slow to reply, but I am just getting back to dealing with this quirk in UC.

Thanks again
YSLGuru
Basic User
Posts: 17
Joined: Thu Mar 13, 2008 11:35 am
Location: Texas

