UE symbol explanations for line terminators


Post by slouw » Tue Feb 21, 2012 5:29 pm

Is there a reference anywhere for the symbols used in UE?
I did a couple of searches in the help and here but could not find anything useful.

I have saved a .docx file as text, and when I open the text file I see these symbols for EOL:
Image

When I paste parts of this text into a fresh Word file, Word does not interpret the EOL properly.
I am also aware of the possible conversions shown below.
I tried the "UNIX/MAC to DOS" conversion and it seemed to help.
Are the symbols highlighted in yellow UNIX or MAC end-of-line characters, then?
Image

Re: UE symbol explanations for line terminators

Post by Mofi » Wed Feb 22, 2012 6:52 am

With View - Show Line Endings (UE v13.10 and later) or View - Show Spaces/Tabs (UE before v13.10) enabled, the following symbols are displayed for the various line termination types:

The paragraph sign ¶ (decimal: 182, hex.: 0xB6) is used for DOS line terminations, which are a carriage return plus a linefeed.

The not sign ¬ (decimal: 172, hex.: 0xAC) is used for UNIX line terminations, which are a linefeed only.

The plus-minus sign ± (decimal: 177, hex.: 0xB1) is used for MAC line terminations, which are a carriage return only.

The symbols displayed depend on the font and the selected script (code page). That is why I have also added the decimal and hexadecimal values of the characters: the symbols can be different when using a font other than Courier New or a script other than Western (ANSI 1252, Latin I).
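
To check which of the three terminator types a file really contains, independent of font and code page, the bytes can simply be counted. A minimal Python sketch (this is not an UltraEdit feature; the function name count_line_endings and the sample bytes are made up for illustration):

```python
def count_line_endings(data: bytes) -> dict:
    """Count DOS (CRLF), UNIX (LF only) and MAC (CR only) terminators."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtracting the
    # pairs leaves the linefeeds and carriage returns that stand alone.
    lf_only = data.count(b"\n") - crlf
    cr_only = data.count(b"\r") - crlf
    return {"dos": crlf, "unix": lf_only, "mac": cr_only}


# Hypothetical sample mixing all three termination types.
sample = b"first\r\nsecond\nthird\rfourth\r\n"
print(count_line_endings(sample))  # {'dos': 2, 'unix': 1, 'mac': 1}
```

For a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the line terminators before they are counted.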

Carriage return is often abbreviated as CR. It must be written as \r in Unix/Perl regular expressions and in JavaScript strings, and as ^r in UltraEdit regular expressions and in non-regular-expression finds/replaces.

Linefeed is often abbreviated as LF. It must be written as \n in Unix/Perl regular expressions and in JavaScript strings, and as ^n in UltraEdit regular expressions and in non-regular-expression finds/replaces.

^p can be used for the pair carriage return plus linefeed in UltraEdit regular expressions and in non-regular-expression finds/replaces. In Unix regular expressions \p can be used for this control character pair. In JavaScript and in Perl regular expressions \r\n must be used because there is no separate definition for this character sequence.
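
Outside UltraEdit, the same \r and \n escapes are enough to reproduce a conversion to DOS line terminators. A minimal sketch in Python, whose re module uses the Perl-style escapes and, as far as I know, has no single escape for the CR+LF pair (the function name to_dos is hypothetical, and this is not how UltraEdit implements its own conversion):

```python
import re


def to_dos(text: str) -> str:
    """Convert any mix of CRLF, lone CR and lone LF to CRLF."""
    # \r\n is listed first so that an existing pair is matched as one
    # unit and is not turned into two line terminators.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


mixed = "a\rb\nc\r\nd"
print(repr(to_dos(mixed)))  # 'a\r\nb\r\nc\r\nd'
```

When the text comes from a file, open it with newline="" so that Python passes the terminators through untranslated.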


A carriage return alone is often used by Microsoft Office applications for line breaks within a paragraph or table cell. In word processing applications a line break is not the same as the end of a paragraph. In MS Word a line break can be inserted within a paragraph with Shift+Return.

In MS Excel, a line break inserted within a cell with Alt+Return results, when the table is saved as a CSV file, in a carriage return without a linefeed inside the value. Table data copied from MS Word or MS Excel is always placed in CSV format in the text version of the clipboard.

According to the CSV specification it is no problem at all if a field value contains CR, LF or CRLF as a line break, provided the field value is enclosed in double quotes. That does not make the CSV file invalid. But a great many programmers have implemented CSV reading poorly and interpret every line-terminating character as the end of a data row. Many programmers have also coded CSV export poorly: field values containing line-terminating characters are not enclosed in double quotes (with quotes inside such values escaped by an additional double quote character).

For details on CSV see the Wikipedia article Comma-separated values.
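
The difference between a quote-aware reader and a naive line split can be shown in a few lines. A minimal sketch using Python's standard csv module (the sample data is invented; the second field holds a lone CR and an escaped double quote inside a quoted value):

```python
import csv
import io

# One header row and one data row, both terminated with CRLF.
raw = 'id,comment\r\n1,"line one\rline ""two"""\r\n'

# newline="" keeps the original terminators, so the csv reader sees
# the lone CR inside the quoted field exactly as it was stored.
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)  # [['id', 'comment'], ['1', 'line one\rline "two"']]

# A naive reader that treats every terminator as the end of a data row
# splits the quoted value and finds three "rows" instead of two.
naive_rows = raw.splitlines()
print(len(naive_rows))  # 3

# On export, csv.writer quotes the value and doubles the inner quotes,
# so reading the result back yields the original field again.
out = io.StringIO(newline="")
csv.writer(out).writerow(rows[1])
round_trip = list(csv.reader(io.StringIO(out.getvalue(), newline="")))
```

The reader returns two rows with the line break preserved inside the value, while the naive split breaks the record in the middle of the quoted field.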

Re: UE symbol explanations for line terminators - Thanks Mofi

Post by slouw » Wed Feb 22, 2012 3:09 pm

As ever, thank you Mofi. A great reference, with even more useful detail about Word/Excel. Thank you sir :)

