I see I did not explain my situation properly. The newlines and spaces are displayed as "$" and "_" only in cases where I edit ASCII files with UNIX line ends. A concrete instance of this is the file I attached (conference_trimmed.zip). It contains the first lines of a document-class definition file for LaTeX. This file is supposed to use only ASCII characters (at least in its code lines) and should have UNIX line endings. Whenever I (re-)open this file, UE displays it as shown in the attached unwanted_display.png.
Thank you for mentioning your update in the other post. I remember our discussion there. Good work on finding out that if you add "Force UTF-8=1" to the [Settings] section of uedit32.ini, UE will treat all non-Unicode files as UTF-8. I tried this out and it effectively solved (circumvented) my display problem. When I start UE with that setting and open the file, UE assumes it is "U8-UNIX" rather than "UNIX", and hence I get the spaces and newlines displayed just the way I want (wanted_display.png), as is the case in any other UTF-8 document.
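For anyone who wants to try the same workaround, this is roughly what the entry looks like. The section and key names are exactly the ones from the other post. I am only assuming that the line is added to the already existing [Settings] section and that UE is restarted afterwards, as I did.

```ini
[Settings]
Force UTF-8=1
```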
I don't mind the forced UTF-8. In files where only the first 128 code points (0 to 127) are used, UTF-8 (without BOM) is identical to ASCII. Luckily, I don't need to work with ANSI files using national 8-bit code pages anymore. That said, I never figured out why, without forced UTF-8 in the ini settings, UE cannot display these files properly and why switching to another code page (View/Set Code Page ...) does not have any effect on the display of the opened file.
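To illustrate why forcing UTF-8 is harmless for pure ASCII files, here is a small Python 3 sketch. The sample line is made up for the illustration. It only shows that the bytes are the same in both encodings and says nothing about what UE does internally.

```python
# ASCII-only text yields identical bytes under ASCII and UTF-8 (without BOM).
text = "\\documentclass{article}\n"  # made-up sample line, ASCII only

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)

# The "utf-8-sig" codec prepends the three-byte UTF-8 signature (BOM),
# which is the only difference for such a file.
bom_bytes = text.encode("utf-8-sig")
print(bom_bytes[:3])
```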
To address the other topics of your thorough answer:
… which type of files you are editing using UTF-8 encoding, but does not have a UTF-8 character set declaration at top, or UTF-8 encoded characters in the first 64 KB, nor (usually) a UTF-8 BOM
None. Really, none. Here I think you misunderstood. I wasn't complaining about anything concerning UTF-8. I just mentioned that when I work with UTF-8 files I see spaces like "·" and line ends like "¬" or "¶" (for \n or \r\n respectively), whereas when UE opens an ASCII file with UNIX line ends (i.e. \n) the spaces and line ends are displayed differently (like "_" and "$") and that was what I wanted to change.
Nevertheless, I am more than happy to tell you which files I actually work on, so the UE community sees that the best text editor around is not only used by programmers: I work mostly with linguistic stuff. Lots of files in this field don't have any UTF-8 declaration, but most of them don't mind having a BOM in their first three bytes, at least as long as you work with them in Windows.
More specifically, I write XeTeX source files. Theoretically, you could include an XML-like UTF-8 declaration in a comment somewhere within the first lines there, but I haven't seen anyone doing or recommending that. It is also not necessary, because XeTeX requires the source files to be in some form of Unicode and can handle both BOM and NOBOM UTF-8. (But as soon as you get to the very internals, which are pure (La)TeX, you don't write any special characters into the source files, so these are ASCII.)
Another kind of UTF-8 files are annotated linguistic corpora of various languages. Many of them are written for rather primitive software where there is no place for an encoding declaration in the source files. But again, since this is for Windows, both BOM and NOBOM UTF-8 are accepted and treated properly.
And there are other files containing nothing but text (with special characters) with no place for an encoding declaration. These may be raw parts of a text corpus or just some notes or intermediate files arising in various stages of a linguist's workflow. Here again, the common practice is using a UTF-8 BOM rather than an encoding declaration.
Typically, all these files have some special characters within their first 64 KiB, except maybe for some bits and pieces of a multi-file XeTeX document which could as well be in ASCII and of course those files concerning the very internals of (La)TeX.
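To make the three hints from your quote concrete (BOM, special characters in the first 64 KB, otherwise plain ASCII), here is a rough Python 3 sketch of such a check. The function name guess_encoding and the returned labels are my own invention, and this is certainly not UE's actual detection code. It is only meant to show why a file with no BOM and no special characters near the top cannot be told apart from ASCII.

```python
import codecs

SAMPLE_SIZE = 64 * 1024  # inspect only the first 64 KiB, as in the quote


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: classify a byte string by looking at its head."""
    head = data[:SAMPLE_SIZE]
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if all(byte < 0x80 for byte in head):
        # Nothing in the sample distinguishes ASCII from UTF-8 without BOM.
        return "ascii"
    # If the sample was cut at 64 KiB, a multi-byte sequence may be split
    # at the boundary, so only treat the input as complete when it was not cut.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=len(data) <= SAMPLE_SIZE)
    except UnicodeDecodeError:
        return "8-bit code page (unknown)"
    return "utf-8 without BOM"


print(guess_encoding(b"\\documentclass{article}\n"))
print(guess_encoding("žluťoučký kůň".encode("utf-8")))
print(guess_encoding("žluťoučký kůň".encode("cp1250")))
```

A file whose first special character appears only after the 64 KiB mark would come out as "ascii" here, which is the situation the quote is asking about.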
I also write some scripts in Python 3, but there UE really treats UTF-8 well because there is an explicit and recommended way to declare the encoding at the beginning of the source file. (UE still cannot fold Python's code, but that's worth another topic.)
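The declaration I mean is the "coding" comment in the first or second line of a script. As a small sketch, Python's own standard library function tokenize.detect_encoding can be used to see which encoding the interpreter would assume. The helper name declared_encoding and the three sample byte strings are made up for the illustration.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would assume for the given source bytes."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_declaration = b"# -*- coding: utf-8 -*-\nprint('hello')\n"
with_bom = b"\xef\xbb\xbfprint('hello')\n"
plain = b"print('hello')\n"

for sample in (with_declaration, with_bom, plain):
    print(declared_encoding(sample))
```

Python 3 falls back to UTF-8 when neither a declaration nor a BOM is present, which is the kind of default I would like to see in more applications.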
UTF-8 BOM as strongly recommended by the Unicode working group to help applications reading such files from the beginning correct
I thought we agreed that the UTF-8 BOM is deprecated and not recommended when we discussed it in the other post. Here you have it: I looked it up once again in the drafts of the Unicode 6.0 documents. There the relevant passage can be found in section 2.6 below Table 2-4. Its wording has not changed: "Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."
From my point of view, it is not the UTF-8 BOM which should help the applications, but the applications should help themselves and make Unicode the new default character set and assume that any text document from these days is most likely to be encoded in UTF-8, -16, or -32. Even the most basic things like the source code of programs written in 7-bit ASCII are technically a subset of files written in UTF-8 (without BOM). It is sad that Windows Notepad and many other text editors assume one of the national 8-bit code pages as default. The same can be said about the very internals of operating systems like command line consoles, data storage and file systems. Having grown up on computers with English, Czech and German software, I have been going through almost every kind of character set and character encoding problem there is (since MS-DOS times) and I still don't see the end of it.
I see that Unicode has its own drawbacks and problems which are not trivial, but still, it makes so many things for the majority of languages and their speakers so much easier.
That a DOS text file contains also just \n or just \r is very, very unusual
I agree, but again, I wasn't talking about DOS files with \n's or \r's as line ends. I think we know that there are three line ending standards in the computer world. Unix systems use \n, early Macintosh systems used \r, and Microsoft systems use \r\n. UE can treat all three kinds very well and displays which kind of line ends are being used in the document by saying "UNIX", "MAC", or "DOS" in the status line. The strange thing about it is how the newlines are being displayed.
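Just to pin down what I mean by the three standards, here is a short Python 3 sketch that classifies a file's line ends by counting them. The function name line_ending_style and the extra labels "mixed" and "none" are my own. How UE actually decides what to show in the status line, I don't know.

```python
def line_ending_style(data: bytes) -> str:
    """Hypothetical helper: name the line-end convention used in data."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf   # \n not preceded by \r
    lone_cr = data.count(b"\r") - crlf   # \r not followed by \n
    counts = {"DOS": crlf, "UNIX": lone_lf, "MAC": lone_cr}
    used = [name for name, count in counts.items() if count > 0]
    if not used:
        return "none"
    if len(used) > 1:
        return "mixed"
    return used[0]


print(line_ending_style(b"first\nsecond\n"))
print(line_ending_style(b"first\r\nsecond\r\n"))
print(line_ending_style(b"first\rsecond\r"))
```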
I don't understand UE's system behind the type of newlines and the way they are displayed ("¬", "¶", or "$"). It seems to depend on so many things, including how UE should open the file (Open As, Format). I can only say with certainty that any document which UE thinks is UTF-8 shows "¬" for \n but "¶" for \r\n. A visual distinction of the newline type is a great help since there are cases where it matters. This way the difference is more apparent than just a few letters in the status line. Since I have worked mainly with UTF-8 files, I have got so used to it that I was confused by the way newlines (and spaces) are displayed when actually editing something different.