Well, the charset specification comes from you, and of course you have to make sure that the characters are really encoded according to the charset declaration at the top of the HTML5 file. The status bar at the bottom of the UltraEdit main window shows which encoding UltraEdit currently uses for a file.
UTF-8 (new status bar in UE v19.00) or U8- (basic status bar in UE v19.00 and all previous versions of UE) indicates UTF-8 encoding of the file. If the status bar shows just the line terminator type (DOS, UNIX, MAC) or an ANSI code page (new status bar in UE v19.00), the file is ANSI encoded.
The page Character encodings on the W3C website explains how the character set (encoding) should be declared in an HTML, XHTML and XML file.
UltraEdit detects UTF-8 encoded files by one of the following:
- A UTF-8 BOM at the beginning of the file (not recommended for HTML files).
- One of the following four strings at the top of the file (within the first 1024 bytes): charset=UTF-8, charset=utf-8, encoding="UTF-8, encoding="utf-8
- At least one byte sequence within the first 64 KB that looks like a UTF-8 character encoding sequence.
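The three detection rules can be sketched in Python to make it easier to see which files pass and which do not. This is only an illustration of the rules as described here, not UltraEdit's actual implementation; the function name looks_like_utf8 and the simplified byte pattern (no check for overlong or otherwise invalid sequences) are assumptions.

```python
import codecs
import re

# The four declaration strings listed above (note: no quote after "charset=").
DECLARATIONS = (
    b'charset=UTF-8',
    b'charset=utf-8',
    b'encoding="UTF-8',
    b'encoding="utf-8',
)

# Simplified pattern for one multi-byte UTF-8 sequence: a lead byte
# followed by the matching number of continuation bytes (assumption:
# the real check may be stricter or looser than this).
UTF8_SEQUENCE = re.compile(
    rb'[\xC2-\xDF][\x80-\xBF]'
    rb'|[\xE0-\xEF][\x80-\xBF]{2}'
    rb'|[\xF0-\xF4][\x80-\xBF]{3}'
)


def looks_like_utf8(data: bytes) -> bool:
    """Apply the three rules in order: BOM, declaration string, byte sequence."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    if any(marker in data[:1024] for marker in DECLARATIONS):
        return True
    return UTF8_SEQUENCE.search(data[:64 * 1024]) is not None


# An HTML5 declaration with a quote after "charset=" matches none of the
# four strings, so a pure ASCII file with this line is not detected as UTF-8.
print(looks_like_utf8(b'<meta charset="utf-8">'))
print(looks_like_utf8(b'<meta content="text/html; charset=utf-8">'))
```

The first call prints False and the second prints True, which is the gap described in this post: the quoted HTML5 form is only picked up if the file already contains a multi-byte UTF-8 sequence.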
As can be read at HTML 5.1 Nightly - Specifying the document's character encoding, the short character set declaration you use is also valid for HTML5. But as charset="utf-8 is not yet recognized by UltraEdit, the HTML5 file is opened as an ASCII/ANSI file if there is no UTF-8 byte sequence within the first 64 KB.
Entering a character with a code value greater than 127 now results in a wrong encoding for this character compared with the character set declaration at the top of the HTML5 file.
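To see why this matters, here is a small Python sketch comparing the bytes written for one character above 127 in an ANSI code page and in UTF-8. It assumes Windows-1252 as the ANSI code page; the helper is_valid_utf8 is a hypothetical name used only for this illustration.

```python
text = "\u00e4"  # a-umlaut, code value 228 (greater than 127)

ansi_bytes = text.encode("cp1252")  # assumption: ANSI code page is Windows-1252
utf8_bytes = text.encode("utf-8")

print(ansi_bytes)  # single byte 0xE4
print(utf8_bytes)  # two bytes 0xC3 0xA4


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The ANSI byte is not a valid UTF-8 sequence, so a browser that trusts
# the charset declaration cannot decode the character correctly.
print(is_valid_utf8(ansi_bytes))
print(is_valid_utf8(utf8_bytes))
```

A file saved with the single ANSI byte but declared as UTF-8 therefore contains an invalid sequence at that position.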
Solution:
- Select Create new files as UTF-8 at Advanced - Configuration - Editor - New File Creation.
- Uncheck at Advanced - Configuration - File Handling - Save the options Write UTF-8 BOM header to all UTF-8 files when saved and Write UTF-8 BOM on new files created within this program.
- While UltraEdit is not running, open %appdata%\IDMComp\UltraEdit\uedit32.ini with Notepad, add to group [Settings] a line with Force UTF-8=1 and save the modified INI.
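For illustration, after the INI edit the relevant part of uedit32.ini should look roughly like the fragment below. Only the Force UTF-8=1 line comes from the step above; it is assumed that all other entries already present in group [Settings] are left unchanged.

```ini
[Settings]
Force UTF-8=1
```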
Now new files are encoded in UTF-8 by default, as required for your HTML5 files. And all files not detected as UTF-16 encoded files are now always interpreted as UTF-8 encoded files.
If you need to open an ASCII/ANSI encoded file like an UltraEdit script file, you have to use the Open As option with ASCII selected in the File Open dialog to override the Force UTF-8=1 setting for such files.
I have sent an enhancement request to IDM support by email asking for support of HTML5 character set declarations as well. It would be best if you do the same so that the request count is already 2. The more users request an enhancement, the higher its priority for being implemented.