Today I have had time to look into the UTF-8 problems you described.
First, here is what I think is the reason why you and other users believe UltraEdit has problems handling UTF-8 files without BOM.
A UTF-8 file without BOM is 100% identical to an ASCII file if it does not contain at least one character with a code greater than 0x7F (decimal 127). Such characters, like the German umlauts äöü, must be saved in a UTF-8 file with a multi-byte encoding (2 bytes each for äöü). So if a file without BOM does not contain any multi-byte character, it is interpreted as an ASCII file, and this is 100% correct.
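The rule above can be illustrated with a small Python sketch. `classify_bytes` is a hypothetical helper written only for this illustration, not UltraEdit's detection code: it checks whether a BOM-less buffer contains any byte above 0x7F and, if so, whether those bytes form valid UTF-8.

```python
def classify_bytes(data: bytes) -> str:
    """Classify a BOM-less byte buffer by the rule described above (hypothetical helper)."""
    if all(b < 0x80 for b in data):
        # Only 7-bit bytes (this includes an empty file): plain ASCII,
        # byte-identical to the same text saved as UTF-8 without BOM.
        return "ASCII"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes above 0x7F that are not valid UTF-8 sequences: a single-byte ANSI/OEM file.
        return "ANSI/OEM"
    # At least one valid multi-byte sequence: the content is UTF-8.
    return "UTF-8"


print(classify_bytes(b""))                      # ASCII
print(classify_bytes("Hello".encode("utf-8")))  # ASCII
print(classify_bytes("äöü".encode("utf-8")))    # UTF-8
print(classify_bytes("äöü".encode("cp1252")))   # ANSI/OEM
```

An empty buffer counts as ASCII too, which matches how an empty new file is treated. Note that ANSI text can occasionally happen to form valid UTF-8 byte sequences, so any real detector can only make an educated guess in such cases.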
But UltraEdit also handles character encoding declarations correctly. If the file contains either the string charset=utf-8 (HTML, PHP, ASP, ...) or encoding="utf-8" (XML) at the top of the file, within the first few KB, UE handles the file as a UTF-8 file regardless of whether it contains a UTF-8 multi-byte character. So although an English webpage, for example, may not contain any character encoded in UTF-8 and could therefore also be an ASCII file, UE nevertheless loads and handles it as a UTF-8 file if it contains one of the two encoding specification strings.
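A minimal Python sketch of that check, assuming a 4096-byte scan window and case-insensitive matching. Both are assumptions: the post only says "the first few KB", and UltraEdit's real limits are not documented here. Like UE, the sketch does not care where in the text the string appears, so a declaration inside a PHP comment counts as well.

```python
import re

# Assumption: "the first few KB" is modelled as 4096 bytes.
SCAN_LIMIT = 4096

# The two declaration strings named above; case-insensitive matching is an assumption.
UTF8_DECLARATION = re.compile(rb'charset=utf-8|encoding="utf-8"', re.IGNORECASE)


def declares_utf8(data: bytes, limit: int = SCAN_LIMIT) -> bool:
    """True if a UTF-8 declaration appears near the top of the file, in any context."""
    return UTF8_DECLARATION.search(data[:limit]) is not None


html = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
xml = b'<?xml version="1.0" encoding="utf-8"?>'
php_comment = b"<?php // charset=utf-8 ?>"
print(declares_utf8(html), declares_utf8(xml), declares_utf8(php_comment))  # True True True
print(declares_utf8(b"Hello world"))  # False
```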
I tested saving a new file with Ctrl+S and with Save As with the format UTF-8 - NO BOM. I can't see a difference. I have done the following tests:

1) Open a new file and save it immediately with Ctrl+S with format UTF-8 - NO BOM. As everybody can see in the status bar of UltraEdit, UE still handles the file as a DOS file and not as a UTF-8 file. This is correct, because according to the international encoding standard the 0 byte file is still not a UTF-8 file. You may have a different opinion here because you think, "I have specified it as UTF-8, so UE should handle it as UTF-8 until saved and reopened". But this is not correct according to the international encoding standard, because you have not really specified it as a UTF-8 file.

2) Open a new file and save it immediately with Save As with format UTF-8 - NO BOM. Same result as in 1): the empty file is still a DOS file.

3) Open a new file, enter a few ASCII characters, all with a hexadecimal code lower than 0x80, and save the new file with format UTF-8 - NO BOM. According to the status bar, UE still handles it as a DOS file. According to the international encoding standard this is correct, even if you think it is a bug. It isn't. The cursor position is not changed after this first save. There is no difference between Ctrl+S and Save As, because the first save of a new file always opens the Save As dialog.

4) Open a new file, enter a few ASCII characters and also at least one character with a hexadecimal code higher than 0x7F like Ä, and save the new file with format UTF-8 - NO BOM. According to the status bar, UE now handles it as a UTF-8 file (U8-DOS with my settings).
You can see what really happens in this situation if you temporarily view the file content in hex mode before the save, and temporarily view it again in hex mode after the save.
Attention: Do not save the new file while you are in hex mode. Just enable hex mode temporarily before and after the save.
The file is converted from 1 byte per character before the save to a Unicode UTF-16 LE file with BOM and 2 bytes per character after the save. The cursor position has changed to the top of the file after the first save because of the automatic conversion in the background.

5) Open a new file, enter a few ASCII characters and also the string charset=utf-8 and save the new file with format UTF-8 - NO BOM. According to the status bar, UE now handles it as a UTF-8 file, although the file does not contain any character which is really encoded as a multi-byte character. After the save, the file is also converted and temporarily handled as UTF-16 LE with BOM. The cursor position has changed to the top of the file after the first save because of the automatic conversion in the background.
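What the hex view in test 4 shows can be reproduced with Python's standard codecs. This sketch assumes Windows-1252 as the ANSI code page before the save; the bytes of the UTF-16 LE editing buffer and of the UTF-8 file on disk do not depend on that choice.

```python
import codecs

text = "AÄ"

# Before the first save: 1 byte per character (assumption: Windows-1252 code page).
before = text.encode("cp1252")
# After the first save: UE edits a UTF-16 LE copy with BOM, 2 bytes per character.
in_editor = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# What is actually written to disk with format UTF-8 - NO BOM.
on_disk = text.encode("utf-8")

print(before.hex(" "))     # 41 c4
print(in_editor.hex(" "))  # ff fe 41 00 c4 00
print(on_disk.hex(" "))    # 41 c3 84
```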
Conclusion: UltraEdit handles new files as UTF-8 files 100% correctly according to the international encoding standard.
Johna and you have two problems caused by the "wrong" UTF-8 handling.
It is not possible to open a file that contains neither a correct encoding specification nor at least one multi-byte character and then type or paste characters which must be encoded in UTF-8. If these characters have no ANSI equivalent in the selected code page of the currently used font (a single byte with a hex code lower than 0x100), you will not see them correctly.
The file is loaded as an ASCII file according to the international encoding standard. As long as you do not convert it manually to a UTF-8 file (in reality, temporarily to a Unicode UTF-16 LE file), you cannot insert or paste characters which need 2 bytes.
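Whether a character has such a single-byte ANSI equivalent can be checked with a short Python sketch. `has_ansi_equivalent` is a hypothetical helper; `cp1252` and `cp1250` are Python's codec names for the Western and Central European Windows code pages.

```python
def has_ansi_equivalent(ch: str, codepage: str = "cp1252") -> bool:
    """True if `ch` can be stored as a single byte in the given Windows code page."""
    try:
        return len(ch.encode(codepage)) == 1
    except UnicodeEncodeError:
        # No byte for this character in the code page: it can only be stored as Unicode.
        return False


print(has_ansi_equivalent("ä"))            # True: 0xE4 in Windows-1252
print(has_ansi_equivalent("ł"))            # False: not in Windows-1252
print(has_ansi_equivalent("ł", "cp1250"))  # True: Central European code page
```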
The second, very similar problem is that you also cannot insert multi-byte encoded characters into a new file as long as it is not a real UTF-8 file according to the international encoding standard, which is correctly indicated in the status bar of UltraEdit.
The second problem can be easily avoided. UTF-8 is a byte-optimized encoding of Unicode. So if you want to create new files in UTF-8 format most of the time, enable the option Configuration - Editor - New File Creation - Always create new files as UNICODE. Then a new file is by default a UTF-16 LE file, just as every loaded UTF-8 file is while editing. With the format UTF-8 - NO BOM in the Save As dialog, the new file is then automatically saved as you want. The five tests above were done with this option unchecked, to make it harder for UE than necessary.
It's correct that templates cannot contain characters which must be saved with 2 bytes because they have no single-byte equivalent. The template file of UltraEdit is still a binary file in which only single-byte characters are possible. Changing the format of the template file just to support a few 2-byte characters would be a lot of work. You also have to take into consideration the thousands of existing template files of UltraEdit users who are already satisfied with the current format. And backward compatibility would also be lost. I think you will now understand why IDM will not change the format of the template file just because a few users think they need it.
And you don't really need it. Write your templates for new PHP, CSS, HTML, ... files, but don't forget to add the correct encoding specification to the template. The templates do not need any special character, only the correct encoding specification.
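A sketch of such a template in Python, assuming a plain HTML skeleton (the actual content of your templates is up to you). It shows that the template needs no special character at all: every byte is below 0x80, yet the UTF-8 declaration is present and no BOM is written.

```python
# Hypothetical minimal HTML template: plain ASCII text plus the UTF-8 declaration.
TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<title></title>\n"
    "</head>\n"
    "<body>\n"
    "</body>\n"
    "</html>\n"
)

data = TEMPLATE.encode("utf-8")  # Python's "utf-8" codec writes no BOM

print(all(b < 0x80 for b in data))       # True: no special character needed
print(b"charset=utf-8" in data)          # True: the declaration marks the file as UTF-8
print(data.startswith(b"\xef\xbb\xbf"))  # False: no BOM
```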
Then you can use the templates on new files, and after the first save with the format UTF-8 - NO BOM the file is automatically converted by UltraEdit to UTF-8 (internally UTF-16 LE). But don't forget: first save the new UTF-8 file without BOM but with the encoding specification, before you insert manually or from the clipboard a character which must be encoded with 2 bytes. Best is to use one or more macros for that job. An example:
or without immediately saving the new file:
Template 4
And the Format selected in the Save As dialog is UTF-8 - NO BOM. ASCIItoUnicode is only needed if Always create new files as UNICODE is not checked.
Template 4, for example, contains your standard body for new PHP files with the charset=utf-8 encoding specification string in the HTML header. I should add that UltraEdit does not examine where the string charset=utf-8 is found. If the string is, for example, inside a PHP comment, UltraEdit will also interpret it as a valid encoding specification. I don't know whether PHP interpreters or browsers accept the encoding specification only in the correct context or anywhere in the file like UltraEdit.
To your last question: No, it is not possible to add macros to the menu or a toolbar. But I have never missed it, because there is the macro list view at View - Views/Lists - Macro List. Activated by a click on it in the menu, by a hotkey you have assigned to this command, or by a click on its symbol in the toolbar after you have added this command to the toolbar, it opens the macro list in a docked or undocked window, as you specified it on last usage. Then you will see all macros of the currently loaded macro file, and you can run the macro you need to create a new PHP, CSS or ??? file with a double click, or with the Return/Enter key if a macro in the macro list has the focus.
What I think IDM could do to help webpage writers who use UTF-8:
First, an ASCIItoUTF8 macro command could be very helpful.
Second, a file loading configuration option like "Create and load ASCII files as UTF-8" would be helpful for some users like you.
With such an option checked, a new and also an existing ASCII file is automatically loaded and handled as a UTF-8 file without BOM (internally in UE as UTF-16 LE) and therefore also saved as a UTF-8 file without BOM. A real ASCII file without any character with a code higher than 0x7F will still be an ASCII file and not a UTF-8 file after closing, if it still does not contain the utf-8 encoding specification.
I have never requested the macro command and the configuration option, because I personally don't need them. Especially the config option would never be checked by me, because I rarely edit or create UTF-8 files, but work daily with ASCII files containing characters with a code greater than 0x7F: German characters äüöÄÖÜß in OEM or ANSI code.
So if there are webpage writers who would need these 2 things, they all should write an appropriate feature request email to IDM support.
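Why such an option would misread files with OEM or ANSI umlauts can be sketched in Python: the umlauts are single bytes above 0x7F that are not valid UTF-8. Code page 850 as the OEM code page is an assumption (437 gives the same bytes for these characters), and UltraEdit may display the undecodable bytes differently than Python's U+FFFD replacement.

```python
word = "Grüße"

ansi = word.encode("cp1252")  # Windows ANSI code page 1252
oem = word.encode("cp850")    # DOS OEM code page 850 (an assumption)

print(ansi.hex(" "))  # 47 72 fc df 65
print(oem.hex(" "))   # 47 72 81 e1 65

for raw in (ansi, oem):
    # Neither byte pair forms valid UTF-8, so reading them as UTF-8 fails;
    # with errors="replace" each bad byte becomes U+FFFD in place of "üß".
    print(raw.decode("utf-8", errors="replace"))
```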
My suggestions for the configuration for UTF-8 webpage writers:
First, read the FAQ about UTF-8, UTF-16, UTF-32 & BOM and the Character encodings article to get the basic knowledge you need.
Second, in UltraEdit or UEStudio open Configuration - File Handling and set the following options:

Conversions
Uncheck the two EBCDIC options if you are not editing EBCDIC files, but check the option On Paste convert line ending to destination type (UNIX/MAC/DOS).
Set the Default file type for new files to whatever you prefer. If your host server is a Linux/Unix server, you should use Unix to avoid problems while downloading or uploading via FTP. If your host server is a Windows server, use DOS.
Set the Unix/Mac file detection/conversion to Automatically convert to DOS format to avoid problems with copy and paste with other Windows applications.
Uncheck Only recognize DOS terminated lines (CR/LF) as new lines for editing.
Uncheck Write UTF-8 BOM header to ALL UTF-8 files when saved.
Whether Write UTF-8 BOM on new files created within this program (if above is not set) should be enabled depends on the type of Unicode files you are creating. If, for example, you create only XML and HTML type files (HTML, PHP, ASP, ...) in UTF-8, you should uncheck this option, because then the encoding should be defined inside the file with encoding="utf-8" (XML) or with content="text/html; charset=utf-8" (HTML). See the FAQ above for details about the BOM and when it should be used.
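At the byte level, the only difference between the two choices is the 3-byte UTF-8 BOM (EF BB BF) at the start of the file. A Python sketch using the standard `utf-8` and `utf-8-sig` codecs, with a hypothetical one-line XML document:

```python
import codecs

page = '<?xml version="1.0" encoding="utf-8"?>\n<root/>\n'

without_bom = page.encode("utf-8")   # Python's "utf-8" codec never writes a BOM
with_bom = page.encode("utf-8-sig")  # "utf-8-sig" prepends EF BB BF

print(with_bom[:3] == codecs.BOM_UTF8)          # True
print(with_bom[3:] == without_bom)              # True: only the 3-byte header differs
print(without_bom.startswith(codecs.BOM_UTF8))  # False
```

With the encoding="utf-8" declaration inside the file, the encoding is already defined without those 3 extra bytes.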
Enable Save file as input format (UNIX/MAC/DOS). That's important because we convert every file automatically to DOS for editing, but we want to save it in the original format and not in DOS format. In v12.10 of UltraEdit this option moved from the Save configuration dialog to the DOS/UNIX/MAC Handling configuration dialog!
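Together, Automatically convert to DOS format and Save file as input format amount to a round trip over the line terminators. The following Python sketch models it with hypothetical helpers (`detect_line_ending`, `to_dos`, `save_in_input_format`); it is a simplification and not UltraEdit's implementation, for example it treats any file containing CR LF as a DOS file.

```python
DOS, UNIX, MAC = b"\r\n", b"\n", b"\r"


def detect_line_ending(data: bytes, default: bytes = DOS) -> bytes:
    """Guess the terminator of a loaded file (simplified: any CR LF means DOS)."""
    if DOS in data:
        return DOS
    if UNIX in data:
        return UNIX
    if MAC in data:
        return MAC
    return default  # no line break yet: use the configured default file type


def to_dos(data: bytes) -> bytes:
    """Convert any mix of terminators to CR LF for editing."""
    lf_only = data.replace(DOS, UNIX).replace(MAC, UNIX)
    return lf_only.replace(UNIX, DOS)


def save_in_input_format(edited: bytes, ending: bytes) -> bytes:
    """Write the CR LF used while editing back as the file's original terminator."""
    return edited.replace(DOS, ending)


unix_file = b"line 1\nline 2\n"
ending = detect_line_ending(unix_file)  # b"\n"
buffer = to_dos(unix_file)              # b"line 1\r\nline 2\r\n" while editing
print(save_in_input_format(buffer, ending) == unix_file)  # True
```

Because the saved bytes equal the loaded bytes, a Unix file stays a Unix file, which is also why binary FTP transfers keep local and server copies identical.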
You can set the option Trim trailing spaces on file save to whatever you prefer. Normally it is good to activate it, because it can reduce the file size a little, which is interesting for HTML files.

Temporary Files
Use the second option Open file without temp file but prompt for each file and set the Threshold, for example, to 4096 (4 MB). You can set the threshold to a higher value if your computer has enough performance, your hard disk is fast, and you often edit large files.

Unicode/UTF-8 Detection
Enable Auto detect UTF-8 files, Detect Unicode(UTF-16) files without BOM and Detect ASCII/ANSI files with Escaped Unicode. You can, for example, disable the UTF-16 detection if you are sure that you will never edit a UTF-16 file. Every enabled detection increases the load time of normal ASCII files. But if you don't know what format your files have, it is better to let UE/UES detect it automatically.
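To illustrate why detecting UTF-16 without BOM costs load time and can only be a guess, here is a hypothetical heuristic in Python; UltraEdit's actual algorithm is not described in this post. It scans the high bytes of a UTF-16 LE candidate, which are 0x00 for characters up to U+00FF.

```python
def looks_like_utf16le(data: bytes, min_zero_ratio: float = 0.5) -> bool:
    """Hypothetical heuristic for BOM-less UTF-16 LE text in Latin scripts."""
    if len(data) < 2 or len(data) % 2:
        return False  # UTF-16 needs an even number of bytes
    high_bytes = data[1::2]  # in little endian, every second byte is the high byte
    # Counting the zero high bytes requires a full scan of the buffer,
    # which is the extra work an enabled detection adds to loading a file.
    return high_bytes.count(0) / len(high_bytes) >= min_zero_ratio


print(looks_like_utf16le("Hello world".encode("utf-16-le")))  # True
print(looks_like_utf16le(b"Hello world!"))                    # False
```

For text in scripts whose code points lie above U+00FF, the high bytes are not zero, so a simple check like this one would miss such files; real detectors combine several tests.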
The third option Disable automatic detection of HEX file format on reload is not important for handling Unicode files.
And, as already explained above, also enable the option Always create new files as UNICODE at Editor - New File Creation.
Finally, if you download/upload the files via the FTP client of UE/UES, always use binary transfer mode and not text mode. If your files on your Apache (Unix/Linux) host server are already Unix files, then with the settings above UE/UES converts a file to DOS only temporarily for editing, after loading it via FTP and before opening it in the editor, and converts it back to Unix before saving. So there is no need to convert it while transferring the file content. Local copies are then also Unix files and therefore 100% identical to the files on the server. Binary transfer mode is also faster than text/ASCII mode. Even if you don't use the FTP client of UE/UES but a different FTP tool, you should always create and edit files with Unix line termination, use binary transfer mode, and rely on the automatic conversion to DOS feature of UE/UES, unless your host server is a Windows server.

Added on 2009-11-09:
I have found an undocumented setting in uedit32.exe of v11.10c and later. By manually adding to uedit32.ini
[Settings]
you can force all non-Unicode files (not UTF-16 files) to be read and saved as UTF-8 encoded files. But new files are nevertheless created and saved either as Unicode (UTF-16 LE) or ASCII/ANSI files, so this special setting applies only to files that already have a name. However, creating a new file in ASCII/ANSI, saving it with a name, closing it and reopening it results in a file encoded in UTF-8. Be careful with that setting: even real ANSI files are loaded as UTF-8 encoded files with this setting, causing all ANSI characters to be interpreted wrongly.

Added on 2010-03-28:
With UltraEdit v16.00, instead of Create new files as Unicode there are now the choices

Create new files as ANSI
Create new files as UTF-8
Create new files as UTF-16

at Advanced - Configuration - Editor - New File Creation. Therefore users of UltraEdit 16.00 and later can set the default encoding for new files to UTF-8. With this change, the option Format of the Save As dialog is no longer remembered and preset in UE v16.00 and later. Format of the Save As dialog is now always set to Default when the dialog opens.