hydrant wrote: I don't know how to make macro with conversion to UTF-8...
My first suggestion is to assign a hot key to the command FileConvASCIItoUTF-8(UNICODE) in the key mapping configuration dialog. This would enable you to make the conversion with a single keystroke. You could also add the command to a toolbar, but the command does not have an icon. This is my best advice.
My second suggestion is a macro which you can run on every file load. I had to take the following into consideration while writing the macro.
1) Convert the file only if it is not already a UTF-8 or UTF-16 file. Problem: There is no macro command which I can use to determine the file format.
2) If no UTF-8 multi-byte character is added, the file should still be an ASCII file on next file open. The option "Write UTF-8 BOM header to ALL UTF-8 files when saved" must be unchecked to fulfill this requirement.
3) There is no ASCIItoUTF8 macro command.
The macro handles point 1) by first searching for appropriate charset (HTML files) or encoding (XML files) information. If one of those strings is found, the macro has nothing to do.
If there is no encoding information, the macro switches to hex mode (if not already active) and copies the first 2 bytes in hex mode into a new ASCII file. A space must be typed into the new file before the paste, because otherwise the new file cannot be switched to hex mode. If the UTF-16 LE BOM (FF FE) is found in the new file, which now contains 3 bytes, the original file is a UTF-8 or UTF-16 file without encoding information.
If the BOM is not found, the file is either an ASCII file or a binary file. So the macro next searches for a NUL byte (0x00). If a NUL byte is found, the file is handled as a binary file and the macro exits in hex mode.
As you can see, Unicode (UTF-x) files without BOM or encoding information are really awful for file reading routines. The macro solution is still not perfect because it fails on binary files without a NUL byte, although these are rare, and on binary files which start with FF FE. But this detection routine should be enough for your purpose.
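For readers who want to see the order of the checks outside of UltraEdit, here is a minimal Python sketch of the same heuristic. It is only an illustration, not part of the macro. The function name classify is made up, the case-insensitive search is an assumption, and because the sketch looks at the raw bytes on disk it also tests for the UTF-8 BOM (EF BB BF) and the UTF-16 BE BOM (FE FF), which the macro does not do.

```python
import re

# Declarations the macro searches for: charset=utf-8 / charset=utf-16 (HTML)
# and encoding="utf-8" / encoding="utf-16" (XML).
# Assumption: the search ignores case.
DECLARATION = re.compile(rb'(?:charset=|encoding=")utf-(?:8|16)', re.IGNORECASE)

# Byte order marks as they appear in the raw file on disk.
# Assumption: the macro itself only looks for FF FE in hex mode.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")


def classify(data: bytes) -> str:
    """Return 'unicode', 'binary' or 'ascii', checking in the macro's order."""
    if DECLARATION.search(data):
        return "unicode"  # encoding is declared, nothing to do
    if data.startswith(BOMS):
        return "unicode"  # BOM found
    if b"\x00" in data:
        return "binary"   # NUL byte found, treat as binary
    return "ascii"        # hopefully an ASCII file
```

The sketch shares the weak points described above: a binary file without a NUL byte is reported as 'ascii', and a binary file which starts with FF FE is reported as 'unicode'.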
Points 2) and 3) are handled with an extremely dirty trick. The file is (hopefully) an ASCII file. To force UltraEdit to open it as a UTF-8 file without BOM, the macro inserts the HTML specification "charset=utf-8" at the top of the file, saves the file, closes it and reopens it. Because of that string, UltraEdit now handles the file as a UTF-8 file without BOM. The macro then deletes the string again and saves the file.
This is no real conversion. The macro just lets UltraEdit think it is a UTF-8 file. If your ASCII file contains characters with a hex code greater than 0x7F, those characters will now be handled wrongly and also displayed wrongly. YOU HAVE BEEN WARNED!
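To make the warning concrete, here is a small Python sketch (an illustration only, with made-up function names) which checks whether a block of bytes can simply be relabeled as UTF-8 without a real conversion. Pure ASCII bytes pass, while a single byte such as 0xE4 from a single-byte code page is not valid UTF-8.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the range 0x00 to 0x7F."""
    return all(b <= 0x7F for b in data)


def survives_relabeling(data: bytes) -> bool:
    """True if the bytes can be read as UTF-8 without any conversion."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


plain = b"Hello"         # only bytes below 0x80
accented = b"K\xe4se"    # contains the single byte 0xE4

print(is_pure_ascii(plain), survives_relabeling(plain))
print(is_pure_ascii(accented), survives_relabeling(accented))
```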
Well, you can avoid this character destruction if you run replace all commands which search for a character with a hex code greater than 0x7F and replace it with the appropriate characters of the multi-byte encoding, before saving and closing the ASCII file with the temporary charset=utf-8 string. But the multi-byte codes differ according to the code page currently used, so I can't make suggestions here. And the sequence of the replace all commands is important, so that 1 byte of an already inserted multi-byte code is not converted again.
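The dependency on the code page can be shown with a short Python sketch. This is not something the macro does. The helper name to_utf8 is hypothetical, and cp1252 and cp437 are just two example code pages. The same byte 0xE4 becomes a different UTF-8 sequence depending on which code page the file was written in.

```python
def to_utf8(data: bytes, codepage: str) -> bytes:
    """Decode the bytes with the given code page and re-encode them as UTF-8."""
    return data.decode(codepage).encode("utf-8")


raw = b"\xe4"

# In Windows-1252, 0xE4 is "a with diaeresis" (U+00E4).
print(to_utf8(raw, "cp1252"))  # b'\xc3\xa4'

# In OEM code page 437, 0xE4 is "Greek capital sigma" (U+03A3).
print(to_utf8(raw, "cp437"))   # b'\xce\xa3'
```

Because the whole buffer is decoded in one pass, no byte of an already produced multi-byte sequence can be converted a second time, which is the risk with a chain of replace all commands.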
After you edit and save your file, no BOM is added, and if your file still does not contain a character greater than 0x7F, it really is still an ASCII file.
Top
Find "charset=utf-8"
IfFound
Top
ExitMacro
EndIf
Find "encoding="utf-8""
IfFound
Top
ExitMacro
EndIf
Find "charset=utf-16"
IfFound
Top
ExitMacro
EndIf
Find "encoding="utf-16""
IfFound
Top
ExitMacro
EndIf
HexOn
Clipboard 7
StartSelect
Key RIGHT ARROW
Key RIGHT ARROW
Key RIGHT ARROW
Key RIGHT ARROW
Key RIGHT ARROW
Copy
EndSelect
Key HOME
NewFile
UnicodeToASCII
" "
HexOn
Paste
ClearClipboard
Clipboard 0
Key HOME
Find "FF FE"
IfFound
CloseFile NoSave
HexOff
ExitMacro
EndIf
CloseFile NoSave
Find "00"
IfFound
ExitMacro
EndIf
HexOff
Top
InsertMode
"charset=utf-8"
Clipboard 7
CopyFilePath
CloseFile Save
Open "^c"
Find "charset=utf-8"
Delete
Save
ClearClipboard
Clipboard 0
Again: it is best to convert the file manually to UTF-8 with a hot key when you need it. The macro is not a really good solution.
What type of files do you edit which need UTF-8 encoding, but without BOM and without an appropriate charset or encoding specification in the file header before the first multi-byte character?