Convert all files in a folder from UTF-8 to UTF-16


Post by neill80 » Mon Jun 28, 2010 9:51 am

Hi all,
I'm after some advice from you clever people out there :)

I am looking to convert some (approximately 100) XML documents from UTF-8 to UTF-16 so that they can be loaded by an application.

I have converted one file manually using the steps below, but I'm unsure whether this is correct.

First I changed the encoding declaration in the text of the file as follows:

From
encoding="UTF-8"

to
encoding="UTF-16"

Once this was done, I used the "Save As" function, selected UTF-16 as the file format and saved the document.

Is this all I need to do?
How do I know it is UTF-16 and the conversion is successful?
How do I process this over a folder?

Happy to upgrade the version of UltraEdit if required.

Thank you in advance

Kind Regards

Neil


UltraEdit 12.20b+1

Re: Convert all files in a folder from UTF-8 to UTF-16

Post by Mofi » Mon Jun 28, 2010 10:31 am

neill80 wrote: Is this all I need to do?

Yes, that is one method. The other is to change the encoding declaration in the file, then use File - Conversions - UTF-8 to Unicode and save the file in the default format. But the method you described is faster here.

neill80 wrote: How do I know it is UTF-16 and the conversion is successful?

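The file size roughly doubles, because UTF-16 always uses 2 bytes per character (for characters in the Basic Multilingual Plane), while UTF-8 uses just 1 byte for ASCII characters and 2 or 3 bytes only for non-ASCII characters. You can also look at the file with a hex viewer: a UTF-16 little-endian file with BOM starts with the bytes FF FE, and every ASCII character is followed by a 00 byte. However, you can trust UE that the conversion was done correctly.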
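If you want to double-check the result outside UltraEdit, here is a minimal Python 3 sketch (Python is an assumption, nothing UltraEdit ships with; the helper names are hypothetical). It looks for a byte order mark at the start of a file and computes both sizes for a piece of text, which shows why an ASCII-only XML file grows to about twice its size: 2 bytes per character plus 2 bytes for the BOM.

```python
import codecs
from pathlib import Path
from typing import Optional, Tuple


def guess_encoding_from_bom(data: bytes) -> Optional[str]:
    """Return the encoding indicated by a byte order mark, or None if there is none."""
    # UTF-32 BOMs are not considered here; only UTF-8 and UTF-16 matter for this thread.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None


def guess_file_encoding(path: Path) -> Optional[str]:
    """Read only the first few bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return guess_encoding_from_bom(f.read(3))


def utf8_vs_utf16_size(text: str) -> Tuple[int, int]:
    """Byte size of text saved as UTF-8 without BOM and as UTF-16LE with BOM."""
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(codecs.BOM_UTF16_LE) + len(text.encode("utf-16-le"))
    return utf8_size, utf16_size
```

A UTF-8 XML file without BOM returns None here, so a converted file reporting "UTF-16LE" is a quick sign the Save As worked.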

neill80 wrote: How do I process this over a folder?

That is a problem, because up to UE v17.20 there are no macro commands to convert from ANSI/UTF-16 to UTF-8 and vice versa (the commands ASCIIToUTF8 and UTF8ToASCII were introduced with UE v17.30). But there is a workaround: you can simply copy the entire content of a UTF-8 file into a new Unicode (UTF-16) file and save the new file with the same name as the UTF-8 file, overwriting the UTF-8 file. This works because UltraEdit always loads UTF-8 files with a temporary conversion to UTF-16 for editing.
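Outside the editor, the same workaround boils down to "decode the UTF-8 bytes to text, then write that text out again as UTF-16". The following Python 3 sketch shows only that step for one document held in memory. It assumes UltraEdit's UTF-16 format means little-endian with BOM; the function name is hypothetical, and the encoding declaration is not touched here.

```python
import codecs


def utf8_bytes_to_utf16le(data: bytes, with_bom: bool = True) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 little-endian, optionally with a BOM."""
    # "utf-8-sig" also drops a UTF-8 BOM in case the source file has one.
    text = data.decode("utf-8-sig")
    encoded = text.encode("utf-16-le")
    # Prepend FF FE, as UltraEdit does by default for new UTF-16 files.
    return codecs.BOM_UTF16_LE + encoded if with_bom else encoded
```

Because the text is decoded completely before it is encoded again, no character is lost as long as the source really is valid UTF-8; otherwise `decode` raises `UnicodeDecodeError` instead of silently writing garbage.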

The following macro should work with your version of UltraEdit. You have to adapt the directory "C:\Temp\", perhaps also the file type "*.*" and, if you are not using the English version of UltraEdit, the final results line string "Search complete, found ". The macro property Continue if a Find with Replace not found should be checked for this macro.

Please note: I have tested this macro neither with the currently latest version nor with your version of UltraEdit. So please run it on a copy of the UTF-8 files you want to convert to UTF-16.

FindInFiles "C:\Temp\" "*.*" ""
Loop
Find MatchCase Up "Search complete, found "
IfFound
ExitLoop
Else
NextWindow
EndIf
EndLoop
DeleteLine
SelectToBottom
IfSel
Delete
EndIf
Top
UnicodeToASCII
Loop
IfEof
ExitLoop
EndIf
StartSelect
Key END
Clipboard 8
Copy
EndSelect
Key HOME
Key DOWN ARROW
Open "^c"
Clipboard 9
SelectAll
Copy
CloseFile NoSave
NewFile
ASCIIToUnicode
Paste
Top
UnixReOff
Find MatchCase "encoding="UTF-8""
Replace "encoding="UTF-16""
Clipboard 8
SaveAs "^c"
CloseFile NoSave
EndLoop
CloseFile NoSave
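For readers without UltraEdit, or with more files than the macro handles comfortably, here is a hedged Python 3 sketch of the same batch job. It is not a translation of the macro command by command. It assumes Python 3, UTF-16 little-endian with BOM as the target, and a default pattern of "*.xml" (adapt it like "*.*" in the macro); `convert_folder` is a hypothetical name. In line with the advice to work on a copy, it writes the converted files into a separate output folder instead of overwriting the originals.

```python
import codecs
from pathlib import Path

OLD_DECL = 'encoding="UTF-8"'
NEW_DECL = 'encoding="UTF-16"'


def convert_folder(src_dir: Path, dst_dir: Path, pattern: str = "*.xml") -> list:
    """Write UTF-16LE (with BOM) copies of all UTF-8 files in src_dir matching pattern.

    Originals are left untouched; converted files go to dst_dir.
    Returns the list of written paths.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob(pattern)):  # non-recursive, like the macro
        if not src.is_file():
            continue
        # Work on bytes so that line endings (CR LF) are preserved unchanged.
        text = src.read_bytes().decode("utf-8-sig")
        # Like the macro's Find/Replace: first occurrence only, case-sensitive.
        text = text.replace(OLD_DECL, NEW_DECL, 1)
        dst = dst_dir / src.name
        dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
        written.append(dst)
    return written
```

Usage could look like `convert_folder(Path(r"C:\Temp"), Path(r"C:\Temp\utf16"))`. A file that is not valid UTF-8 stops the run with a `UnicodeDecodeError`, which seems preferable to writing a wrongly converted file.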

You may also want to test first whether your application accepts a UTF-16 BOM at the top of the file. XML files encoded in UTF-8 normally have no UTF-8 BOM (Byte Order Mark, not displayed in the editor) at the top of the file. UTF-16 files normally do have a BOM, and therefore UltraEdit by default always saves new UTF-16 files with a BOM. If your application does not accept the UTF-16 BOM, it can easily be removed with a simple Replace In Files.
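If you prefer to strip the BOM with a script instead of Replace In Files, a minimal Python 3 sketch could look like this. It assumes the files are UTF-16 little-endian, so the BOM is the two bytes FF FE; `strip_utf16le_bom` is a hypothetical helper, and it rewrites the file in place, so again run it on copies.

```python
import codecs
from pathlib import Path


def strip_utf16le_bom(path: Path) -> bool:
    """Remove a leading UTF-16LE BOM (FF FE) from a file; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF16_LE):
        # Keep everything after the 2-byte BOM; the text itself stays UTF-16LE.
        path.write_bytes(data[len(codecs.BOM_UTF16_LE):])
        return True
    return False
```

Without a BOM, the XML parser has to work out the byte order from the first bytes of the XML declaration itself, so it is worth confirming that your application handles that before converting all files.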

Re: Convert all files in a folder from UTF-8 to UTF-16

Post by neill80 » Mon Jun 28, 2010 11:07 am

Hi Mofi,

Thank you very much for your post, it's much appreciated. I will try the macro and let you know.

Kind Regards

Neil

