Format (encoding) not preset correct in Save As dialog (fixed)

Postby FatBear » Wed Aug 03, 2011 12:00 pm

Hi,

I need to know how UES handles byte order marks. For example, if I open a file from another source which does not contain a BOM, will UES add one? Generally speaking, will UES keep BOMs as it found them? Can I add or remove them?

Thanks,

--Brian
FatBear
Newbie
 
Posts: 2
Joined: Sun Jul 03, 2011 12:32 pm

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby Mofi » Wed Aug 03, 2011 12:45 pm

A byte order mark already present in a file on opening is kept on save (with Ctrl+S) by default.

For UTF-8 files without BOM there are 2 configuration settings:

Write UTF-8 BOM header to all UTF-8 files when saved
Write UTF-8 BOM on new files created within this program (if above is not set)


By default neither setting is enabled, so UTF-8 files without BOM are also saved (with Ctrl+S) without BOM.

UTF-16 files without BOM on opening are by default also saved without BOM. (The exception is a sort on the entire file, which adds the BOM in the background. The workaround is to select all with Ctrl+A and then sort, which sorts the entire file without adding the BOM.)

The Save As dialog has the Format option, where you can select the encoding with which a Unicode file should be saved and whether it is saved with or without BOM. So the Save As command can be used to remove or add a BOM.

To remove a BOM from a bunch of files, you can use Replace In Files command, see for example Byte Order Marker (BOM) query?
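
For anyone who wants to check or change a BOM outside of the editor, here is a minimal Python sketch of what "adding" and "removing" a UTF-8 BOM means at the byte level. This is not how UltraEdit does it internally; the function names are made up for illustration, and only the standard `codecs.BOM_UTF8` constant (the bytes EF BB BF) is used.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


with_bom = add_utf8_bom("Grüße".encode("utf-8"))
without_bom = strip_utf8_bom(with_bom)
print(with_bom[:3].hex(" "))       # the three BOM bytes
print(without_bom.decode("utf-8"))
```

To apply this to a real file, read it in binary mode, pass the bytes through one of the two functions and write the result back, preferably to a copy first.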
Mofi
Grand Master
 
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby arofer » Fri Aug 05, 2011 12:40 pm

The "Save As" action always chooses UTF-8 (with BOM), and I have to mouse around and reset it in every case.

Since the UTF-8 with BOM is a brain-dead format, which is also incompatible with many applications, I NEVER want a BOM on UTF-8.
There are no 16-bit entities in the UTF-8 encoding, so it is completely extraneous. It is not "erroneous" only because the UNICODE standards committee made a mistake to allow it.

I want the default to be UTF-8 without BOM.
In fact, I would like to remove the UTF-8 with BOM encoding completely from UltraEdit!

Is it possible to change the default for "Save As"?
arofer
Newbie
 
Posts: 5
Joined: Fri Aug 05, 2011 12:34 pm

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby Mofi » Fri Aug 05, 2011 1:59 pm

I'm currently using the English UltraEdit v17.10.0.1015 on Windows XP SP3, and the Format option in the Save As dialog is always set to Default. That means the file is saved with the encoding indicated in the status bar, and for UTF-8 files the BOM is handled according to the 2 settings for newly created files and for UTF-8 files opened without BOM. As far as I know, UltraEdit does not remember the last used value of the Format option. Whenever I open the Save As dialog, Default is preselected for the Format option.

arofer, you have not posted which version of UltraEdit and which operating system you use. The Save As dialog of UE v17.00 and later is different on Windows 7 and Vista. Perhaps something is different there that results in UTF-8 always being preselected.

Wait a moment, I just noticed something interesting. While the Format option is always set to Default when I open the Save As dialog, I can see that UltraEdit remembers the last used Format option in uedit32.ini with the value File Format= in section [Settings]. I don't know why this is done, because on my installation this value is never applied. I will ask IDM by email about this value in uedit32.ini and will post the reply here.

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby arofer » Mon Aug 08, 2011 11:00 am

I am on version 17.10.0.1015 of UltraEdit on Windows 7 Professional.
The UTF-8 settings are:
New File Creation: UTF-8
File Handling/Save: all options are deselected

I never see "Default" on the save as. I always see "UTF-8", which puts a BOM on the file.

UTF-8 with BOM is incompatible with most programs, since the BOM is essentially useless in this encoding.
Hence text processing programs such as editors, compilers, etc. tend to assume no one with an ounce of sense would use it.
Search the web for "UTF8 BOM" and you can see the numerous problems it causes.
It seems like UltraEdit is one of the very few programs in the world that uses this file format regularly, for no good reason I can discern.
It is unnecessary for detection of utf-8 file encoding, which is always done correctly on files without the BOM.

We really like UltraEdit and therefore we are quite baffled about this apparent lack of good judgment.

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby Mofi » Mon Aug 08, 2011 12:00 pm

I have not yet received a reply from IDM. I can only tell you what I see with UltraEdit v17.10.0.1015 on Windows XP SP3 x86.

  • If I select Create new files as UTF-8 at Configuration - Editor - New File Creation,
  • and Write UTF-8 BOM header to all UTF-8 files when saved
    and Write UTF-8 BOM on new files created within this program
    are both not enabled at Configuration - File Handling - Save,
  • and I press Ctrl+N to open a new file which is encoded in UTF-8 with DOS line terminators as indicated in the status bar at bottom with U8-DOS,
  • and enter some text with characters with decimal value greater than 127,
  • and press key F12 to open Save As dialog for saving the new file,
  • the Format option is set to Default and the file is saved as UTF-8 file without BOM.
I agree that UTF-8 encoded files are usually used without BOM and that a BOM causes problems in many applications that do not support Unicode files. The UTF-8 encoding without BOM is often used because

  • many applications like compilers and interpreters still do not really support Unicode files. UTF-8 encoding makes it possible to use those applications nevertheless, because UTF-8 encoded files can be interpreted even when only single-byte, null-terminated strings can be processed.
  • UTF-8 encoding saves storage space and reduces data transfer volume for many text files in comparison to UTF-16, because most text files contain mainly characters from the ASCII table and just a few other Unicode characters.
Nevertheless, UTF-8 with BOM should be supported by a text editor.
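
Both points in the list above can be checked with a small Python sketch. The sample string is an arbitrary assumption chosen for illustration (mostly ASCII with two non-ASCII characters); the exact savings depend on the text.

```python
# Mostly-ASCII sample text with two non-ASCII characters (hypothetical example).
text = "Grüße aus Wien, config=1"

utf8 = text.encode("utf-8")        # no BOM is written by this codec
utf16 = text.encode("utf-16-le")   # little endian, also without BOM

print(len(text), "characters")
print(len(utf8), "bytes as UTF-8")
print(len(utf16), "bytes as UTF-16")

# UTF-8 never produces a zero byte for non-NUL characters, so code that
# handles null-terminated single-byte strings can still pass the data through.
print("NUL byte in UTF-8: ", b"\x00" in utf8)
print("NUL byte in UTF-16:", b"\x00" in utf16)
```

For this sample the UTF-16 form is nearly twice as large, and every ASCII character in it carries a zero byte, which is what breaks tools that expect null-terminated strings.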

I just tested UltraEdit with the default settings by renaming uedit32.ini (usually located in directory %appdata%\IDMComp\UltraEdit) and starting UltraEdit, which created a completely new INI file. By default the 2 Write UTF-8 BOM ... settings are not checked. The default encoding type for new files is ANSI, so the user must change this setting when new files should be encoded in UTF-8. After making this change, opening a new file, entering some text, opening the Save As dialog, entering a file name and saving the file, the new file was saved on my hard disk as a UTF-8 encoded file with DOS line terminators and without BOM.

So UltraEdit is configured by default for saving UTF-8 files without BOM, at least on Windows 2000 / XP. Just when a UTF-8 file with BOM is opened, modified and saved, it still contains the BOM after save.

Perhaps the Save As dialog on Windows 7 and Vista currently works differently from the Save As dialog on Windows XP and 2000.

Is there any other user using UE v17.10.0.1015 on Windows 7 / Vista who can confirm this behavior?

In the meantime, while waiting for a reply from IDM support, I suggest that you check your uedit32.ini. Open it with Notepad while UltraEdit is not running. Search for File Format= and set the value to 0 (zero). Make sure there is no second File Format= in uedit32.ini. Save uedit32.ini, close Notepad and check whether UltraEdit now always has Default set in the Save As dialog.

If that does not work and UTF-8 is still preselected in the Save As dialog, close UE, rename uedit32.ini (for example to uedit32_bak.ini), start UE, which creates a new uedit32.ini, select encoding type UTF-8 for new files in the configuration, open a new file (the one already open is an ANSI file), enter some text and save it. Is the file now saved as a UTF-8 file without BOM?
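
The check for a second File Format= entry suggested above can also be done with a few lines of Python instead of searching by eye. This is only a sketch: it works on the INI text as a string, the function name is made up, and the sample content (including the [Other] section) is hypothetical and not taken from a real uedit32.ini.

```python
def find_key_lines(ini_text: str, key: str):
    """Return (line number, value) for every 'key=value' line, in any section."""
    hits = []
    for number, line in enumerate(ini_text.splitlines(), start=1):
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            hits.append((number, value.strip()))
    return hits


# Hypothetical INI content with a duplicate entry, for illustration only.
sample = "[Settings]\nDefault UTF8=1\nFile Format=4\n[Other]\nFile Format=2\n"

hits = find_key_lines(sample, "File Format")
for number, value in hits:
    print(f"line {number}: File Format={value}")
if len(hits) > 1:
    print("More than one File Format= entry found.")
```

A plain line scan is used here rather than `configparser`, because the point is to find duplicates and a strict INI parser may reject or merge them.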

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby arofer » Tue Aug 09, 2011 11:44 am

Okay I fixed it.
First, I changed the Uedit32.ini as you recommended. No change in behavior.

Second, I renamed the Uedit32.ini and with save as, I got the Default encoding.
I changed this to "utf8 no bom" and that became the new defaulted "save as" encoding.

I diffed the old Uedit32.ini and the new one and noticed 3 significant changes. The new ones were:

Default File Type=0
Default UTF8=0
File Format=4

By reverting to the old Uedit32.ini and changing these three settings, my default encoding is now (hurrah!) "utf8 no bom".
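
For anyone who would rather script this than edit by hand, here is a rough Python sketch that sets those three values inside the [Settings] section of INI text held in a string. The function name and the "old" values in the sample are assumptions for illustration only; close UltraEdit and keep a backup of Uedit32.ini before trying anything like this on the real file.

```python
def set_in_section(ini_text: str, section: str, updates: dict) -> str:
    """Return ini_text with the given keys replaced inside [section] only."""
    out = []
    current = None
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]          # entering a new section
        elif current == section and "=" in line:
            name = line.split("=", 1)[0].strip()
            if name in updates:
                line = f"{name}={updates[name]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical old content; only the three key names come from the diff above.
old = "[Settings]\nDefault File Type=1\nDefault UTF8=1\nFile Format=2\n"

new = set_in_section(
    old,
    "Settings",
    {"Default File Type": "0", "Default UTF8": "0", "File Format": "4"},
)
print(new)
```

Keys that are not already present in the section are left alone rather than appended, so the result never contains entries the original file did not have.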

So, thanks for the hints.
Apparently, the first time you do "save as" plants in concrete the defaulted encoding, which can then only be fixed with a change to Uedit32.ini outside of UltraEdit.

As far as your other comments go, I would agree that text processors "should" support "utf-8 with BOM", if only because others have made the mistake of creating files in this encoding. My main complaint was that UltraEdit was DEFAULTING to this rotten encoding, causing me many headaches.

It's not only text processors that have "no support for Unicode" that do not support "utf-8 with BOM". It's also text processors with robust Unicode support that fail to support "utf-8 with BOM". Take, for example, the standard Sun java compiler (!!!). The reason for this lack of support is therefore not due to not embracing Unicode, but rather because of a recognition that "utf8-with-BOM" is a backwater, rarely-used, inferior encoding that uses a redundant and useless marker only because of some mistake made by a standards committee.

BTW, did you ever wonder what it might mean if the bytes in the BOM in utf8 were switched? It's like one hand clapping.

May I suggest that the BOM in "utf8-with-BOM" stands for "Backwater Obsolete Mark" rather than "byte order mark", which is a misnomer.
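
To make the byte order point concrete, here is a small Python sketch that encodes the BOM character U+FEFF in three encodings. Only standard Python codecs are used; the helper name is made up.

```python
BOM_CHAR = "\ufeff"  # the code point that serves as the byte order mark


def bom_bytes(encoding: str) -> bytes:
    """Encode U+FEFF with the given codec (none of these codecs add a BOM themselves)."""
    return BOM_CHAR.encode(encoding)


for name in ("utf-16-le", "utf-16-be", "utf-8"):
    print(name, bom_bytes(name).hex(" "))
```

In UTF-16 the two bytes come out in a different order depending on endianness, which is what a reader can use to detect byte order. In UTF-8 the same code point is always the fixed sequence EF BB BF, so there is no byte order to signal.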

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby Mofi » Tue Aug 09, 2011 2:14 pm

The configuration setting Encoding Type of New File Creation is saved in uedit32.ini with 2 settings (2 because of backward compatibility):

For ANSI:
[Settings]
Default UTF8=0
Default Unicode=0


For UTF-8:
[Settings]
Default UTF8=1
Default Unicode=0


For Unicode:
[Settings]
Default UTF8=0
Default Unicode=1



I have already explained that the last used Format is remembered with:

[Settings]
File Format=<index of last Format selection>


The configuration setting Default file type for new files of DOS/Unix/Mac Handling is stored with:

[Settings]
Default File Type=0


Value 0 is for DOS, value 1 for UNIX and value 2 for MAC.

It should not be required to change any of these settings manually and directly in uedit32.ini.
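
The value combinations listed above can be summarized as a small lookup. The following Python sketch only restates those tables; the function name is made up, and the combination Default UTF8=1 with Default Unicode=1 is not described here, so it is reported as unknown rather than guessed.

```python
# (Default UTF8, Default Unicode) -> encoding type for new files
ENCODING_BY_FLAGS = {
    ("0", "0"): "ANSI",
    ("1", "0"): "UTF-8",
    ("0", "1"): "Unicode",
}

# Default File Type -> line terminator type for new files
LINE_ENDING_BY_TYPE = {"0": "DOS", "1": "UNIX", "2": "MAC"}


def describe_new_file_defaults(settings: dict) -> tuple:
    """Translate [Settings] values into (encoding, line ending) names."""
    flags = (settings.get("Default UTF8"), settings.get("Default Unicode"))
    encoding = ENCODING_BY_FLAGS.get(flags, "unknown")
    line_ending = LINE_ENDING_BY_TYPE.get(settings.get("Default File Type"), "unknown")
    return encoding, line_ending


print(describe_new_file_defaults(
    {"Default UTF8": "1", "Default Unicode": "0", "Default File Type": "0"}
))
```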



I have received today the reply from IDM on my email with the questions:

Mofi wrote: What is the purpose of File Format= in uedit32.ini?

Is it possible that "Format" in "Save As" dialog on Windows 7 / Vista is set to this value while this is not done for "Save As" dialog on Windows 2000 / XP?

My email contained not just these 2 questions, but also some additional information and the link to this topic.

IDM wrote: Thank you for your message. Yes, this value is used to track the Encoding value last used in the Save As dialog so that this can be remembered and provided the next time the Save As dialog is invoked.

So we now know that UltraEdit should remember the last used Format selection and should preselect that encoding the next time the Save As dialog is opened. This definitely does not work in UE v17.10.0.1015 on Windows XP / 2000, because "Default" is always set in the Save As dialog although the last used format option is remembered correctly. I tested this with several archived versions of UltraEdit and found that preselecting the last used format option has not worked since UltraEdit v16.00.0.1025. It's really interesting that nobody has reported this until now. It looks like most users don't need a conversion from the default encoding type for new files to a different encoding on first save.

And it looks like the opposite is the case on Windows 7 / Vista. The remembered encoding is preset in the Save As dialog, but the dialog no longer correctly remembers the last used format selection. You should report this by email to IDM support. I don't want to do this myself because, having no computer running Windows 7 or Windows Vista, I can't verify whether this issue is fixed in a future version. Thanks.

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby arofer » Tue Aug 09, 2011 6:04 pm

I believe I have found an (additional) problem.
If "Default UTF8=1", I cannot get the "Save As" default setting to move off of "UTF-8 with BOM".

It looks like this is a bug in UltraEdit, so I am bailing out and going to support with this.

I sent in my settings and example to IDMCOMP support.
They verified that this is indeed a bug and will let me know when it gets fixed.

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby FatBear » Thu Aug 11, 2011 8:08 pm

Thank you Mofi and arofer for getting to the bottom of this!

--Brian

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby Mofi » Fri Sep 02, 2011 12:49 am

The issue on Windows XP / 2000 that Format is always set to Default instead of the last used format (encoding) is fixed in UE v17.20.

The issue with the wrong Format on Windows 7 / Vista should also be fixed in UE v17.20, but I can't verify this because I don't use Windows 7 or Vista.

Re: Format (encoding) not preset correct in Save As dialog (fixed)

Postby arofer » Thu Sep 08, 2011 12:48 pm

Okay. I just downloaded Version 17.20.0.1014 to my Windows 7 Pro, and it seems to have fixed the BOM default problem. The default encoding is now set to UTF8-no BOM, unlike the prior version, which stubbornly stuck at UTF-8 WITH BOM.

It baffles me, however, as to why IDM, which is otherwise so clever in their product development, are so supportive of a brain-dead (utf8 with bom) encoding. It's like they have a blind spot in their vision.

For example, the entry labeled "UTF-8" is actually "UTF-8 WITH (obsolete, unsupported, and unnecessary) BOM".
And, the entry labeled "UTF-8 WITH NO BOM" is actually the standard, supported form of UTF-8 file encoding.
This nomenclature is backwards at best. The oddball is the UTF-8 WITH BOM.

Anyway, thanks to IDM for fixing this problem, which has cost me a great deal of frustration.

