Tidy and typographer's quotes

Post by JonF » Fri Nov 09, 2007 6:44 pm

I really like using the alphabetical references (I don't know the correct terminology) for quotes and the like: &ldquo;, &rdquo;, &hellip;, ...

When I first tried HTML Tidy in UltraEdit, using a sample config file I got from somewhere, they all got translated to numerics: &#8220;, &#8221;, &#8230;, ... I tried all sorts of changes to the config file but failed. Now I can't even get it to do the numerics; it's inserting the characters directly (&ldquo;Xmas&rdquo; turns into “Xmas”, and I can't even get back to the numerics).

My config file right now is:

indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xhtml: yes
input-xml: no
show-warnings: yes
quote-marks: no
quote-nbsp: yes
quote-ampersand: yes
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: utf-8

and my documents start out with:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >

Help?

Re: Tidy and typographer's quotes

Post by Mofi » Sat Nov 10, 2007 6:10 pm

Here is the manual for HTML Tidy. The correct term for &ldquo; and &rdquo; is HTML entities. The keyword "entities" appears several times on the manual page, which is worth reading through once. The numeric values are the Unicode code points of these characters.
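
To make the relationship between the named and the numeric forms concrete, here is a minimal Python sketch. It only uses the entity table from Python's standard library; the helper name numeric_reference is made up for this illustration, and nothing here is part of HTML Tidy or UltraEdit.

```python
from html.entities import name2codepoint

def numeric_reference(name):
    """Return the decimal numeric character reference for a named HTML entity.

    Hypothetical helper for illustration; `name` is given without '&' and ';'.
    """
    return f"&#{name2codepoint[name]};"

# Show the named form, the numeric form and the character itself.
for name in ("ldquo", "rdquo", "hellip"):
    print(f"&{name}; -> {numeric_reference(name)} -> {chr(name2codepoint[name])}")
```

The number in each numeric reference is simply the Unicode code point of the character that the named entity stands for.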

However, add the following line to your configuration file, and the existing well-formed HTML entities should be preserved in the UTF-8 encoded XHTML file (not tested):

preserve-entities: yes

The option char-encoding: utf-8, which is of course correct for a UTF-8 file, is what causes the HTML entities to be converted into UTF-8 characters.
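
Why the output encoding matters can be sketched in a few lines of Python. This is not how Tidy is implemented; it only illustrates, with Python's own codecs, that UTF-8 can store the curly quotes directly while Latin-1 cannot and therefore needs some kind of character reference. The helper name fits is an assumption made for this example.

```python
text = "\u201cXmas\u201d"  # the string “Xmas” with typographer's quotes

def fits(s, encoding):
    """Return True if every character of s can be written in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits(text, "utf-8"))    # UTF-8 covers all of Unicode
print(fits(text, "latin-1"))  # Latin-1 has no curly quotes

# With Latin-1, characters that do not fit must be written as references.
# Python's built-in 'xmlcharrefreplace' handler uses decimal numeric references.
escaped = text.encode("latin-1", errors="xmlcharrefreplace")
print(escaped)
```

With a UTF-8 target there is nothing that forces a reference to be kept, which matches the behaviour described in the first post.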

Re: Tidy and typographer's quotes

Post by JonF » Mon Nov 12, 2007 1:49 pm

Thank you. That did not work, but it led me to the solution, which is to use <?xml version="1.0" encoding="iso-8859-1"?> in the document and

char-encoding: latin1
preserve-entities: yes


in the config, and the result still validates.
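
The effect of this combination, a Latin-1 file in which the typographic characters appear as named entities, can be imitated in Python for checking purposes. The sketch below is an assumption-laden illustration and not Tidy's own logic: the handler name "namedentityreplace" and both function names are invented here, and only the codecs and html.entities modules from the standard library are used.

```python
import codecs
from html.entities import codepoint2name

def named_entity_replace(error):
    """Encoding error handler: use a named entity where one exists,
    otherwise fall back to a decimal numeric reference."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    pieces = []
    for ch in error.object[error.start:error.end]:
        name = codepoint2name.get(ord(ch))
        pieces.append(f"&{name};" if name else f"&#{ord(ch)};")
    # Return the replacement text and the position where encoding resumes.
    return "".join(pieces), error.end

# Register the handler under a made-up name so str.encode() can use it.
codecs.register_error("namedentityreplace", named_entity_replace)

def to_latin1_with_entities(text):
    """Encode text as Latin-1, writing unencodable characters as entities."""
    return text.encode("latin-1", errors="namedentityreplace")

print(to_latin1_with_entities("\u201cXmas\u201d\u2026"))
```

Characters that Latin-1 can represent, such as é, are written as plain bytes; only characters outside Latin-1 become entities.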

Re: Tidy and typographer's quotes

Post by Mofi » Mon Nov 12, 2007 2:39 pm

Ah yes, on the main page of the HTML Tidy project I now read:

11 February 2007

The configuration option preserve-entities has been added.


That is why it did not work: the HTML Tidy shipped with UltraEdit is an older version.