Incorrect XML Indentation

Post by cawoodm » Sat Nov 24, 2012 8:48 am

I work with UES (v12.00) for (X)HTML development and it's important for me to have correct indentation. Mostly this works but there are a few strange cases when it doesn't. For example:
Code:
<div><p><b>Hello</b>World!</p></div>

Gets indented as:
Code:
<div>
   <p>
   <b>Hello</b>World!</p>
</div>

Whilst correct would be:
Code:
<div>
   <p>
      <b>Hello</b>World!
   </p>
</div>

A more complex example which really goes wrong:
Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
           <a>Dropdown <b></b></a>
         </li>
       <li>
          <a>Dropdown 2 <b></b></a>
      </li>
    </ul>
  </div>
</div>

becomes
Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
         <a>Dropdown <b></b>
      </a>
   </li>
   <li>
   <a>Dropdown 2 <b></b>
</a>
</li>
</ul>
</div>
</div>


I am referring to the commands XMLConvertToCRLF and ReIndentSelection.
cawoodm
Basic User
Posts: 27
Joined: Mon Mar 27, 2006 12:00 am

Re: Incorrect XML Indentation

Post by Mofi » Sun Nov 25, 2012 6:07 am

The command XML Convert to CR/LFs is not designed for HTML and XHTML files. This command is for XML files, which of course have a different structure than HTML/XHTML files. In XML files there is no division into block elements and inline elements as in HTML/XHTML. Therefore the general indentation rules differ between XML and HTML/XHTML.

The command ReIndent Selection is a better choice for HTML/XHTML files. But this command only reindents existing lines; it does not insert or remove line breaks. ReIndent Selection works based on the indent/unindent strings defined in the wordfile currently used for syntax highlighting the content of the active file. The wordfile %appdata%\IDMComp\UltraEdit\wordfiles\html.uew contains just the basic indent definitions:

/Indent Strings = "<"
/Unindent Strings = "</"


You can modify the indent and unindent strings in the wordfile. But please don't forget that the command ReIndent Selection does not insert or remove line breaks. See also the topic Correct reformat of HTML.

To get well-structured HTML output, the command Format - HTML Validation - Run HTML Tidy can be used. The third-party tool HTML Tidy is installed with UltraEdit and is therefore ready to use without extra installation. It does not only validate the HTML/XHTML file; it can also output a well-structured version of the validated file. You can set the options for HTML Tidy using the dialog in UltraEdit, or specify a text file containing manually entered options according to your requirements. The options of HTML Tidy are explained on the Quick Reference page of HTML Tidy and in the HTML Tidy manual.

Use advanced forum search to find other topics about using HTML Tidy.

By the way: in my HTML files I would prefer a different indentation scheme for the 2 examples you posted. With some effort it is possible to create an UltraEdit macro or UltraEdit script that reformats an HTML/XHTML file perfectly to the format you want. We can help you with writing such a macro/script. I don't have such a macro/script myself yet, as I only rarely modify HTML files written by others, and in those rare cases I usually reformat them with HTML Tidy and some manually executed regular expression replaces. The HTML files I write myself are perfectly indented from the beginning according to my requirements, because I use auto-indent and appropriate indent/unindent strings.
Mofi
Grand Master
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Incorrect XML Indentation

Post by cawoodm » Mon Nov 26, 2012 11:42 pm

Servus Mofi and thanks for the detailed reply. However I must remind you: XHTML is valid XML and I can prove this with a parser. Also, when I work with my .html files I always choose View As XML so that the XML format (XMLConvertToCRLF + ReIndentSelection) works - it doesn't work properly unless you switch to XML view in my experience.
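A parser check of this kind is easy to reproduce outside UltraEdit. The following is a minimal sketch in Python, assuming only the standard xml.etree.ElementTree module; the function name is_well_formed is illustrative. Note that it checks well-formedness only, not validity against the XHTML DTD:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# One-line example from this thread: all tags are closed and properly nested.
sample = ('<div> <div> <a href="#">Hello</a> <ul> <li> <a>Dropdown <b></b></a> </li> '
          '<li> <a>Dropdown 2 <b></b></a> </li> </ul> </div></div>')

print(is_well_formed(sample))               # True
print(is_well_formed("<p>Hello<br></p>"))   # False: HTML-style <br> is never closed
print(is_well_formed("<p>Hello<br/></p>"))  # True: XHTML-style empty element
```

Being well-formed XML says nothing about whether a file mixes text and elements, which is where the indentation rules discussed below differ.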

Now, the issue is not one of inline (e.g. <span> or <a> tags) vs block (<div> or <table>) tags. The issue is that the indentation UES produces is incorrect when elements contain text and sub-elements.

Example:

Code:
<a>
   <b>This will indent correctly</b>
</a>


Code:
<a>
<b>This won't indent<c>correctly</c>
</b>
</a>


It seems clear that this is something which needs to be improved.
cawoodm
Basic User
Posts: 27
Joined: Mon Mar 27, 2006 12:00 am

Re: Incorrect XML Indentation

Post by Mofi » Tue Nov 27, 2012 2:11 am

Your second example reindents well to

Code:
<a>
   <b>This won't indent<c>correctly</c>
   </b>
</a>

using UE v18.20.0.1020 or UES v12.20.0.1002 with installed html.uew.

Well, XHTML files can be parsed with XML parsers and are valid XML files. But the structure of XHTML files is nevertheless different.

XML files:

Code:
<docelement>
   <childelement1>
      <childelement2>value</childelement2>
      <childelement2>value</childelement2>
      <childelement3>value</childelement3>
      <childelement4>
         <childelement5>value</childelement5>
         <childelement5>value</childelement5>
         <emptyelement1 />
      </childelement4>
   </childelement1>
</docelement>


XHTML files:

Code:
<blockelement>
   block value 1 part 1<inlineelement1>inline 1 value part 1 <inlineelement2>value</inlineelement2> inline 1 value part 2</inlineelement1>block value 1 part 2
   <otherblockelement>block value 2 part 1<inlineelement1><inlineelement2>value</inlineelement2></inlineelement1>block value 2 part 2</otherblockelement>
</blockelement>


In XML files, elements usually do not mix values and other elements: an XML element contains either other elements OR a value.

In XHTML files this is not the case, because some tags are definitely block elements, which can contain values, inline elements and other block elements, while other tags are definitely inline elements, which can contain only values and other inline elements. Therefore line breaks in XHTML files should be inserted only to the left of opening/closing tags of block elements like <p>, <table>, <div>, <ul>, <li>, ... and never to the left of inline elements like <em>, <span class="warning">, <strong>, ...
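This structural difference can be checked mechanically. The Python sketch below uses the standard xml.etree.ElementTree module; the helper name mixed_content_tags and the two small documents are hypothetical, simplified stand-ins for the two examples above. It lists every element that contains both non-whitespace text and child elements:

```python
import xml.etree.ElementTree as ET


def mixed_content_tags(markup: str) -> list[str]:
    """List tags of elements containing both non-blank text and child elements."""
    root = ET.fromstring(markup)
    mixed = []
    for elem in root.iter():  # document order, depth-first
        children = list(elem)
        if not children:
            continue
        # Text can appear before the first child (elem.text) or after any child (tail).
        texts = [elem.text or ""] + [child.tail or "" for child in children]
        if any(t.strip() for t in texts):
            mixed.append(elem.tag)
    return mixed


xml_doc = "<doc><item>value</item><group><item>value</item><empty/></group></doc>"
xhtml_doc = "<div>block text <em>inline <b>bold</b> text</em> more<p>para <a>link</a></p></div>"

print(mixed_content_tags(xml_doc))    # []
print(mixed_content_tags(xhtml_doc))  # ['div', 'em', 'p']
```

The XML-style document has no mixed elements, so a line break between any two tags is harmless there; in the XHTML-style one, breaking inside div, em or p would split running text.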

Using >[ \t]*< as search string and >\r\n< as replace string for a regular expression replace results in the same behavior as the command XML Convert to CR/LFs with regard to inserting line breaks. The command XML Convert to CR/LFs additionally reindents the file. This behavior is 100% correct for XML files, but not good for XHTML files.
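For illustration outside UltraEdit, the same search/replace pair can be applied with Python's re module. This sketches only the regular expression, not the UltraEdit command (which also reindents); the function name break_between_tags is hypothetical:

```python
import re


def break_between_tags(text: str) -> str:
    """Insert CR/LF wherever '>' is followed, after optional blanks/tabs, by '<'."""
    return re.sub(r">[ \t]*<", ">\r\n<", text)


result = break_between_tags("<div><p><b>Hello</b>World!</p></div>")
for line in result.split("\r\n"):
    print(line)
# <div>
# <p>
# <b>Hello</b>World!</p>
# </div>
```

On the first example of this thread this gives the same line structure as the output posted there: the break is made between any two adjacent tags, regardless of whether they are block or inline elements.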

A better approach is a smarter one like HTML Tidy uses for HTML and XHTML files, which can be rebuilt quite easily with one or more regular expression replaces. A line break should be inserted only to the left of an opening or closing block element tag, and only if that tag is not already on a separate line. This can be achieved by using ([^\r\n\t ])[ \t]*(</*)(p|div|td|tr|table|ul|li|ol|head|body|blockquote|code|and so on) as search string and \1\r\n\2\3 as replace string for a Perl regular expression replace. After running this replace, the entire file content can be selected and reindented. As a macro:

Code:
InsertMode
ColumnModeOff
HexOff
Top
PerlReOn
Find RegExp "([^\r\n\t ])[ \t]*(</*)(p|div|td|tr|table|ul|li|ol|head|body|blockquote|code|h1|h2|h3)"
Replace All "\1\r\n\2\3"
SelectAll
ReIndentSelection

Please note that the list of block elements in the regular expression search string is not complete.
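The same block-element replace can be tried outside UltraEdit with Python's re module, which accepts this pattern unchanged. In this sketch (hypothetical names, same incomplete element list as the macro) a \b is appended after the element names. That is an addition not present in the macro above, intended so that for example li does not also match <link> and p does not match <param>:

```python
import re

# Incomplete list of block elements, copied from the macro; extend as needed.
BLOCK_TAGS = "p|div|td|tr|table|ul|li|ol|head|body|blockquote|code|h1|h2|h3"

# \b after the name is an addition to the macro's pattern (avoids <link>, <param>, ...).
BLOCK_BREAK = re.compile(r"([^\r\n\t ])[ \t]*(</*)(" + BLOCK_TAGS + r")\b")


def break_before_block_tags(text: str) -> str:
    """Insert CR/LF before block tags preceded by a non-blank char on the same line."""
    return BLOCK_BREAK.sub(r"\1\r\n\2\3", text)


result = break_before_block_tags("<div><p><b>Hello</b>World!</p></div>")
for line in result.split("\r\n"):
    print(line)
# <div>
# <p><b>Hello</b>World!
# </p>
# </div>
```

The inline <b> stays on the line of its paragraph, and tags that already start a line (after optional indentation) are left alone, so running the replace twice changes nothing further.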

BTW: Since UES v11.10 the command XML Convert to CR/LFs can be used for all files, independent of file extension or syntax highlighting. IDM made this change because nowadays many files with other file extensions are XML files. Nevertheless, it should be used only on real XML files, not on other files like pure text files, C/C++ files or XHTML files.
Mofi
Grand Master
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

Re: Incorrect XML Indentation

Post by cawoodm » Tue Nov 27, 2012 2:35 am

I'm using UES 12.00 and the button "XML Convert to CR/LFs" is enabled for all file types, but it only indents correctly if you switch the view to XML. You can probably confirm this:

Input:
Code:
<div> <div> <a href="#">Hello</a> <ul> <li> <a>Dropdown <b></b></a> </li> <li> <a>Dropdown 2 <b></b></a> </li> </ul> </div></div>


XMLConvertToCRLF + ReIndentSelection (HTML View):
Code:
<div>
<div>
<a href="#">Hello</a>
<ul>
<li>
<a>Dropdown <b></b>
</a>
</li>
<li>
<a>Dropdown 2 <b></b>
</a>
</li>
</ul>
</div>
</div>


XMLConvertToCRLF + ReIndentSelection (XML view):
Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
            <a>Dropdown <b></b>
            </a>
         </li>
         <li>
            <a>Dropdown 2 <b></b>
            </a>
         </li>
      </ul>
   </div>
</div>


Note, this example previously did not indent correctly even in XML mode. I noticed in my wordfile.txt under /L6"XML" that I had no settings for indentation. Also xml.uew had no indentation settings. So I added:
Code:
/Indent Strings = "<"
/Unindent Strings = "</"


This seems to have fixed the issue for me :) - the XML view output above is a valid indentation and much better than before, when it looked like this:
Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
         <a>Dropdown <b></b>
      </a>
   </li>
   <li>
   <a>Dropdown 2 <b></b>
</a>
</li>
</ul>
</div>
</div>


How do wordfile.txt and xml.uew interplay?
cawoodm
Basic User
Posts: 27
Joined: Mon Mar 27, 2006 12:00 am

Re: Incorrect XML Indentation

Post by cawoodm » Tue Nov 27, 2012 4:12 am

OK, it seems that this "fix" in the wordfile.txt breaks the XML formatting elsewhere. :(

UES now does not properly indent XML with self-closing tags like <c/>.
Code:
<a>
   <b>
      <c/>
         <d>oh no!</d>
      </b>
   </a>

If I remove this from the wordfile.txt under "XML":
Code:
/Indent Strings = "<"
/Unindent Strings = "</"

it will correctly indent:
Code:
<a>
   <b>
      <c/>
      <d>oh no!</d>
   </b>
</a>


Changes to xml.uew seem to have no effect.

It would seem this is a confirmed defect: XML Convert to CR/LFs problem with />
cawoodm
Basic User
Posts: 27
Joined: Mon Mar 27, 2006 12:00 am

Re: Incorrect XML Indentation

Post by Mofi » Wed Nov 28, 2012 1:55 pm

I copied the line

Code:
<div> <div> <a href="#">Hello</a> <ul> <li> <a>Dropdown <b></b></a> </li> <li> <a>Dropdown 2 <b></b></a> </li> </ul> </div></div>

into a new ASCII/ANSI file (no Unicode encoding like UTF-8) with DOS line terminators and saved the file as Test.htm. Saving activated syntax highlighting for HTML, as indicated in the status bar. Next I executed XML Convert to CR/LFs and the output was:

Code:
<div>
 <div>
  <a href="#">Hello</a>
  <ul>
   <li>
    <a>Dropdown <b></b>
    </a>
   </li>
   <li>
    <a>Dropdown 2 <b></b>
    </a>
   </li>
  </ul>
 </div>
</div>

I have configured files with the extensions HTM and HTML to use spaces instead of tabs, with 1 space as indent.

The result is perfect. That's exactly what I expected, although it is not what I would like for HTML/XHTML files.

The command XML Convert to CR/LFs automatically reindents all lines, so there is no need to additionally execute the command ReIndent Selection.

I nevertheless selected this block and executed the command ReIndent Selection. As I expected, there was no change.

Next I used the commands File - Revert to Saved and File - Rename File to restore the previous single line and rename the file to Test.xml. Then I executed Revert to Saved once more to trigger highlighting as an XML file instead of an HTML file.

Then I executed the command XML Convert to CR/LFs once again and got as output:

Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
         <a>Dropdown <b></b>
      </a>
   </li>
   <li>
   <a>Dropdown 2 <b></b>
</a>
</li>
</ul>
</div>
</div>

I have configured files with the extension XML to use tabs with a tab stop/indent value of 3.

Yes, this result is not what can be expected. Either the output should be like the one above from running the command on the HTML file, or <b></b> should be placed on a new line and indented one level deeper than <a>, with this result:

Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
            <a>Dropdown
               <b></b>
            </a>
         </li>
         <li>
            <a>Dropdown 2
               <b></b>
            </a>
         </li>
      </ul>
   </div>
</div>


I have used the standard wordfiles as installed with UES v12.20.0.1002 where html.uew contains

/Indent Strings = "<"
/Unindent Strings = "</"


and xml.uew contains no indent/unindent string definitions. Therefore I added the 2 lines present in html.uew to xml.uew, restarted UEStudio and executed the command XML Convert to CR/LFs once again on Test.xml. Now the result was:

Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
            <a>Dropdown <b></b>
            </a>
         </li>
         <li>
            <a>Dropdown 2 <b></b>
            </a>
         </li>
      </ul>
   </div>
</div>

That's much better, because it is one of the 2 possible correct results.

Next I additionally copied the following line into Test.xml

Code:
<a><b><c/><d>oh no!</d></b></a>

and executed the conversion command again. The result for this block was like what you posted: the reindent failed starting at the empty element.

Therefore I modified the unindent strings definition line to

/Unindent Strings = "</" "/>"

as I suggested in the topic you referenced, saved xml.uew, restarted UES and re-opened Test.xml. I executed the command XML Convert to CR/LFs and now both lines are correctly reformatted to:

Code:
<div>
   <div>
      <a href="#">Hello</a>
      <ul>
         <li>
            <a>Dropdown <b></b>
            </a>
         </li>
         <li>
            <a>Dropdown 2 <b></b>
            </a>
         </li>
      </ul>
   </div>
</div>
<a>
   <b>
      <c/>
      <d>oh no!</d>
   </b>
</a>

Of course, reverting the file content to saved, renaming it to Test.htm, executing Revert to Saved once more to trigger highlighting as HTML instead of XML, and executing XML Convert to CR/LFs resulted in the same wrong indentation for the second line as before with XML, because of the empty element and because "/>" is not present in html.uew as an unindent string.
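Why "/>" matters can be reproduced with a deliberately simplified model of reindenting by indent/unindent strings. This Python sketch is an assumption about the general principle, not UltraEdit's actual algorithm: it only reindents existing lines (it inserts no line breaks), treats every "<" not followed by "/" as an indent, and every configured unindent string as an unindent. With only "</", the empty element <c/> opens a level that nothing closes; adding "/>" balances it:

```python
def reindent(text: str,
             unindent_strings: tuple[str, ...] = ("</",),
             indent: str = "   ") -> str:
    """Simplified, hypothetical model of reindenting existing lines.

    Each "<" not followed by "/" raises the level for the following lines,
    each configured unindent string lowers it. A line that starts with an
    unindent string is itself moved one level to the left.
    """
    level = 0
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        opens = line.count("<") - line.count("</")
        closes = sum(line.count(u) for u in unindent_strings)
        if line.startswith(unindent_strings):
            level = max(level - 1, 0)  # dedent the closing line itself
            closes -= 1                # that unindent has now been applied
        out.append(indent * level + line)
        level = max(level + opens - closes, 0)
    return "\n".join(out)


sample = "<a>\n<b>\n<c/>\n<d>oh no!</d>\n</b>\n</a>"
print(reindent(sample))                # only "</": <d> ends up one level too deep
print(reindent(sample, ("</", "/>")))  # "</" and "/>": correct nesting
```

In this model the first call gives the shifted layout posted earlier in this thread for the <c/> case, and the second gives the correct layout shown in the last result above.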
Mofi
Grand Master
Posts: 3936
Joined: Thu Jul 29, 2004 11:00 pm
Location: Vienna

