Here is what you have been waiting for a long time: a macro (set) which is able to sort all words of all color groups of a syntax highlighting language definition.
It correctly handles case sensitivity according to the keyword Nocase, words beginning with /, substrings defined with ** and also special language settings like HTML_LANG, XML_LANG and LATEX_LANG. FORTRAN_LANG and the other language markers have no influence on the sort order of the words in the color groups.
It does not matter whether the language definition with the words to sort is stored in a file together with other language definitions, for example wordfile.txt or wordfile.uew, or whether the file contains only one language definition. Blank lines within the language definition are also allowed; the macro removes them during execution. Set the caret anywhere within the language definition you want to sort and start the macro SortLanguage. That's all: lean back and watch what's going on.
Do not use the macro SortLanguage with UltraEdit v15.00.0.1033 to v15.00.0.1047. With these versions of UltraEdit this macro does not work because of two bugs in UltraEdit.
The macro does not work correctly if the setting Automatically copy to clipboard when selection is made is enabled at Configuration - Editor - Miscellaneous. Uncheck this setting before running the macro. Also disable word-wrap mode if word wrap is enabled for the wordfile or by default for new (temporary) files.
For wordfiles containing only a single syntax highlighting language, see also the command line tool SortLanguage and the Windows GUI application UE Companion Utility.
GENERAL SORTING REQUIREMENTS
Here is some general information about the sorting requirements for words in a syntax highlighting wordfile for UltraEdit and UEStudio.
The first line of a syntax highlighting language block in a wordfile is the language definition line. It starts with uppercase /Lx, where x is a number in the range 1 to 20. Normally the name of the syntax highlighting language in double quotes immediately follows the language number. The language definition line must end with either File Extensions = or File Names = followed by the list of file extensions or file names of the files whose content should be highlighted with this syntax highlighting language.
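To make the structure of this line concrete, here is a minimal Python sketch that checks a language definition line against the rules just described. The function name parse_language_line and its regular expression are illustrative assumptions, not the expressions used in SyntaxTools.mac, and no other keywords on the line are validated.

```python
import re

# Illustrative pattern (an assumption, not taken from SyntaxTools.mac):
# /L<number>, an optional "name", anything, then "File Extensions = " or
# "File Names = " followed by the list at the end of the line.
LANGUAGE_LINE = re.compile(
    r'^/L(\d+)(?:"([^"]*)")?.*\b(File Extensions|File Names) = (.+)$')


def parse_language_line(line):
    """Return (number, name, list keyword, entries) or None if invalid."""
    match = LANGUAGE_LINE.match(line)
    if match is None:
        return None
    number = int(match.group(1))
    if not 1 <= number <= 20:  # only language numbers 1 to 20 are valid
        return None
    return number, match.group(2), match.group(3), match.group(4).split()


print(parse_language_line('/L20"Example" Nocase File Extensions = TXT'))
# prints (20, 'Example', 'File Extensions', ['TXT'])
```

A line without a language name, such as /L3 File Names = wordfile.txt, yields None for the name.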
All keywords and key strings supported by UltraEdit and UEStudio to define how a syntax highlighting language should highlight the content of a file are case-sensitive, and key strings with an equal sign require exactly 1 space before and 1 space after the equal sign. An example demonstrating incorrect usage of keywords and key strings in a language definition line, with the errors in the first line:
/L20"Example" NoCase String Chars =" Line Comment  = // File Extensions= TXT
/L20"Example" Nocase String Chars = " Line Comment = // File Extensions = TXT
Compare the first line, which contains the errors, with the correct second line. What is wrong in the first line?
In keyword "Nocase" the character c is written in the wrong case.
In key string "String Chars = " the space after the equal sign is missing.
In key string "Line Comment = " there are 2 spaces before the equal sign.
In key string "File Extensions = " the space before the equal sign is missing.
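The four rules illustrated above can be checked mechanically. The following Python sketch is a hypothetical checker, find_key_string_errors, which only knows the key strings used in this example (UltraEdit supports many more) and tests the spelling of Nocase and the single space on both sides of each equal sign.

```python
import re

# Assumption: only the key strings from the example are checked here.
KEY_STRINGS = ["String Chars", "Line Comment", "File Extensions"]


def find_key_string_errors(line):
    """Return a list of human-readable problems found in a definition line."""
    errors = []
    # The keyword must be spelled exactly "Nocase"; other spellings are ignored.
    for word in re.findall(r"\bnocase\b", line, flags=re.IGNORECASE):
        if word != "Nocase":
            errors.append(f"wrong case: {word}")
    # Each key string needs exactly one space before and after the equal sign.
    for key in KEY_STRINGS:
        match = re.search(re.escape(key) + r"( *)=( *)", line)
        if match and (match.group(1) != " " or match.group(2) != " "):
            errors.append(f"wrong spacing around '=' in {key}")
    return errors


wrong = '/L20"Example" NoCase String Chars =" Line Comment  = // File Extensions= TXT'
for problem in find_key_string_errors(wrong):
    print(problem)
```

For the faulty first line it reports four problems, one for each item in the list above; for the correct second line it returns an empty list.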
Important for the sort order of the words in the color groups is the keyword Nocase in the language definition line, because among other things it controls the case sensitivity of the words. Therefore the macro SortLanguage searches the language definition line for the word nocase written in any case and always replaces it with the correctly written keyword Nocase before it starts sorting the words.
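The correction described above amounts to a case-insensitive search and replace restricted to the language definition line. A minimal Python sketch, assuming the line has already been isolated; the function name normalize_nocase is hypothetical.

```python
import re


def normalize_nocase(definition_line):
    """Replace nocase written in any case by the keyword Nocase."""
    # The word boundaries leave longer words that merely contain "nocase" alone.
    return re.sub(r"\bnocase\b", "Nocase", definition_line, flags=re.IGNORECASE)


print(normalize_nocase('/L20"Example" NoCase File Extensions = TXT'))
# prints /L20"Example" Nocase File Extensions = TXT
```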
All words in a color group starting with the same character may be on the same line or spread across multiple lines. However, if they are spread across multiple lines, those lines must follow one another with no empty lines or other lines between them.
If the language is case-sensitive, the letter A is different from a and so words starting with A must be on a different line from words starting with a.
Words starting with the letter A must be on the same line as words starting with the letter a if the language is not case-sensitive.
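These rules can be expressed as a small validator. The Python sketch below is a simplified model, not UltraEdit's implementation: it assumes that the first word of a line decides which character the line belongs to, folds only ASCII letters when Nocase is set (letters above 127 stay case-sensitive, as explained further below), and reports words that break the one-block-per-character rule. All names are hypothetical.

```python
def group_key(word, nocase=False):
    """Character that decides the line of a word (hypothetical model)."""
    first = word[0]
    # Only ASCII letters are folded; letters above 127 stay case-sensitive.
    return first.lower() if nocase and first.isascii() else first


def misplaced_words(lines, nocase=False):
    """Return words breaking the one-block-per-start-character rule."""
    bad = []
    finished = set()  # keys whose block of lines has already ended
    previous = None
    for line in lines:
        words = line.split()
        if not words:
            # An empty line also ends the current block.
            if previous is not None:
                finished.add(previous)
            previous = None
            continue
        key = group_key(words[0], nocase)  # assumption: first word decides
        bad += [w for w in words[1:] if group_key(w, nocase) != key]
        if key != previous:
            if key in finished:  # the block for this key was interrupted
                bad += [w for w in words if group_key(w, nocase) == key]
            if previous is not None:
                finished.add(previous)
            previous = key
    return bad


print(misplaced_words(["Checkbox Collection", "Boolean Button", "Crypto"]))
# prints ['Crypto']
```

With nocase=True, a line like abstract Anchor Applet passes; with the default case-sensitive check, Anchor and Applet are reported.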
First an example for a case-sensitive language with several sorting errors:
Anchor Applet Dictionary Area Arguments Array abstract
Date Document Drive Drives
default delete do double
class const catch char continue
What is wrong in the example above and why?
The word case starts with a lowercase c and the language is case-sensitive, so this word must be on a different line than the word Checkbox, which starts with an uppercase C. The same mistake was made for the word abstract.
The word Dictionary starts with D and therefore must be on a different line than the words starting with A.
The word Crypto starts with C and therefore must be either on the same line as Collection or Checkbox or on a separate line, but with no other lines between the lines with words starting with C. In the example there are lines with words starting with A and B between the line with Checkbox and the line with Crypto, and therefore this word is ignored.
That the words class const catch char continue are not sorted alphabetically within the line is no problem for UltraEdit/UEStudio. It is also no problem that, for example, the line with the words default delete do double is above the line with the words starting with c. And it also doesn't matter if some lines contain multiple words starting with the same character while other words starting with the same character are spread over multiple lines, as long as the lines with words starting with the same character form one contiguous block within a color group. But with such a haphazard grouping and ordering of the words, mistakes happen very easily when inserting additional words. Therefore the SortLanguage macro also sorts the words within a line and the lines themselves alphabetically. Here is the corrected word list as produced by the macro:
Anchor Applet Area Arguments Array
Boolean Button
Checkbox Collection Crypto
Date Dictionary Document Drive Drives
abstract
break byte
case catch char class const continue
default delete do double
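The corrected list follows a simple scheme: sort all words, then put the words sharing a first character on one line. A minimal Python sketch of that scheme for a case-sensitive language; sort_case_sensitive is a hypothetical name, and the sketch models the result, not how the macro is implemented.

```python
from itertools import groupby


def sort_case_sensitive(words):
    """Return one line per first character, lines and words sorted."""
    # Plain string comparison orders by code point, so A-Z sort before a-z.
    ordered = sorted(words)
    # After sorting, words with the same first character are adjacent,
    # which is what groupby needs.
    return [" ".join(group) for _, group in groupby(ordered, key=lambda w: w[0])]


words = ("Anchor Applet Dictionary Area Arguments Array abstract Checkbox "
         "Collection case Crypto Date Document Drive Drives default delete "
         "do double class const catch char continue").split()
for line in sort_case_sensitive(words):
    print(line)
```

Because the lines are produced in the order of their first character, sorting the words once also sorts the lines.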
Now let us assume the keyword Nocase exists on the language definition line, so the case of the letters of the words in the color groups does not matter. In this case all the words starting with a lowercase letter in the corrected list above would not be highlighted correctly. The correct word order for a language ignoring the case of the letters A to Z would be:
abstract Anchor Applet Area Arguments Array
Boolean break Button byte
case catch char Checkbox class Collection const continue Crypto
Date default delete Dictionary do Document double Drive Drives
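For a Nocase language only the comparison changes: words are compared and grouped without regard to case. A minimal Python sketch; the name sort_nocase is hypothetical, and str.lower is used for folding, so, like the macro, it does not treat letters above 127 specially.

```python
from itertools import groupby


def sort_nocase(words):
    """Return one line per first letter, ignoring case when comparing."""
    # Compare case-insensitively; the original word breaks ties so the
    # result does not depend on the input order.
    ordered = sorted(words, key=lambda w: (w.lower(), w))
    groups = groupby(ordered, key=lambda w: w[0].lower())
    return [" ".join(group) for _, group in groups]


words = ("Anchor Applet Dictionary Area Arguments Array abstract Checkbox "
         "Collection case Boolean Button break byte Crypto Date Document "
         "Drive Drives default delete do double class const catch char "
         "continue").split()
for line in sort_nocase(words):
    print(line)
```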
Language-specific letters with a character value greater than 127 are always interpreted case-sensitively by the syntax highlighting engine, independent of the presence of the keyword Nocase in the wordfile. But wordfiles usually do not contain such letters, and therefore the macro set for sorting the keywords does not process words with such letters differently, although the syntax highlighting engine would require it.
Lines starting with / are interpreted by UltraEdit/UEStudio as lines with a special syntax highlighting keyword. Therefore all lines in the color groups containing one or more "words" starting with / must start with // to be interpreted correctly. An example with a wrong and a correct line:
/word1 /word2
// /word1 /word2
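A hypothetical helper that applies this rule to a single line of a color group could look like this in Python. It assumes the marker is written as // followed by a space, and it only adds the prefix without otherwise validating the line.

```python
def fix_slash_line(line):
    """Prefix '// ' if a word starts with '/' and the '//' marker is missing."""
    words = line.split()
    needs_marker = any(word.startswith("/") for word in words)
    if needs_marker and words[0] != "//":
        return "// " + " ".join(words)
    return line


print(fix_slash_line("/word1 /word2"))
# prints // /word1 /word2
```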
A line starting with ** defines 1 or more substrings. The strings on this line can start with different characters. The lines with substrings must form a contiguous block within a color group, preferably at the top of the color group. Normally only 1 line is required for the definition of substrings in a color group. All words starting with one of those substrings are highlighted completely with the color of the color group. For more details on substrings see the documentation of TestForDuplicate below.
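To illustrate what a substring line means for highlighting, here is a Python sketch. It assumes, as a simplification, that the substrings follow the ** marker separated by spaces, and it models only the "word starts with a substring" check, not UltraEdit's complete highlighting logic. Both function names are hypothetical.

```python
def parse_substring_line(line):
    """Return the substrings listed on a line starting with **, else []."""
    parts = line.split()
    if not parts or parts[0] != "**":
        return []
    return parts[1:]


def highlighted_by_substring(word, substrings):
    """A word is highlighted completely if it starts with any substring."""
    return any(word.startswith(s) for s in substrings)


subs = parse_substring_line("** $ @")
print(highlighted_by_substring("$value", subs), highlighted_by_substring("value", subs))
# prints True False
```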
Languages marked with HTML_LANG or XML_LANG in the language definition line enable the HTML/XML-specific interpretation of the words in the color groups. If one of these keywords is present, < or </ may be placed in front of any word (tag) to highlight as desired, without all keywords starting with < needing to be on the same line. Instead, the tags starting with the same letter must be on the same or contiguous lines, just as normally required for words, as if the tags did not begin with < or </.
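For sorting, the effect is that a leading < or </ is skipped when determining the character that decides the line of a tag. A minimal Python sketch of that rule; html_group_char is a hypothetical name, and with Nocase the returned character would additionally be compared case-insensitively.

```python
def html_group_char(word):
    """Character that decides the line of a word with HTML_LANG/XML_LANG."""
    # Check the longer prefix first so "</tag" is not treated as "<" + "/tag".
    for prefix in ("</", "<"):
        if word.startswith(prefix) and len(word) > len(prefix):
            return word[len(prefix)]
    return word[0]


print([html_group_char(w) for w in ("<body", "body", "</body>", "<br>")])
# prints ['b', 'b', 'b', 'b']
```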
A language marked with LATEX_LANG in the language definition line enables the LaTeX/TeX-specific interpretation of the words in the color groups. If a word begins with \, the second character is used to determine which line the word should be on. All words beginning with \a should be on the same line as other words beginning with \a or just a. In the same way, all words beginning with \b should be on the same line as other words beginning with \b or just b, but on a different line from those starting with \a, and so on.
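The rule above can be sketched in a few lines of Python; latex_group_char is a hypothetical name, and the sketch only determines the deciding character, not the final line layout.

```python
def latex_group_char(word):
    """Character that decides the line of a word with LATEX_LANG."""
    # A leading backslash is skipped, so \alpha is grouped with alpha.
    if word.startswith("\\") and len(word) > 1:
        return word[1]
    return word[0]


print([latex_group_char(w) for w in ("\\alpha", "and", "\\begin", "bf")])
# prints ['a', 'a', 'b', 'b']
```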
For more details and help about syntax highlighting wordfiles, see the page Syntax Highlighting in the help of UltraEdit or UEStudio and the forum topic Readme for the Syntax Highlighting forum.
GENERAL MACRO INFORMATION
Some general information about the macros used to sort the words in all color groups of a language.
The macros are ready for use in the macro file SyntaxTools.mac. They were developed with compatibility with many versions of UltraEdit and UEStudio in mind and were tested with many versions of UltraEdit. But always take a quick look at the result of the sorting operation: a version of UE/UES released after the last update of the macro set could have a bug in its program code resulting in wrong macro execution.
To use this macro set you need at least v8.20 of UltraEdit or any version of UEStudio. The macros were developed and tested with UE v10.10c and later versions of UltraEdit.
If you find any bugs or have other related questions, post them here.
You can see the source code of the macros with lots of comments in the file SyntaxTools.uem in case you are interested in how the macros work. If you want to make changes to better fit your requirements, feel free to do so, but take the following into consideration:
All macros should have the following properties:
Show Cancel Dialog for this macro ............ disabled
Continue if a Find with Replace not found ... enabled ( < UE v13.10a+2)
Continue if search string not found ........... enabled (>= UE v13.10a+2)
Hotkey = none
You can assign a hotkey to macro SortLanguage if it is used frequently. Never run the submacros manually!
Remove the green comment lines together with the blank lines before copying the instructions to the macro edit window. The comments are only for experts who want to know how the macros work.
The submacro WrapLines sets the maximum number of characters per line to 106, which is the best value for printing with Courier New 8 with 1.5 cm left and right margins on a European A4 sheet. This line length is also good for lower resolutions (1024x768) with at least one additional view open on the left or right side and a normal font size used for displaying the text. A wordfile for the UE/UES community should not use longer lines, so that it is readable by most users without scrolling horizontally. But if you don't want this line length limit, remove the following command from macro SortLanguage:
PlayMacro 1 "WrapLines"
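Wrapping a line of words that all start with the same character at a fixed column keeps the continuation lines adjacent, so the wordfile rules stay satisfied. A Python sketch using the standard textwrap module; wrap_word_line is a hypothetical name, and this shows the idea of a 106-column limit, not the code of WrapLines.

```python
import textwrap


def wrap_word_line(line, width=106):
    """Split a line of words into lines of at most `width` characters."""
    # Never split inside a word: long words and hyphenated words stay intact.
    return textwrap.wrap(line, width=width, break_long_words=False,
                         break_on_hyphens=False)


line = " ".join(f"word{i:02d}" for i in range(40))
print([len(wrapped) for wrapped in wrap_word_line(line)])
# prints [104, 104, 69]
```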
THIS MACRO SET IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO USE, RESULTS AND PERFORMANCE OF THE MACRO SET IS ASSUMED BY YOU AND IF THE MACRO SET SHOULD PROVE TO BE DEFECTIVE, YOU ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR OTHER REMEDIATION. UNDER NO CIRCUMSTANCES, CAN THE AUTHOR BE HELD RESPONSIBLE FOR ANY DAMAGE CAUSED IN ANY USUAL, SPECIAL, OR ACCIDENTAL WAY OR BY THE MACRO SET.
You can download the macros as a ready to use macro file with the description written here. File to download is SyntaxTools.zip.
Fixed a problem where more than one space between the words resulted in an additional empty line for each additional space in the sorted language.
A workaround for a bug of UltraEdit v10.xx was not well done: if the caret was already on the language definition line of the language whose words should be sorted, and this language definition was the last one in a wordfile with more than 1 language definition, the previous language definition was sorted. I found this problem myself and fixed it now with a rewritten code block for finding the start of the current language definition. This new solution is even simpler than the previous one.
Second, I rewrote the code block for selecting the whole language definition. The command Find RegExp "%*^p^p" was used in previous versions to select all lines of the language definition. This worked only because of a bug of UltraEdit. The new code block for selecting the language is much more complicated, but with the additional code it now also allows blank lines within the language definition. Note: Such blank lines are removed during macro execution.
Because of a bug in UltraEdit/UEStudio the Cancel dialog can cause crashes when calling a submacro. To avoid those crashes, the macro property Show Cancel Dialog for this macro is no longer set in any macro.
Many users creating a syntax highlighting language definition for the first time write the keyword Nocase incorrectly. The keywords are case-sensitive, so nocase and NoCase are ignored by UE/UES. The macro SortLanguage now also detects a wrongly written Nocase keyword and corrects it automatically before sorting.
With UltraEdit v11 and with UEStudio the keyword XML_LANG was introduced, which has the same special meaning for words starting with < or </ as HTML_LANG. The macro SortLanguage now recognizes this special keyword too.
Before macro SortLanguage is executed, the caret must be set anywhere within a language definition. If the caret was set on a blank line above a language definition and the file contained more than one language definition, previous versions of this macro sorted the language definition above the caret. If no other language definition was in the file and the caret was set on a blank line above the only language definition, previous versions of the macro SortLanguage did nothing. The macro SortLanguage was modified to first set the caret on a line which contains at least one non-whitespace character before selecting the whole language definition. Now the language below the current caret position is always sorted if the caret is set on a blank line (= a line which contains no characters or only whitespace characters).
Lastly, some spelling mistakes were corrected in the documentation, and the style of the documentation was also changed a little.
The macros were designed to be executed on files with DOS line terminations because the syntax highlighting wordfile must also be a DOS file. The SortLanguage macro creates a new file twice. If the user had specified in the configuration dialog that the Default file type for new files is UNIX or MAC and not DOS, and additionally had not selected the option Automatically convert to DOS format, new files were not created with CR/LF as line terminations and the macros failed. To solve this problem the command UnixMacToDos was inserted immediately after the 2 NewFile commands to make sure that the new file is always a DOS file.
Added to this documentation where to insert the macro command UnixReOn or PerlReOn if the user prefers the UNIX or Perl compatible regular expression engine instead of the UltraEdit regular expression engine which is used for these macros. Search for UnixReOn to find the 2 exit positions.
The order of the macros within the macro file has changed. The main macro SortLanguage is now the first macro in the file. That allows the user to run the macro SortLanguage also with Play Again from the Macro menu immediately after loading the macro file. So there is no need any more to select the macro SortLanguage from the macro list before execution after loading the macro file.
The macro file SyntaxTools.mac now contains also 3 additional macros to test a language definition for duplicate words. See the ultimate test for duplicate words macro for details about this additional macro set.
Lastly, some small mistakes were corrected in the documentation.
There were 2 small errors in the macro codes for SortLanguage and ExpandSubstring. In both macros there was 1 Else which should be EndIf. These errors did not have an effect on the function of the macros.
The UltraEdit versions 12.10+3 to v12.10b and the UEStudio versions 5.50 and 5.50a always move the focus to the nearest left tab in the file tab order, instead of to the last used file according to the window history, when closing a file. With the release of UltraEdit v12.20 and UEStudio v6.00, the focus handling after closing a file tab can be customized with the option Move to nearest left tab after current tab is closed at Configuration - Application Layout - File Tabs. If this option was set, or if one of the UE/UES versions was used which always sets the focus to the nearest left file tab after a file is closed, the wordfile with the language definition to sort had to be the rightmost tab; otherwise the macro pasted the sorted language definition into the wrong file after closing the temporary files at the end of the macro SortLanguage.
Now the macro SortLanguage has been improved and works regardless of which file gets the focus after closing the 2 temporary file tabs. The macro now searches all open windows for the still existing selection of the whole language definition before it pastes the sorted language definition over the unsorted definition.
In the source file the selected part of the language definition line does not start with a slash. Also in the temp file, right before the sorted language is copied back, the language definition line has no slash at the start. But in the loop to find the correct file after closing the last temp file, a regular expression search was inserted which should find the start of the language definition line and copy the language name with its language number to clipboard 8. This search was never successful, so the "find correct file" loop was executed with the last content of clipboard 8, which could successfully find the correct file, but could also lead to an endless window switching loop. Fixed this bug by deleting the slash character in the regular expression search for the language definition number and name.
Lastly, some small spelling mistakes were corrected in the documentation.
The macro file was renamed from SyntaxSort.mac to SyntaxTools.mac. The zip file was also renamed to SyntaxTools.zip, and the file SyntaxSort.htm was renamed to SortLanguage.htm. The macro file now also contains an additional macro to test a language definition for invalid words. See the ultimate test for invalid words macro for details about this additional macro. The macro source code is also available as UEM file; see the top of this text file. The macros for sorting the words were not modified.
Near the end of macro SortLanguage, under certain conditions (depending on PC hardware, version, source file), UltraEdit does not find the language number and language name at the top of the first temporary file and therefore does not copy this data to user clipboard 8. This could result in an endless loop because the correct window is never found due to the wrong content of clipboard 8. A workaround was added for this very special UltraEdit problem.
Changed the regular expression for finding the language definition line to also find lines which have no language name, only the language number. Also made small modifications in comments and code, but without any effect on execution or result of the macro and therefore not worth documenting in detail. Most changes were made to this description, with lots of new information for interested readers.
In macro ExpandSubstring changed the method used to delete the remaining space at start of lines with substrings because of a bug detected with UltraEdit v15.00.0.1048.
Added the attention note at the top of the post.
Updated the macro to also support languages with up to 20 color groups.
Modified the macros SortLanguage and ReconvertWords so that in HTML and XML wordfiles the strings <? and ?> are listed as <? ?> instead of ?> <? on a line after sorting. This modification has no effect on syntax highlighting; <? ?> is just the better order for these 2 strings.
Rewrote the submacros CollectCase and CollectNocase and added the submacro WrapLines for faster and better collection of words starting with the same character on lines wrapped after column 106.
Modified main macro SortLanguage once more for better sorting HTML/XHTML and XML tags. The sort order is now
<tag <tag> </tag>
<tag> <tag </tag>
These 2 changes, which result in better output from macro SortLanguage, have no effect on syntax highlighting for wordfiles already sorted with this macro before.
Space characters at the beginning of lines with words are now removed too. In previous versions such a space character at the beginning of a line with words resulted in a blank line within a color group.