How to sort a text file or CSV file with UltraEdit
One of UltraEdit’s hallmark features is its ability to sort text with a wide variety of options and flexibility for different formats. Using UltraEdit, you can sort an entire file, a selection of lines within a file, or by a certain column of text or a field of text (like within a comma-separated value [CSV] or tab-separated value [TSV] file for example). Additionally, UltraEdit’s large file editing capabilities allow you to open and sort huge amounts of data where other editors or sorting mechanisms fall short.
UltraEdit includes several advanced options for sort that allow you to sort alphabetically or numerically, with options for case sensitivity and removal of duplicates. We’ll take a look at several different methods of sorting text below.
Sort a text file, or selection of lines
Let’s start with a basic, simple sort. Often we need to sort an entire file, line by line, alphabetically, starting with the first character on the line. To do this, click the Edit tab, then click the Sort button near the end.
Clicking Sort will cause the entire file to be sorted, line-by-line, with the last-used sort options. If you haven’t yet changed the sort options, then the defaults are used: a simple alphabetical (“a” to “z”) sort.
To sort only a portion of the text file, we’ll simply select the lines we want to sort. In the example below, we want to sort everything except the header line, so we’ve selected all lines except line 1:
With the selection active, we can click Sort, and only the text selection is sorted:
Now this is great, but what if you need to tweak your sort settings to get a slightly different result? Let’s take a look at UltraEdit’s advanced sort options.
Advanced sort options
You can access the sort options by clicking the down-arrow beneath the Sort command and selecting Advanced sort/options….
Here is a brief overview of each option:
Sort order (ascending / descending)
The most obvious option, this setting allows you to determine whether the data is sorted in ascending or descending order. For example, when doing an alphabetical sort, if “descending” is selected here then the word “zebra” would be sorted above the word “cat”.
Check this to remove all lines within the sorted text that are duplicates of each other. Keep in mind a duplicate is determined based upon the sort criteria you specify elsewhere in the sort options, so it’s important you understand what your sort settings are before checking this option! We’ll look at
If Where all selected keys match is selected with “Remove duplicates,” then all sort keys with the “RD” column checked must be exactly the same between two lines in order for them to be considered duplicates, and one will be deleted.
If Where any selected keys match is selected with “Remove duplicates,” then if only one of the sort keys with the “RD” column checked is exactly the same between two lines, they will be considered duplicates and one will be deleted.
By default, UltraEdit’s sort is case sensitive. If you have the word “cat” and “CATEGORY”, and this option is not checked, then “CATEGORY” would be sorted above “cat” because capital letters come before their lowercase counterparts in the ASCII table. If you check this option, however, “cat” would be sorted above “CATEGORY”. This option is especially useful if you’re sorting data where case doesn’t matter, like email addresses, website URLs, etc.
When you do an alphabetical sort in UltraEdit and you have numbers, the values 1,11,111 would be sorted before 2,22,222 as 1 comes before 2. This option causes numbers to be sorted in numeric order, resulting in “1,2,11,12,111,222” instead. You should only use this option with numeric characters, otherwise you likely won’t get the results you expect.
Tab delimited sort / Custom delimited sort
We will look at these options in the sort text by field (CSV) section below.
Sort columns (keys)
We will look at these options in the Sort lines by key (text in a specific position) section below.
Often times you need to sort lines based upon a specific range or column of text instead of the entire line. For example, let’s say you have some source code where you’re declaring a few variables, and you like to keep things neat and tidy (albeit inefficient), so you want these variables to be declared in alphabetical order.
Use locale (slower)
If you check this option, then UltraEdit will sort using the locale you’ve selected under Advanced » Settings » File handling » Encoding, under the “Locale” drop down. This is useful if you are sorting text in a non-English language. For example, German text may include umlauts which need to be properly accounted for in a sort. The “Use locale” option, along with the aforementioned setting, allow you to do this. Keep in mind that this will require more processing of the sort and therefore may take longer, especially with large sets of data.
Sort lines by key (text in a specific position)
Often times you need to sort lines based upon a specific range or column of text instead of the entire line. For example, let’s say you have some data lined up in columns, and you want to sort all of the data – the full lines – by the second column. In our example below, the second column is comprised of last names, which is how we want to sort these lines.
We want to sort the lines based upon the data from column 22 to column 40. This is called our sort “key”, and we can define it by going into the advanced sort options dialog and setting it there.
Note: If you first select the text in the key column on one of the lines in the file, then when you go into the advanced sort options dialog, the start and end columns will automatically populate the “Key 1” field.
Click the Sort button, and the lines are sorted according to the data in that column.
You can set up to four of these sort keys. If you don’t want to use all the keys, make sure that each unused key’s start and end columns are both set to “0”.
If Remove duplicates is checked above, then you can check the “RD” column so that the corresponding sort key is evaluated in whether or not the line is a duplicate.
There are two special values that you can use for sort key columns:
- A start column of “0” represents the current location of the caret in the file, and
- An end column of “-1” represents the end of the line.
You can use both of these values in combination with actual column numbers, or together with each other. So for example, a start column of “20” and an end column of “-1” means “Sort starting at column 20 to the end of each line.” And a start column of “0” and an end column of “-1” means “Sort from the current caret position in the file to the end of each line.”
Sort text by field (CSV)
Often times, data is structured in a character-separated value (CSV) file. In this format, each field might be of varying width, which makes key or column-based sorting impossible.
Fortunately, UltraEdit provides a solution! The “Tab delimited sort” and “Custom delimited sort” options in the sort options dialog allow you to sort your data based on logical field instead of column positions.
Take for example the following CSV data:
Suppose we need to sort this by the unique identifier (UID) in the third field in the file. In order to do this, we can check the “Custom delimited sort” option and then specify “,” (a comma) as the delimiter. We’ll also set the first “Field” entry to “3”.
If the data fields are separated by tabs (as they are when copied and pasted from Excel), then you can check the “Tab delimited sort” option. Otherwise, use the “Custom delimited sort” and make sure to enter the field delimiter for the data.
For the “Start character” and “end character” fields, we will leave these set to “0” so that all characters in the field are evaluated. However, these options do allow you to specify only a portion of the field that should be evaluated for the sort, if you need it.
Now we can click Sort and all lines are sorted by the third “token” field.
If we convert the CSV data to fixed columns, we can more easily see that the file has been sorted by the data in field 3:
To learn more about how UltraEdit can help you edit CSV files, visit our working with CSV files power tip.