IDM PowerTips

Regular expressions

Note: For many years now, UltraEdit and UEStudio have included full support for Perl-compatible regular expressions. We highly recommend that you learn and use this type of regular expression syntax, as it is far more powerful (and in some cases even simpler) than the regular expression types covered in this power tip.


Within the context of UltraEdit and UEStudio, regular expressions (or regex, for short) are patterns (rather than specific strings) used with Find and Replace. Regular expressions can streamline many editing operations. Below is a reference key for both UltraEdit’s proprietary legacy regex syntax and Unix-style regex syntax, followed by examples that demonstrate how to use them in the editor.

    Regex in UltraEdit / UEStudio

    UltraEdit symbol  Unix symbol   Function
    %                 ^             Matches the beginning of a line (positional match).
    $                 $             Matches the end of a line (positional match).
    ?                 .             Matches any single character except a newline character.
    *                 .*            Matches any number of occurrences of any character except newline.
    +                 +             Matches one or more of the preceding single character/character set. At least one occurrence of the preceding character, or at least one of the characters in the preceding character set, must be found.
    ++                *             Matches the preceding single character/character set zero or more times.
    ^                 \             Indicates that the next character has a special meaning. "n" on its own matches the character "n"; "^n" (UltraEdit syntax) or "\n" (Unix syntax) matches a line feed (LF, hex 0A) character. See the examples below.
    [ ]               [ ]           Matches any single character or range in the brackets.
    [~xyz]            [^xyz]        A negated character set. Matches any single character NOT in the brackets.
    ^b                \f            Matches a page break/form feed character.
    ^p                \p            Matches a DOS/Windows newline (CR/LF, hex 0D 0A), i.e. a paragraph break.
    ^r                \r            Matches a carriage return (CR, hex 0D), the legacy Mac line terminator.
    ^n                \n            Matches a line feed (LF, hex 0A), the Unix line terminator.
    ^t                \t            Matches a tab character.
    [0-9]             \d            Matches any digit character.
    [~0-9]            \D            Matches any non-digit character.
    [ ^t^b]           \s            Matches any whitespace character (space, tab, form feed, etc.) but not newline characters.
    [~ ^t^b]          \S            Matches any non-whitespace character; note that it does match newline characters.
    (none)            \v            Matches a vertical tab character.
    [0-9a-z_]         \w            Matches any alphanumeric character, including underscore.
    [~0-9a-z_]        \W            Matches any character except alphanumeric characters and underscore.
    ^{A^}^{B^}        (A|B)         Matches expression A or B.
    ^                 \             Overrides (escapes) the following regular expression character so that it is matched literally.
    ^(...^)           (...)         Tags an expression for use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
    ^1                \1            Numerical reference to a tagged expression. Text matched by tagged expressions may be used in replace commands with this format.

    Note: ^ refers to the literal character '^', not Ctrl + value.
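    To illustrate how the two columns of the table correspond, here is a minimal sketch of a hypothetical helper, ue_to_python (an invented name, not an UltraEdit or UEStudio feature), that rewrites a subset of the legacy syntax into the Unix-style syntax accepted by Python's re module. It assumes only the table rows with a direct Python equivalent; because Python has no \p escape, ^p is translated to \r\n. It is an illustration of the mapping, not a complete or official converter.

```python
import re

# Hypothetical helper (not part of UltraEdit): converts a SUBSET of the
# legacy UltraEdit syntax from the table into Python/Unix-style syntax.
ESCAPES = {"p": r"\r\n", "r": r"\r", "n": r"\n", "t": r"\t", "b": r"\f"}
# Characters that are literal in UltraEdit legacy syntax but special in Python.
PY_META = set(".(){}|\\")


def ue_to_python(pattern: str) -> str:
    out = []
    i = 0
    in_set = False
    while i < len(pattern):
        c = pattern[i]
        if c == "^" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])      # ^p -> \r\n, ^t -> \t, ...
            elif in_set:
                out.append(re.escape(nxt))    # ^- inside a set -> \-
            elif nxt == "(":
                out.append("(")               # ^( opens a tagged expression
            elif nxt == ")":
                out.append(")")               # ^) closes a tagged expression
            elif nxt == "{":
                out.append("(?:")             # ^{ opens an "or" group
            elif nxt == "}" and pattern[i + 2:i + 4] == "^{":
                out.append("|")               # ^}^{ separates the two alternatives
                i += 2                        # also skip the ^{
            elif nxt == "}":
                out.append(")")               # final ^} closes the "or" group
            else:
                out.append(re.escape(nxt))    # ^? -> \?, ^^ -> \^, ...
            i += 2
            continue
        if in_set:
            if c == "]":
                in_set = False                # end of the character set
            out.append("\\\\" if c == "\\" else c)
        elif c == "[":
            in_set = True
            out.append("[")
            if pattern[i + 1:i + 2] == "~":
                out.append("^")               # [~xyz] -> [^xyz]
                i += 1
        elif c == "%":
            out.append("^")                   # beginning of line
        elif c == "?":
            out.append(".")                   # any single character except newline
        elif c == "*":
            out.append(".*")                  # any run of characters except newline
        elif pattern[i:i + 2] == "++":
            out.append("*")                   # zero or more of the preceding item
            i += 1
        elif c in PY_META:
            out.append("\\" + c)              # literal in UltraEdit, special in Python
        else:
            out.append(c)                     # +, $, letters, digits, spaces, ...
        i += 1
    return "".join(out)


print(ue_to_python("m?n"))              # m.n
print(ue_to_python("^{John^}^{Tom^}"))  # (?:John|Tom)
print(ue_to_python("[~0-9]"))           # [^0-9]
```

    The translated pattern can be passed straight to the re module, for example re.findall(ue_to_python("m?n"), "man men moon").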

    UltraEdit / Unix regular expressions examples

    Simple string matching

    Simple string matching is the most basic form of regular expression, but it still lets you search for several similar strings at once instead of running multiple Find operations.

    UltraEdit legacy regex:

    Find what: m?n
    Matches: “man” and “men” but not “moon”

    Find what: t*t
    Matches: “test”, “tonight” and “tea time” (the “tea t” portion), but not “tea” and “time” when a newline separates them, because * does not match newline characters.

    Find what: Te+st
    Matches: “test”, “teest”, “teeeest”, etc. but does not match “tst”

    Unix regex:

    Find what: m.n
    Matches: “man” and “men” but not “moon”

    Find what: t.*t
    Matches: “test”, “tonight” and “tea time” (the “tea t” portion), but not “tea” and “time” when a newline separates them, because .* does not match newline characters.

    Find what: Te+st
    Matches: “test”, “teest”, “teeeest”, etc. but does not match “tst”
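    The Unix-style patterns above can be tried outside the editor with Python's re module, which uses a similar syntax. This is a sketch under that assumption, not a model of UltraEdit's own engine: re.IGNORECASE stands in for UltraEdit's default behavior with Match case turned off.

```python
import re

# "." (UltraEdit "?") matches any single character except a newline.
words = re.findall(r"m.n", "man men moon")

# ".*" (UltraEdit "*") stops at line breaks, so "tea" and "time" on separate lines do not match.
same_line = re.findall(r"t.*t", "tea time")
split_lines = re.findall(r"t.*t", "tea\ntime")

# re.IGNORECASE approximates UltraEdit's default behavior with Match case off.
repeats = re.findall(r"Te+st", "test teest tst", flags=re.IGNORECASE)

print(words)        # ['man', 'men']
print(same_line)    # ['tea t']
print(split_lines)  # []
print(repeats)      # ['test', 'teest']
```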

    Character Sets

    A character set is a group of characters enclosed in “[” and “]”. A set may list specific characters to match or ranges of characters, for example [aeud] or [a-z].

    UltraEdit legacy regex:

    Find what: [aeiou]
    Matches: every vowel

    NOTE: Regular expressions in UltraEdit are not case-sensitive unless Match case is enabled in the Find dialog.

    Find what: [,.^?]
    Matches: a literal “,”, “.” or “?”.

    Because “?” is a special symbol in UltraEdit expressions, it must be “escaped” with “^” so that the literal character is matched rather than interpreted as part of the expression.

    Find what: [0-9a-z]
    Matches: any digit or letter

    Find what: [~0-9]
    Matches: any character except a digit (~ means NOT the following)

    Unix regex:

    Find what: [aeiou]
    Matches: every vowel

    Find what: [,\.?]
    Matches: a literal “,”, “.” or “?”.

    Because “.” is a special symbol in Unix expressions, it must be “escaped” with “\” so that the literal character is matched rather than interpreted as part of the expression.

    Find what: [0-9a-z]
    Matches: any digit or letter

    Find what: [^0-9]
    Matches: any character except a digit (^ means NOT the following)
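    The Unix character-set examples behave the same way in Python's re module, used here as an assumed stand-in for the editor. One small difference worth noting: in Python a “.” inside brackets is already literal, so the escape in [,\.?] is harmless but not required.

```python
import re

# IGNORECASE mimics UltraEdit with Match case off, so uppercase vowels match too.
vowels = re.findall(r"[aeiou]", "Regular Expressions", flags=re.IGNORECASE)

# A literal comma, period or question mark.
punctuation = re.findall(r"[,\.?]", "Really? Yes, it works.")

# [^0-9] matches every non-digit; replacing those with nothing keeps only the digits.
digits_only = re.sub(r"[^0-9]", "", "Zip: 45238-1234")

# Any digit or letter.
alnum = re.findall(r"[0-9a-z]", "A1-b2", flags=re.IGNORECASE)

print(vowels)       # ['e', 'u', 'a', 'E', 'e', 'i', 'o']
print(punctuation)  # ['?', ',', '.']
print(digits_only)  # 452381234
print(alnum)        # ['A', '1', 'b', '2']
```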

    “Or” Expressions

    UltraEdit legacy regex allows only two strings in an “or” expression; Unix regex allows more than two.

    UltraEdit legacy regex:

    Find what: ^{John^}^{Tom^}

    Unix regex:

    Find what: (John|Tom|Dick|Harry)

    In the UltraEdit syntax, there should be nothing between the two ^{...^} groups. You may combine “A or B” and “C or D” in the same search as follows:

    UltraEdit legacy regex:

    Find what: ^{John^}^{Tom^} ^{Smith^}^{Jones^}

    Unix regex:

    Find what: (John|Tom) (Smith|Jones)

    This will search for “John” or “Tom” followed by “Smith” or “Jones”.
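    A quick way to check which names these Unix-style “or” expressions accept is Python's re module, assumed here as a stand-in for the editor's engine. The sketch shows both the four-way alternative and the combined “A or B” followed by “C or D” search; the sample names are made up for illustration.

```python
import re

# Four alternatives in one group (possible in Unix syntax, not in the legacy syntax).
found = re.findall(r"(John|Tom|Dick|Harry)", "Tom, Dick and Harry")

# Two "or" groups combined: a first name, a space, then a surname.
pattern = re.compile(r"(John|Tom) (Smith|Jones)")
names = ["John Smith", "Tom Jones", "Dick Smith", "John Brown"]
matched = [n for n in names if pattern.fullmatch(n)]

print(found)    # ['Tom', 'Dick', 'Harry']
print(matched)  # ['John Smith', 'Tom Jones']
print(pattern.fullmatch("Tom Jones").groups())  # ('Tom', 'Jones')
```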

    Deleting blank lines

    With regular expressions enabled in the Replace dialog, this expression matches a CR/LF (DOS line terminator) immediately followed by the end of a line (i.e., a blank line) and replaces it with nothing, effectively deleting the blank line:

    UltraEdit legacy regex:

    Find what: ^p$
    Replace With: (literally nothing)

    Unix regex:

    Find what: \p$
    Replace With: (literally nothing)
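    For comparison, here is a rough Python equivalent of the blank-line deletion. It swaps in a different technique: Python's re module has no \p escape, and its “$” only treats “\n” as a line end, so the sketch matches a CR/LF that is followed by another CR/LF using a lookahead. Treat it as an approximation of the editor's behavior, not a reproduction of it.

```python
import re

# Sample text with DOS/Windows (CR/LF) line terminators and some blank lines.
text = "first\r\n\r\nsecond\r\n\r\n\r\nthird\r\n"

# Delete each CR/LF that is immediately followed by another CR/LF (i.e. a blank line).
# The lookahead (?=\r\n) checks for the next CR/LF without consuming it.
cleaned = re.sub(r"\r\n(?=\r\n)", "", text)

print(repr(cleaned))  # 'first\r\nsecond\r\nthird\r\n'
```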

    Reformatting text with tagged expressions

    Example 1:

    Tagged expressions can mark individual fields of the data so that they can be rearranged in the replacement, reformatting the data. For example, it might be useful to rearrange:

    John Smith, 385 Central Ave., Cincinnati, OH, 45238

    into:

    45238, Smith, John, 385 Central Ave., Cincinnati, OH

    UltraEdit legacy regex:

    Find what: %^([a-z]+^) ^([a-z]+^), ^(*^), ^(*^), ^(*^), ^([0-9]+^)
    Replace With: ^6, ^2, ^1, ^3, ^4, ^5

    Unix regex:

    Find what: ^([a-z]+) ([a-z]+), (.*), (.*), (.*), ([0-9]+)
    Replace With: \6, \2, \1, \3, \4, \5
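    The same Unix-style find and replace pair can be run with Python's re.sub, assuming Python's syntax is close enough to the editor's for this pattern. re.IGNORECASE stands in for Match case being off (needed because [a-z] must match “J” and “S”), and re.MULTILINE lets “^” match at the start of every line. The second address line is made-up sample data.

```python
import re

text = (
    "John Smith, 385 Central Ave., Cincinnati, OH, 45238\n"
    "Jane Doe, 1 Main St., Dayton, OH, 45402"
)
pattern = r"^([a-z]+) ([a-z]+), (.*), (.*), (.*), ([0-9]+)"

# \1..\6 in the replacement refer to the six tagged (parenthesized) groups.
result = re.sub(pattern, r"\6, \2, \1, \3, \4, \5", text,
                flags=re.IGNORECASE | re.MULTILINE)

print(result)
# 45238, Smith, John, 385 Central Ave., Cincinnati, OH
# 45402, Doe, Jane, 1 Main St., Dayton, OH
```

    Because “.” does not match newlines, each (.*) group stays within its own line, so every record is reformatted independently.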

    Example 2:

    If you have a web-based registration system, it might be useful to rearrange the order data into a format that a database can easily use:

    name = John Smith
    address1 = 385 Central Ave.
    address2 =
    city = Cincinnati
    state = OH
    zip = 45238

    into:

    John Smith, 385 Central Ave., , Cincinnati, OH, 45238

    This can be done with the following expression:

    UltraEdit legacy regex:

    Find what: name = ^([a-z ]+^)^paddress1 = ^([a-z 0-9.,]+^)^paddress2 = ^([a-z 0-9.,]++^)^pcity = ^([a-z]+^)^pstate = ^([a-z]+^)^pzip = ^([0-9^-]+^)
    Replace With: ^1, ^2, ^3, ^4, ^5, ^6

    Unix regex:

    Find what: name = ([a-z ]+)\paddress1 = ([a-z 0-9.,]+)\paddress2 = ([a-z 0-9.,]*)\pcity = ([a-z]+)\pstate = ([a-z]+)\pzip = ([0-9\-]+)
    Replace With: \1, \2, \3, \4, \5, \6
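    As a sketch of the same multi-line replacement outside the editor, the Unix pattern can be adapted for Python's re module. The assumptions: each \p is written as \r\n (Python has no \p escape), the record uses CR/LF line endings, and the empty “address2 = ” line keeps its trailing space so that the literal “address2 = ” in the pattern matches.

```python
import re

record = (
    "name = John Smith\r\n"
    "address1 = 385 Central Ave.\r\n"
    "address2 = \r\n"          # empty field, trailing space kept
    "city = Cincinnati\r\n"
    "state = OH\r\n"
    "zip = 45238"
)

# Same pattern as the Unix example, with each \p replaced by \r\n.
pattern = (
    r"name = ([a-z ]+)\r\n"
    r"address1 = ([a-z 0-9.,]+)\r\n"
    r"address2 = ([a-z 0-9.,]*)\r\n"   # * allows an empty address2
    r"city = ([a-z]+)\r\n"
    r"state = ([a-z]+)\r\n"
    r"zip = ([0-9\-]+)"
)

row = re.sub(pattern, r"\1, \2, \3, \4, \5, \6", record, flags=re.IGNORECASE)
print(row)  # John Smith, 385 Central Ave., , Cincinnati, OH, 45238
```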