IDM PowerTips

Use Perl regex backreferences to reformat your text

Perl regular expressions are powerful on their own, but combined with UltraEdit’s Find/Replace engine they can take your searches to a new level. One of the most useful features of Perl regexes is the backreference, which lets you recall text matched by your Find regex and reuse it in your Replace string. It’s a simple way to reformat large sets of data with a single Replace operation, which could save you hours of painstaking manual editing!

If you’re not already familiar with Perl regular expressions, make sure to read our power tips on getting started with Perl regex and advanced Perl regex techniques. This power tip assumes a basic familiarity with Perl regular expression syntax.

What is a backreference?

A backreference, in the context of UltraEdit, is a reference to a piece of text that was matched by a portion of your regular expression. This portion is defined in your regex Find string by parentheses. So, for example, if you search for:

Ultra(\w+)

…and the regex matches “UltraEdit”, then “Edit” will be the data that can be backreferenced.

You can use multiple backreferences. Each one is accessed with \1, \2, \3, etc., where the number following the backslash corresponds to the respective set of parentheses in your regular expression, counted from left to right. So in the following example:

(Ultra)(\w+)

\1 would backreference “Ultra” (matched by the first set of parentheses), and \2 would backreference “Edit” (matched by the second set of parentheses). In UltraEdit, you can use a maximum of 9 backreferences.
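
The group numbering described above can be tried outside UltraEdit as well. The following is a minimal sketch in Python, whose re module numbers capture groups the same way. It is only an illustration, not UltraEdit’s own engine, and the sample sentence is made up for the example.

```python
import re

# Two capture groups: group 1 is the literal "Ultra",
# group 2 is the run of word characters that follows it.
pattern = re.compile(r"(Ultra)(\w+)")

# Hypothetical sample text, not taken from the power tip.
match = pattern.search("I edit my files with UltraEdit every day.")

first = match.group(1)   # the text \1 would refer to
second = match.group(2)  # the text \2 would refer to
print(first, second)
```

Here group 1 holds “Ultra” and group 2 holds “Edit”, matching the \1 and \2 values described above.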

Using backreferences

Now that you understand the concept of Perl regex backreferences, you may be wondering how to actually use them. The most common place for backreferences is your Replace string, but they can be used in your Find string as well.
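
As a rough illustration of a backreference inside the Find string itself, the sketch below looks for a word that is immediately repeated. It uses Python’s re module rather than UltraEdit, and the pattern and sample sentence are assumptions chosen for the example, not part of the original tip.

```python
import re

# \1 inside the pattern must match exactly the text that group 1 captured,
# so this finds a word repeated right after a single space.
doubled = re.compile(r"\b(\w+) \1\b")

# Hypothetical sample text containing two accidental repeats.
text = "This is is a test of the the find string."

repeats = [m.group(1) for m in doubled.finditer(text)]
print(repeats)
```

The same idea should carry over to a Perl regex Find in UltraEdit, though it is worth testing the pattern on your own data first.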

Let’s say you have a list of URLs, like the following, which you wish to enclose in HTML link tags.

Here is the link for http://ultraedit.com/index.html
You may also be interested in http://www.google.com
http://en.wikipedia.org/wiki/Main_Page is also another informative site

You could manually add the HTML link code to each URL, but this is extremely time-consuming for a large list and leaves plenty of room for error. Using a Perl regular expression replace with backreferences, however, will make this an incredibly simple task!

The first step is to create the Perl regex which will identify a URL pattern. Of course, you can always test your Perl regex with a regular find. You may come up with something like the following:

https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?

…which will match all of the URLs in our sample data.
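
To check a pattern like this outside the editor, the find step can be sketched in Python. This assumes Python’s re module treats the character class the same way UltraEdit’s Perl engine does, which holds for the sample data here but is not guaranteed for every input.

```python
import re

# The URL pattern from the tip, unchanged.
url_pattern = re.compile(r"https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?")

# The three sample lines from the tip.
sample = (
    "Here is the link for http://ultraedit.com/index.html\n"
    "You may also be interested in http://www.google.com\n"
    "http://en.wikipedia.org/wiki/Main_Page is also another informative site"
)

# With no capture groups in the pattern, findall returns each full match.
urls = url_pattern.findall(sample)
for url in urls:
    print(url)
```

Each of the three URLs is found once, and the surrounding words are left out of the match because spaces and line breaks are not in the character class.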

The next step is to enclose the portion of the regular expression we wish to backreference in parentheses. In this case, we want to backreference the entire URL. That leaves us with the following:

(https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)

Now to use the backreference, we’ll need to determine how we want to modify the data. In this case, we want to enclose the URL with an HTML hyperlink to the URL, and then use the URL again as the hyperlinked text. For example, “http://ultraedit.com/index.html” should become:

<a href="http://ultraedit.com/index.html">http://ultraedit.com/index.html</a>

We can start building our replace string by simply replacing the URL with itself, because we want to preserve the URL as part of the replace:

Find What: (https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)
Replace With: \1
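
A quick way to convince yourself that replacing a match with \1 changes nothing is the sketch below. It uses Python’s re.sub as a stand-in for UltraEdit’s Replace All, and the single sample line is taken from the list above.

```python
import re

# The whole URL pattern wrapped in one capture group.
pattern = r"(https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)"

line = "You may also be interested in http://www.google.com"

# Replacing each match with its own group 1 puts the same text back.
unchanged = re.sub(pattern, r"\1", line)
print(unchanged == line)
```

Because group 1 covers the entire match, the text comes out identical, which makes this a safe starting point for building up the real replace string.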

We know we need to add HTML code around the URL to hyperlink it, so let’s add that to our replace string:

Find What: (https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)
Replace With: <a href="">\1</a>

…but this still isn’t complete. The href attribute is empty, so we still need to add the URL as the link target. To do this, we can use the backreference again, like so:

Find What: (https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)
Replace With: <a href="\1">\1</a>

…and that should do it. With a single click on the “Replace All” button, all URLs in our text have been converted to hyperlinks!
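
For readers who want to reproduce the whole operation in a script, here is a minimal sketch using Python’s re.sub in place of UltraEdit’s Replace All. Python also uses \1 in the replacement string, so the Find and Replace values carry over as written. Treat it as an approximation of the editor’s behavior rather than a guarantee.

```python
import re

# Find What: the URL pattern wrapped in a single capture group.
find_what = r"(https?://[\w\d/=+&~:#@!,;\.\$\?\[\]\(\)-]*\.?)"

# Replace With: the captured URL is used twice, once as the
# href value and once as the visible link text.
replace_with = r'<a href="\1">\1</a>'

sample = (
    "Here is the link for http://ultraedit.com/index.html\n"
    "You may also be interested in http://www.google.com\n"
    "http://en.wikipedia.org/wiki/Main_Page is also another informative site"
)

linked = re.sub(find_what, replace_with, sample)
print(linked)
```

Every URL in the sample is wrapped in a link tag, and the rest of each line is left as it was.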