IDM PowerTips

Perl regex tutorial: non-greedy expressions

Have you ever built a complex Perl-style regular expression, only to find that it matches much more data than you anticipated? If you've ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regex are what you need.

By default, Perl regular expressions are "greedy", meaning a quantifier such as the star will match as much data as possible. Because the dot does not match a new line, a greedy wildcard typically runs toward the end of the line: even if the conditions of the regular expression have already been met, the engine will keep extending the match as long as the remaining text on the line still satisfies the search criteria.

By using "non-greedy" Perl-style regular expressions, you can prevent this from occurring and stop the search as soon as the search criteria have been satisfied. Read on to find out how this feature of Perl-style regular expressions can save you time and frustration!

For more information on Perl-style regular expressions, visit our power tip on this subject.

Non-Greedy Perl Regular Expressions

By default, Perl-style regular expression syntax will match as much data as possible. For example, suppose you search for an HTML hyperlink using the following Perl-style regular expression:

<a href=".*</a>

On the following source code:

Click to visit <a href=""></a>. Click to visit <a href=""></a>.

Everything from the first "<a href..." to the last "</a>" on the same line will be matched by the regular expression. This is undesirable: the purpose of the regular expression is to match one hyperlink at a time, but it matches both hyperlinks and the normal text between them.
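The greedy behavior described above can be reproduced outside the editor. The following is a minimal sketch in Python, whose `re` module uses Perl-style quantifiers; the choice of Python is an assumption made for illustration, and the sample line is copied from this tip as-is (its href values and link text are empty).

```python
import re

# Sample line copied from the tip; the href values and link text are empty there.
line = 'Click to visit <a href=""></a>. Click to visit <a href=""></a>.'

# Greedy pattern from the tip: ".*" grabs as much of the line as it can,
# then backs off only far enough to find a closing "</a>".
greedy = re.compile(r'<a href=".*</a>')

greedy_text = greedy.search(line).group(0)
print(greedy_text)
# Prints: <a href=""></a>. Click to visit <a href=""></a>
```

A single match spans both hyperlinks and the text between them, which is the over-matching the tip warns about.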

This is where non-greedy regular expressions are useful. To make a Perl-style regular expression non-greedy, add a "?" (question mark) immediately after the quantifier, which is usually where the wildcard expression is used.

In our above example, our wildcard expression is the ".*" (dot and star). The dot will match any character except a null (hex 00) or new line. The star will match the preceding item zero or more times. So a dot followed by a star in Perl regex syntax literally means match any character zero or more times.
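As a rough illustration of what the dot and the star each contribute, here is a small sketch using Python's `re` module (an assumption; it is not the engine this tip describes). Python differs in some details, for instance its dot does match a null character, so the sketch only demonstrates the new-line and zero-or-more behavior.

```python
import re

text = "first line\nsecond line"

# The dot does not match a new line by default, so ".*" stops
# at the end of the first line.
first = re.match(r'.*', text).group(0)
print(repr(first))   # 'first line'

# The star allows zero repetitions, so ".*" also matches an empty string.
empty = re.match(r'.*', "").group(0)
print(repr(empty))   # ''

# With nothing between "a" and "b", ".*" matches zero characters
# and the overall pattern still succeeds.
zero = re.fullmatch(r'a.*b', 'ab') is not None
print(zero)          # True
```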

To add in the non-greedy operator, we simply need to add a "?" to the end of our wildcard operators. So, our new, non-greedy regular expression would look like this:

<a href=".*?</a>

Our non-greedy "?" operator now tells the regular expression engine to match as little data as possible. As soon as all conditions of the regular expression have been met, the search will end. So now using our above example, only the first hyperlink in the line below would be matched, from "<a href..." to the first "</a>" that follows it:

Click to visit <a href=""></a>. Click to visit <a href=""></a>.
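To see the non-greedy match in action, here is a minimal Python sketch. It assumes the non-greedy pattern is the greedy one with "?" added after the ".*" wildcard, that is `<a href=".*?</a>`, and it uses Python's `re` module as a stand-in for the editor's Perl-style engine.

```python
import re

# Sample line copied from the tip; the href values and link text are empty there.
line = 'Click to visit <a href=""></a>. Click to visit <a href=""></a>.'

# Non-greedy pattern: ".*?" expands one character at a time and stops
# at the first point where "</a>" can follow.
lazy = re.compile(r'<a href=".*?</a>')

first_link = lazy.search(line).group(0)
print(first_link)   # <a href=""></a>

# Repeating the search finds each hyperlink separately.
all_links = lazy.findall(line)
print(len(all_links))   # 2
```

Each hyperlink is now returned as its own match, so a "find next" style search steps through the links one at a time.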

As you can see from our above example, using non-greedy Perl-style regular expressions can prevent much heartache when performing search and replace operations on HTML, XML, PHP, and virtually any other file where matched data must be limited.
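The difference matters most in search and replace, where a greedy match silently removes more than intended. The sketch below uses Python's `re.sub` as an assumed stand-in for the editor's replace function; the `[link]` replacement text is a hypothetical placeholder chosen only for illustration.

```python
import re

# Sample line copied from the tip; the href values and link text are empty there.
line = 'Click to visit <a href=""></a>. Click to visit <a href=""></a>.'

# Greedy replace: one match swallows both links and the text between them.
greedy_result = re.sub(r'<a href=".*</a>', '[link]', line)
print(greedy_result)   # Click to visit [link].

# Non-greedy replace: each link is matched and replaced on its own.
lazy_result = re.sub(r'<a href=".*?</a>', '[link]', line)
print(lazy_result)     # Click to visit [link]. Click to visit [link].
```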