Perl regex tutorial: non-greedy expressions

Have you ever built a complex Perl-style regular expression, only to find that it matches much more data than you anticipated? If you’ve ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regular expressions are what you need.

By default, quantifiers in Perl regular expressions are greedy, meaning they will match as much data as possible. Because the . wildcard does not match a line break, in practice this usually means as much data as possible on the current line. Even if the conditions of the regular expression have already been met, the engine will keep extending the match for as long as the rest of the expression can still be satisfied.

By using non-greedy Perl-style regular expressions, you can prevent this from occurring and stop the search as soon as the search criteria have been satisfied. Read on to find out how this feature of Perl-style regular expressions can save you time and frustration!

For more information on Perl-style regular expressions, visit our power tip on this subject.

Non-greedy Perl regular expressions

Typically, when using Perl regular expressions to match strings of data, normal Perl regular expression syntax will match as much data as possible. For example, if you want to search for an HTML hyperlink using the following Perl regular expression:

<a href=".*</a>

On the following text:

<ul class="dropdown dd">
<li><a href="support/tutorials-power-tips/" title="Power tips">Power Tips &amp; Tutorials</a></li><li><a href="http://wiki.ultraedit.com/Main_Page" title="UltraEdit text editor wiki">Wiki documentation</a></li><li><a href="http://forums.ultraedit.com/" title="User forums">User forums</a></li><li><a href="support/faq/" title="IDM software FAQ">FAQ</a></li><li><a href="resources.html" title="Resources for IDM software">Resources</a></li><li><a href="support/" title="Technical support">Tech support</a></li>
</ul>

…then everything from the first <a href... to the last </a> on the same line is matched by the regular expression. This is undesirable because the purpose of the regular expression is to match one hyperlink at a time, whereas this regular expression matches all six hyperlinks on the line, together with the normal text between them.
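The greedy behavior can be reproduced outside the editor. The following is a minimal sketch using Python's re module as a stand-in for the Perl-style engine; the two-link line is a shortened version of the sample text, written with straight quotes, and the variable names are illustrative only.

```python
import re

# Shortened version of the sample line (two hyperlinks instead of six).
line = (
    '<li><a href="support/faq/" title="IDM software FAQ">FAQ</a></li>'
    '<li><a href="resources.html" title="Resources for IDM software">Resources</a></li>'
)

greedy = re.compile(r'<a href=".*</a>')
match = greedy.search(line)

# The greedy .* first consumes the rest of the line, then backs up
# only as far as needed to find a closing </a>, which is the last one.
greedy_text = match.group(0)
greedy_link_count = greedy_text.count("</a>")

print(greedy_text)
print(greedy_link_count)
```

Here a single match swallows both hyperlinks and the markup between them, which is the same effect described above on the full six-link line.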

This is where non-greedy regular expressions are useful. To make a Perl-style regular expression non-greedy, add a ? (question mark) immediately after the quantifier, usually where the wildcard expression is used.

In our above example, our wildcard expression is the .* (period and asterisk). The period will match any character except a null (hex 00) or new line. The asterisk will match the preceding element zero or more times. So a dot followed by a star in Perl regex syntax literally means match any character zero or more times.
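The two parts of the wildcard can be checked separately. This is a small sketch using Python's re module as a stand-in; its handling of null characters may differ from the editor's engine, so only the line-break and zero-or-more behavior is shown. The sample strings are made up for illustration.

```python
import re

text = "first line\nsecond line"

# . does not match a line break by default, so .* stops at the end of line one.
dot_star = re.match(r".*", text).group(0)

# * allows zero repetitions, so "ab" matches a.*b with nothing in between.
zero_times = re.fullmatch(r"a.*b", "ab") is not None

print(repr(dot_star))
print(zero_times)
```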

To add in the non-greedy operator, we simply need to add a ? to the end of our wildcard operators. So, our new, non-greedy regular expression would look like this:

<a href=".*?</a>

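Our non-greedy ? operator tells the Perl regular expression engine to match as little data as possible. As soon as all conditions of the regular expression have been met, the search will end. So now, when the search is run on the text below, only the first hyperlink (from <a href="support/tutorials-power-tips/" to the first </a> after it) would be matched, and each further search matches the next hyperlink on its own: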

<ul class="dropdown dd">
<li><a href="support/tutorials-power-tips/" title="Power tips">Power Tips &amp; Tutorials</a></li><li><a href="http://wiki.ultraedit.com/Main_Page" title="UltraEdit text editor wiki">Wiki documentation</a></li><li><a href="http://forums.ultraedit.com/" title="User forums">User forums</a></li><li><a href="support/faq/" title="IDM software FAQ">FAQ</a></li><li><a href="resources.html" title="Resources for IDM software">Resources</a></li><li><a href="support/" title="Technical support">Tech support</a></li>
</ul>
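The non-greedy result can be verified the same way. This is a minimal sketch, again assuming Python's re module as a stand-in for the Perl-style engine and a shortened two-link line with straight quotes; the variable names are illustrative only.

```python
import re

# Shortened version of the sample line (two hyperlinks instead of six).
line = (
    '<li><a href="support/faq/" title="IDM software FAQ">FAQ</a></li>'
    '<li><a href="resources.html" title="Resources for IDM software">Resources</a></li>'
)

lazy = re.compile(r'<a href=".*?</a>')

# The lazy .*? expands one character at a time and stops at the first </a>.
first_link = lazy.search(line).group(0)

# Repeating the search from where the last match ended yields each link separately.
all_links = lazy.findall(line)

print(first_link)
print(len(all_links))
```

Each element of all_links is one complete hyperlink, so a replace operation built on this pattern would act on one link at a time rather than on the whole line.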

As you can see from our above example, using non-greedy Perl-style regular expressions can prevent much heartache when doing search and replace functions on HTML, XML, PHP, and virtually any other file where matched data must be limited.