Regular Expression: Greedy

Summary: in this tutorial, you’ll learn about greedy quantifiers and how they work under the hood.

By default, all quantifiers are greedy. This means that a quantifier matches its preceding element as many times as possible while still allowing the rest of the pattern to match.
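As a quick sketch of what "as many times as possible" means, the snippet below applies four common quantifiers to the same string. The string `aaaa` and the variable names are chosen only for illustration:

```javascript
const text = 'aaaa';

// Each quantifier takes as many a's as its own limits allow.
const star = text.match(/a*/)[0];      // zero or more
const plus = text.match(/a+/)[0];      // one or more
const optional = text.match(/a?/)[0];  // zero or one
const range = text.match(/a{2,3}/)[0]; // between two and three

console.log(star);     // 'aaaa'
console.log(plus);     // 'aaaa'
console.log(optional); // 'a'
console.log(range);    // 'aaa'
```

Note that `a?` and `a{2,3}` stop at their upper bounds, not because they are less greedy.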

The following example illustrates how greedy quantifiers work.

The greedy quantifier example

Suppose you have an HTML string that represents a button element:

const button = '<button type="submit" class="btn">Send</button>';

And you want to match the double-quoted values "submit" and "btn".

To do that, you combine the double quote ("), the dot (.) character class, and the plus (+) quantifier into the following pattern:

/".+"/g

This pattern means that:

  • " matches the opening double quote
  • . matches any character except a newline
  • + repeats the preceding element (the dot) one or more times
  • " matches the closing double quote
  • the g flag returns all matches, not only the first one
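To see these rules in isolation, here is a small sketch on made-up inputs (the strings and variable names are assumptions for illustration only). It checks that the dot stops at a newline, that + requires at least one character, and that the g flag collects every match:

```javascript
// Two quoted values on separate lines.
const lines = '"x"\n"y"';

// The dot cannot cross the newline, so each line yields its own match,
// and the g flag returns both of them.
const withG = lines.match(/".+"/g);
console.log(withG); // ['"x"', '"y"']

// Without the g flag, match() stops at the first match.
const withoutG = lines.match(/".+"/);
console.log(withoutG[0]); // '"x"'

// + needs at least one character between the quotes, so "" does not match.
const empty = '""'.match(/".+"/g);
console.log(empty); // null
```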

The following example uses the match() method to match the string s against the pattern:

const s = '<button type="submit" class="btn">Send</button>';
const pattern = /".+"/g;

const result = s.match(pattern);
console.log(result);

Output:

['"submit" class="btn"']

It returns '"submit" class="btn"' instead of "submit" and "btn".

The reason is that in greedy mode, the quantifier (+) matches its preceding element, the dot (.), as many times as possible.

How greedy quantifiers work

First, the regex engine starts matching from the first character in the string s.

Next, because the first character < is not a quote ("), the regex engine moves forward through the string until it finds the first quote (").

Then, the regex engine matches the string with the next rule .+ in the regular expression.

Since the .+ rule matches one or more characters and the string contains no newline, the regex engine keeps consuming characters until it reaches the end of the string.

After that, the regex engine checks the last rule in the regular expression, which is a quote ("). However, there are no characters left to match because the engine has already reached the end of the string. In other words, the greedy quantifier has gone too far.

Finally, the regex engine steps back from the end of the string, one character at a time, until the next character is a quote ("). This step is often referred to as backtracking.
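The steps above can be sketched as a small hand-written simulation. This is not how a real regex engine is implemented: the helper simulateGreedyQuoteMatch is hypothetical, handles only the pattern /".+"/, and treats only \n as a line terminator. It is meant to make the consume-then-backtrack order visible:

```javascript
// Simplified simulation of how /".+"/ finds its first match.
// Hypothetical helper for illustration; real engines handle arbitrary patterns.
function simulateGreedyQuoteMatch(s) {
  // Steps 1-2: move forward until the opening quote is found.
  const start = s.indexOf('"');
  if (start === -1) return null;

  // Step 3: .+ greedily consumes every character up to the end of the line.
  // After this, .+ covers the characters from start + 1 up to end - 1.
  const newline = s.indexOf('\n', start + 1);
  let end = newline === -1 ? s.length : newline;

  // Steps 4-5: the closing quote must sit at s[end]. If it does not,
  // backtrack by handing back one character from .+ and check again.
  // .+ needs at least one character, so end never drops below start + 2.
  let backtracks = 0;
  while (end >= start + 2) {
    if (s[end] === '"') {
      return { match: s.slice(start, end + 1), backtracks };
    }
    end--;
    backtracks++;
  }

  // A real engine would now retry from the next starting position.
  return null;
}

const s = '<button type="submit" class="btn">Send</button>';
const simulated = simulateGreedyQuoteMatch(s);

console.log(simulated.match);                        // "submit" class="btn"
console.log(simulated.match === s.match(/".+"/)[0]); // true
```

The backtracks counter shows how many characters .+ had to give back before the closing quote fit.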

As a result, the match is the following substring, which is not what you might expect:

"submit" class="btn"

To resolve this issue, you need to instruct the quantifier (+) to use the non-greedy (or lazy) mode instead of the greedy mode. To do that, you add a question mark (?) after the quantifier like this:

/".+?"/g

The following script returns the expected result:

const s = '<button type="submit" class="btn">Send</button>';
const pattern = /".+?"/g;

const result = s.match(pattern);
console.log(result);

Output:

['"submit"', '"btn"']
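Each match still includes the surrounding quotes. If you want only the text between them, one possible approach (a sketch, not part of the original example) is to wrap the lazy part in a capturing group and read it through matchAll():

```javascript
const s = '<button type="submit" class="btn">Send</button>';

// The parentheses capture only the text between the quotes.
// matchAll() requires the g flag and yields one match object per hit.
const values = Array.from(s.matchAll(/"(.+?)"/g), (m) => m[1]);

console.log(values); // ['submit', 'btn']
```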

Summary

  • Quantifiers use the greedy mode by default.
  • Greedy quantifiers match their preceding elements as much as possible.