Summary: in this tutorial, you will learn about the sets and ranges in regular expressions.
Sets
The square brackets search for any character in a set. For example, [aeiou]
matches any of the five characters: 'a'
, 'e'
, 'i'
, 'o'
and 'u'
. The [...]
is called a set.
For example, the regular expression /[cbr]ats/g
matches cats
, bats
, and rats
:
let str = 'How cats, rats, and bats became Halloween animals';
let re = /[cbr]ats/g;
let results = str.match(re);
console.log(results);
Code language: JavaScript (javascript)
Output:
["cats", "rats", "bats"]
Code language: JavaScript (javascript)
Ranges
The square brackets can contain character ranges. For example, [a-z]
is a character range from a
to z
. And [0-9]
is a digit from 0 to 9.
The [a-zA-Z0-9_]
is the same as \w
. The [0-9]
is the same as \d
.
Excluding ranges
To negate a range, you use the excluding range like: [^...]
.
For example, [^0-9]
matches any character except a digit. It is the same as \D
.
Or, the [^aeiou]
matches any character except 'a'
, 'e'
, 'i'
, 'o'
and 'u'
.
Escape special characters
Typically, you use a backslash to escape a special character e.g., \.
. However, in square brackets, you don’t need to escape most of the special characters except they have a meaning for the square brackets.
For example, if the caret (^
) is at the beginning of a string, you need to escape it:
[\^#$]
Code language: JavaScript (javascript)
If the caret is not at the beginning of a string (^
), you do not need to escape:
[#^$]
Code language: JavaScript (javascript)
The flag u
If a set has surrogate pair, you need to add the flag u
to the regular expression to make it work correctly:
let result = 'It is ๐'.match(/[๐๐
๐]/);
console.log(result);
Code language: JavaScript (javascript)
Output:
["๏ฟฝ"]
Code language: JavaScript (javascript)
In this example, the [๐๐ ๐] has six characters, not three:
let str = '๐๐
๐';
for(let i=0; i<str.length; i++) {
console.log(str.charCodeAt(i));
}
Code language: JavaScript (javascript)
Output:
55356
57166
55356
57157
55356
57171
Code language: JavaScript (javascript)
If you add the flag u
, then the behavior will be correct:
let result = 'It is ๐'.match(/[๐๐
๐]/u);
console.log(result);
Code language: JavaScript (javascript)
Output:
["๐"]
Code language: JavaScript (javascript)
Summary
- Use
[...]
to construct a set to match any character in it. - Use the
-
inside a set to construct a range to match any character in the range. - Use the
^
to negate a range.