JavaScript Regex Guide

JavaScript's regex engine has caught up with the rest of the world over the last decade — lookbehind, named groups, Unicode property escapes, sticky matching, and matchAll all landed in modern engines. This guide is a practical reference: which method to call, which flag to set, which gotcha to watch for, and where the engine differs from PCRE, Python, or Java. Pair it with the regex tester and the explainer to learn by experimenting.

Two Ways to Create a Regex

JavaScript has two regex syntaxes: a literal with slashes, and the RegExp constructor:

// Literal — pattern is fixed at parse time
const literal = /\d{3}-\d{4}/g;

// Constructor — pattern can be built from variables
const dynamic = new RegExp(`\\d{${minDigits}}`, 'g');

Use the literal form whenever the pattern is known at write time. It is more readable and the parser checks the pattern at parse time rather than at first use. Use new RegExp() only when you need to build the pattern from runtime input — but be careful to escape user input first, or you create a regex injection vulnerability.

The literal needs single backslashes (/\d/) while the constructor needs doubled ones ("\\d") because the string parser eats one. This trips up everyone at least once.
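
As a rough sketch of the escaping advice above, the snippet below builds a pattern from runtime input. The names escapeRegex and buildMatcher are hypothetical helpers introduced here for illustration; they are not built-ins.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: literal user text followed by at least `minDigits` digits.
// Note the doubled backslash in `\\d`: the template string consumes one of them.
function buildMatcher(userText, minDigits) {
  return new RegExp(`${escapeRegex(userText)}\\d{${minDigits},}`, 'g');
}

const re = buildMatcher('v1.', 2);
console.log(re.source);                 // v1\.\d{2,}
console.log('v1.42 v1x42'.match(re));   // [ 'v1.42' ]
```

Because the dot in the user text is escaped, 'v1x42' is not matched; without the helper the dot would act as a wildcard.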

Flags

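This guide covers the seven flags you will use most. (ES2024 adds an eighth, v, for Unicode set notation; it is an alternative to u and cannot be combined with it.) The seven can be combined freely:

const re = /pattern/dgimsuy;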
  • g (global) — find all matches, not just the first. Required for matchAll, replaceAll, and most extraction tasks.
  • i (case-insensitive) — match letters regardless of case.
  • m (multiline) — make ^ and $ match line boundaries instead of string boundaries.
  • s (dotAll) — make . match newline characters. Without it, . does not match line terminators (\n, \r, \u2028, \u2029).
  • u (Unicode) — enable proper Unicode handling. Treats surrogate pairs as one character and unlocks \p{...} property escapes.
  • y (sticky) — match only at the position indicated by lastIndex. Useful for tokenizers; rare in everyday code.
  • d (hasIndices) — added in ES2022. Each match includes indices with the start/end positions of every capture group.

The g flag is the source of most regex bugs in JavaScript. See the section on lastIndex below.
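
The less familiar flags are easier to grasp with a small experiment. The following is a minimal sketch using made-up sample strings; the d flag needs an engine with ES2022 support.

```javascript
const text = 'first line\nsecond line';

// m: ^ matches at the start of every line, not only at the start of the string.
const lineStarts = text.match(/^\w+/gm);            // ['first', 'second']

// s: . also matches the newline, so the pattern can span both lines.
const spansLines = /first.*second/s.test(text);     // true
const withoutS = /first.*second/.test(text);        // false

// d: the match carries an `indices` array of [start, end] pairs.
const m = /(\d+)-(\d+)/d.exec('range 10-25');
const wholeSpan = m.indices[0];                     // [6, 11]
const secondGroupSpan = m.indices[2];               // [9, 11]

// y: the match must begin exactly at lastIndex; the engine does not scan ahead.
const sticky = /\d+/y;
sticky.lastIndex = 6;
const atDigit = sticky.test('range 10-25');         // true: index 6 is '1'
sticky.lastIndex = 5;
const atSpace = sticky.test('range 10-25');         // false: index 5 is a space
```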

String Methods vs RegExp Methods

JavaScript exposes regex through both the String prototype and RegExp objects:

String methods (regex is the argument)

str.match(re)        // with g: array of all matched strings; without g: the first match with its groups; null if nothing matches
str.matchAll(re)     // returns iterator of detailed match objects (g flag required)
str.replace(re, fn)  // replace first match (or all if g flag set)
str.replaceAll(re, fn) // replace all (g flag required if regex)
str.split(re)        // split by matches
str.search(re)       // index of first match, or -1

RegExp methods (regex is the receiver)

re.test(str)         // returns true/false — the cheapest check
re.exec(str)         // returns one detailed match; advances lastIndex if g flag set

Picking between them:

  • Just checking a match exists? re.test(str) — fastest and clearest.
  • Want all matches with capture groups? str.matchAll(re) — returns an iterator of full match objects.
  • Replacing or transforming? str.replace or str.replaceAll with a callback.
  • Tokenizing with positional control? re.exec in a loop with the y flag.
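
The tokenizer case is the least obvious of the four, so here is one possible sketch of exec with the y flag. The token table and the tokenize function are hypothetical, written for a tiny arithmetic language invented for this example.

```javascript
// Hypothetical token table; every pattern is sticky so it can only match at lastIndex.
const tokenSpec = [
  ['number', /\d+(?:\.\d+)?/y],
  ['ident', /[A-Za-z_]\w*/y],
  ['op', /[-+*\/()=]/y],
  ['space', /\s+/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of tokenSpec) {
      re.lastIndex = pos;              // tell the sticky regex where to look
      const m = re.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, value: m[0], index: pos });
        pos = re.lastIndex;            // exec advanced lastIndex past the match
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at ${pos}: ${input[pos]}`);
    }
  }
  return tokens;
}

const tokens = tokenize('x = 3.5 * (y + 2)');
console.log(tokens.map(t => t.value).join(' '));   // x = 3.5 * ( y + 2 )
```

Setting lastIndex explicitly before each exec call keeps the position in one variable (pos), which sidesteps the shared-state surprises that come with reusing a regex.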

The lastIndex Trap

Regex objects with the g or y flag carry a mutable lastIndex property. Every call to exec or test updates it to the position after the match, and the next call resumes from there. If you reuse the same regex, this produces results that look random:

const re = /\d+/g;
re.test('abc123');  // true  — matched, lastIndex = 6
re.test('abc123');  // false — resumed from index 6, no more matches
re.test('abc123');  // true  — the failed call reset lastIndex to 0, so it matches again

Three solutions, in order of preference:

  1. Use matchAll or match instead of exec in a loop. They handle iteration internally without exposing lastIndex.
  2. Construct the regex inline if you only use it once: str.match(/\d+/g).
  3. Reset explicitly: re.lastIndex = 0 before reuse — but this is easy to forget.

This bug is so common that many style guides forbid the g flag with test entirely. If you only need a boolean, use test without g.
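
The three fixes can be sketched as follows. The sample strings and the countNumbers helper are made up for illustration.

```javascript
// 1. matchAll iterates internally; lastIndex never leaks into your code.
const input = 'id=7, id=42, id=365';
const ids = Array.from(input.matchAll(/id=(\d+)/g), m => Number(m[1]));
console.log(ids);                       // [ 7, 42, 365 ]

// 2. A regex without g (or y) does not advance lastIndex, so test is repeatable.
const hasDigits = /\d+/;
const results = [1, 2, 3].map(() => hasDigits.test('abc123'));
console.log(results);                   // [ true, true, true ]

// 3. If a shared g regex is unavoidable, reset it before each use.
const shared = /\d+/g;
function countNumbers(str) {
  shared.lastIndex = 0;                 // defensive reset
  let count = 0;
  while (shared.exec(str) !== null) count++;
  return count;
}
console.log(countNumbers('1 22 333'));  // 3
```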

Named Capture Groups

JavaScript has supported named capture groups since ES2018:

const m = '2026-04-30'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
m.groups.year;   // '2026'
m.groups.month;  // '04'
m.groups.day;    // '30'

// In replace callbacks
'2026-04-30'.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  (_, ...args) => {
    const groups = args[args.length - 1];
    return `${groups.day}/${groups.month}/${groups.year}`;
  }
);
// '30/04/2026'

// In replacement strings
'2026-04-30'.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  '$<day>/$<month>/$<year>'
);
// '30/04/2026'

Backreferences to named groups use \k<name> in the pattern. The named-group syntax is identical to .NET's, but differs from Python's re module, which uses (?P<name>...) for groups and (?P=name) for backreferences. When porting a regex from Python, do a find-and-replace: (?P< becomes (?<, and (?P=name) becomes \k<name>.
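
A short sketch of both ideas: a \k<name> backreference, and a naive converter for Python-style named groups. pythonToJsNamedGroups is a hypothetical helper; it is a plain text substitution and does not try to handle escaped parentheses or other Python-only syntax.

```javascript
// \k<quote> must match whatever the `quote` group captured, so quotes pair up.
const quoted = /(?<quote>['"])(?<body>.*?)\k<quote>/g;
const text = `say "hello" or 'bye'`;
const bodies = Array.from(text.matchAll(quoted), m => m.groups.body);
console.log(bodies);                    // [ 'hello', 'bye' ]

// Hypothetical helper: rewrite Python named-group syntax into JavaScript syntax.
function pythonToJsNamedGroups(pattern) {
  return pattern
    .replace(/\(\?P<([A-Za-z_]\w*)>/g, '(?<$1>')      // (?P<name>  becomes (?<name>
    .replace(/\(\?P=([A-Za-z_]\w*)\)/g, '\\k<$1>');   // (?P=name)  becomes \k<name>
}

const converted = pythonToJsNamedGroups('\\b(?P<word>\\w+) (?P=word)\\b');
console.log(converted);                 // \b(?<word>\w+) \k<word>\b
const repeatedWord = new RegExp(converted);
console.log(repeatedWord.test('hey hey'));    // true
console.log(repeatedWord.test('hey there'));  // false
```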

Lookahead and Lookbehind

Lookarounds match a position based on what comes before or after, without consuming any characters:

// Positive lookahead — followed by " dollars"
'100 dollars'.match(/\d+(?= dollars)/);  // ['100']

// Negative lookahead — NOT followed by " euros"
'100 dollars'.match(/\d+(?! euros)/);    // ['100']

// Positive lookbehind — preceded by "$"
'$50'.match(/(?<=\$)\d+/);                // ['50']

// Negative lookbehind — NOT preceded by "$"
'50 items'.match(/(?<!\$)\d+/);            // ['50']

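JavaScript's lookbehind landed in ES2018 (Chrome 62, Firefox 78, Safari 16.4, Node 10). Earlier browsers throw a SyntaxError when they parse the pattern. Unlike engines that restrict lookbehind to a fixed width (Python's re module) or a bounded length (Java, traditionally), JavaScript supports unbounded variable-length lookbehind, so patterns like (?<=\$\d*) work fine.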
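
Two everyday uses of lookarounds, sketched with invented sample strings: inserting thousands separators with a lookahead, and extracting prices with a lookbehind, alongside a capture-group version for engines that lack lookbehind.

```javascript
// Lookahead: insert a comma at every position followed by groups of exactly three digits.
const withSeparators = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
console.log(withSeparators);            // 1,234,567

// Lookbehind: digits preceded by "$"; the "$" itself is not part of the match.
const receipt = 'cost $50, qty 3, tax $4';
const prices = Array.from(receipt.matchAll(/(?<=\$)\d+/g), m => m[0]);
console.log(prices);                    // [ '50', '4' ]

// Same result without lookbehind: consume the "$" and read the capture group instead.
const pricesCompat = Array.from(receipt.matchAll(/\$(\d+)/g), m => m[1]);
console.log(pricesCompat);              // [ '50', '4' ]
```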

Full coverage in the lookahead and lookbehind guide.

Unicode and the u Flag

Without u, JavaScript treats regex as a sequence of UTF-16 code units. Characters above U+FFFF (emoji, rare scripts) are surrogate pairs that the engine sees as two characters, breaking the otherwise-intuitive idea that . matches one character. The u flag fixes this:

// Without u — broken for non-BMP characters
/^.$/.test('💩');      // false — emoji is two code units

// With u — correct
/^.$/u.test('💩');     // true

The u flag also enables Unicode property escapes: \p{...} and its negation \P{...} match by Unicode property, such as a general category, a script, or a binary property like Alphabetic:

/\p{Letter}/u           // any letter from any script
/\p{Script=Greek}/u     // Greek letters
/\p{Number}/u           // numeric characters in any script (digits, fractions, Roman numerals); \p{Nd} is decimal digits only
/\p{Emoji}/u            // emoji, but also 0-9, # and *; \p{Extended_Pictographic} excludes those

Always use u when matching anything potentially non-ASCII. The performance cost is negligible.
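
A few checks that illustrate the points above. The sample strings are arbitrary, and the escape sequences are used only so the example does not depend on how the file is encoded.

```javascript
// One emoji is two UTF-16 code units but a single code point.
const pileOfPoo = '\u{1F4A9}';
const codeUnits = pileOfPoo.length;           // 2
const codePoints = [...pileOfPoo].length;     // 1

// Letters (plus combining marks) from any script.
const lettersOnly = /^[\p{Letter}\p{Mark}]+$/u;
const samples = ['naïve', 'Ελληνικά', 'abc123'].map(s => lettersOnly.test(s));
console.log(samples);                         // [ true, true, false ]

// \p{Nd} is decimal digits only; \p{Number} also covers Roman numerals and fractions.
const arabicIndic = /^\p{Nd}+$/u.test('\u0664\u0662');        // true (Arabic-Indic 4 and 2)
const romanIsNumber = /\p{Number}/u.test('\u2167');           // true (Roman numeral eight)
const romanIsDigit = /\p{Nd}/u.test('\u2167');                // false

// \p{Emoji} includes ASCII digits; \p{Extended_Pictographic} does not.
const digitIsEmoji = /\p{Emoji}/u.test('7');                              // true
const digitIsPictographic = /\p{Extended_Pictographic}/u.test('7');       // false
const pooIsPictographic = /\p{Extended_Pictographic}/u.test(pileOfPoo);   // true
```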

Replacement Callback Patterns

The replace callback gets the full match, capture groups, offset, original string, and (for named groups) a groups object:

'$15.50'.replace(/\$(\d+)\.(\d{2})/, (match, dollars, cents, offset, str, groups) => {
  return `${dollars} dollars and ${cents} cents`;
});
// '15 dollars and 50 cents'

Common patterns:

// HTML escape
str.replace(/[&<>"']/g, c => ({
  '&':'&amp;', '<':'&lt;', '>':'&gt;', '"':'&quot;', "'":'&#39;'
}[c]));

// camelCase → kebab-case
str.replace(/([A-Z])/g, '-$1').toLowerCase();

// Strip leading/trailing whitespace (or use String.prototype.trim)
str.replace(/^\s+|\s+$/g, '');

// Highlight matches in HTML (escape user input first!)
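// escapeRegex is a helper you supply; the global escape() is a legacy URL-encoding function, not a regex escaper
text.replace(new RegExp(escapeRegex(query), 'gi'), m => `<mark>${m}</mark>`);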
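
For the highlighting case, a more defensive sketch escapes both the query (for the regex) and the text (for HTML). The functions escapeRegex, escapeHtml, and highlight are hypothetical helpers written for this example.

```javascript
// Hypothetical helper: make user text safe to embed in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: make text safe to embed in HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, c => entities[c]);
}

// Wrap every case-insensitive occurrence of `query` in <mark>, escaping everything else.
function highlight(text, query) {
  if (query === '') return escapeHtml(text);   // an empty pattern would match everywhere
  const re = new RegExp(escapeRegex(query), 'gi');
  let out = '';
  let last = 0;
  for (const m of text.matchAll(re)) {
    out += escapeHtml(text.slice(last, m.index));
    out += `<mark>${escapeHtml(m[0])}</mark>`;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlight('Tom & Jerry <3 tom', 'tom'));
// <mark>Tom</mark> &amp; Jerry &lt;3 <mark>tom</mark>
```

Walking the matches with matchAll, rather than calling replace on already-escaped text, avoids matching inside entities such as &amp;.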

Common Gotchas

  • Forgetting to escape user input in new RegExp() — a malicious input like .* turns your filter into a wildcard. Use a small escapeRegex helper or a library like lodash.escapeRegExp.
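  • Catastrophic backtracking — patterns like (a+)+b can hang the JS engine on long inputs that almost match. Avoid nested quantifiers. JavaScript has no atomic groups or possessive quantifiers, and switching to a lazy quantifier such as *? does not help, so rewrite the pattern so that each character can be matched in only one way (a+b instead of (a+)+b) and cap the length of untrusted input.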
  • Forgetting the g flag with replace — without it, only the first match is replaced. replaceAll requires g if the first argument is a regex (otherwise it throws).
  • Confusing $& and $1 in replacement strings — $& is the whole match; $1 is the first capture group; $$ is a literal $.
  • Greedy quantifiers eating too much — for "match between two delimiters," use .*? (lazy) or a negated character class like [^<]+.
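
Several of these gotchas can be reproduced in a few lines. The snippet below is a small sketch with invented inputs; the risky pattern is only run on short strings, so it finishes instantly.

```javascript
// Greedy vs lazy vs negated character class.
const html = '<b>one</b> and <b>two</b>';
const greedy = html.match(/<b>.*<\/b>/)[0];      // the whole string
const lazy = html.match(/<b>.*?<\/b>/)[0];       // '<b>one</b>'
const negated = html.match(/<b>[^<]+<\/b>/)[0];  // '<b>one</b>'

// Replacement strings: $$ is a literal "$", $& is the whole match.
const priced = 'cost 5'.replace(/\d+/, '$$$&');  // 'cost $5'

// Without g, replace touches only the first match.
const firstOnly = 'a-b-c'.replace(/-/, '+');     // 'a+b-c'
const all = 'a-b-c'.replace(/-/g, '+');          // 'a+b+c'

// replaceAll rejects a regex that lacks the g flag.
let threwTypeError = false;
try {
  'a-b-c'.replaceAll(/-/, '+');
} catch (err) {
  threwTypeError = err instanceof TypeError;
}

// Nested quantifier and its linear-time rewrite accept the same strings.
const risky = /^(a+)+b$/;
const safe = /^a+b$/;
const agree = ['ab', 'aaab', 'aaa', 'b'].every(s => risky.test(s) === safe.test(s));
```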

Try Patterns Live

Paste any pattern from this guide into the regex tester with sample text to see matches in real time. For unfamiliar tokens, the regex explainer gives a plain-English breakdown of every piece. Start from the pattern library for ready-made email, URL, IP, and date patterns.

Frequently Asked Questions

Should I use String methods or RegExp methods in JavaScript?

Use String methods (match, matchAll, replace, replaceAll, split, search) when the regex is the input and the string is what you have on hand — they read naturally as text operations. Use RegExp methods (test, exec) when you have a long-lived regex object you reuse, or when you need exec's positional state via lastIndex for tokenizer-style iteration. test() is also the cheapest way to check if a pattern matches at all, since it returns a boolean and can stop on the first match.

Why does my regex with the g flag behave inconsistently?

RegExp objects with the g flag carry a lastIndex property that exec and test mutate after each call. If you reuse the same regex across calls, the next call resumes from where the last one left off, which produces surprising results. Solutions: use String.prototype.matchAll instead of repeated exec calls, reset lastIndex to 0 between batches, or construct a fresh regex per batch. matchAll returns an iterator and avoids the stateful lastIndex pitfall entirely.

Does JavaScript support lookbehind?

Yes — lookbehind landed in ES2018 and is supported in all modern browsers (Chrome 62+, Firefox 78+, Safari 16.4+) and Node.js 10+. Use (?<=pattern) for positive lookbehind and (?<!pattern) for negative lookbehind. Unlike some engines, JavaScript supports variable-length lookbehind, so patterns like (?<=\$\d*) work. If you must support very old browsers, fall back to a capturing group and slice the result.

How do I match Unicode characters in JavaScript regex?

Add the u flag for proper Unicode handling: surrogate pairs are treated as one character and Unicode property escapes \p{...} become available. Shorthand classes such as \w and \d remain ASCII-based even with u, so use property escapes for letters and digits in other scripts. With the u flag, /\p{Letter}/u matches any letter from any script, /\p{Script=Greek}/u matches Greek letters, and /\p{Emoji}/u matches emoji (along with the digits, # and *, which also carry the Emoji property). Without the u flag, JavaScript matches UTF-16 code units one at a time, which mishandles characters above U+FFFF.

What are named capture groups in JavaScript regex?

Named groups use the syntax (?<name>pattern), and matches are accessed via match.groups.name or in a replace callback's groups argument. They make complex patterns far more readable than relying on positional groups. JavaScript's syntax matches .NET's, but differs from Python's re module, which uses (?P<name>pattern). When migrating a regex from Python, replace (?P< with (?< and rewrite each (?P=name) backreference as \k<name>.