PHP Regex Guide

PHP's regex functions are the preg_ family, backed by PCRE, the Perl-compatible engine whose syntax many other regex flavors closely resemble. The PHP-specific things to learn are the delimiter syntax (patterns are wrapped in /.../), the modifier letters that follow the closing delimiter, and the return-value quirks of preg_match. This guide is a practical reference. Pair it with the regex tester and the explainer to confirm any pattern.

Delimiters: The PHP-Specific Wrapper

Unlike most languages, a PHP regex pattern is a string that includes delimiter characters wrapping the actual pattern, with modifier letters after the closing delimiter:

//          ┌─ opening delimiter
//          │   ┌─ closing delimiter
//          │   │┌─ modifiers
preg_match("/\d+/i", $text);
//           └─┘ the actual pattern is \d+

The slash is conventional, but any non-alphanumeric, non-whitespace, non-backslash character works as a delimiter. Pick a different one when your pattern contains slashes, to avoid escaping them:

// Matching a URL path with slash delimiter — ugly, must escape every /
preg_match("/https?:\/\/example\.com\//", $url);

// Same pattern with # delimiter — clean
preg_match("#https?://example\.com/#", $url);

// Any other punctuation character works the same way
preg_match("~\d{4}-\d{2}-\d{2}~", $date);

// Bracket-style delimiters pair up: open with {, close with }
preg_match("{\d+}", $text);

preg_match — First Match

preg_match writes results into a $matches array passed by reference, and returns 1, 0, or false:

$text = "Contact jane@example.com today";

if (preg_match('/(\w+)@(\w+\.\w+)/', $text, $matches) === 1) {
    echo $matches[0];   // jane@example.com  (full match)
    echo $matches[1];   // jane             (first group)
    echo $matches[2];   // example.com      (second group)
}

Note the === 1 strict comparison — see the gotchas section for why if (preg_match(...)) alone is risky.

preg_match_all — Every Match

$text = "jane@example.com and bob@test.org";
preg_match_all('/(\w+)@(\w+\.\w+)/', $text, $matches);

// Default PREG_PATTERN_ORDER: $matches[0] = all full matches,
// $matches[1] = all first-groups, etc.
print_r($matches[0]);
// ["jane@example.com", "bob@test.org"]
print_r($matches[1]);
// ["jane", "bob"]

// PREG_SET_ORDER groups each match's captures together — usually handier
preg_match_all('/(\w+)@(\w+\.\w+)/', $text, $matches, PREG_SET_ORDER);
print_r($matches[0]);
// ["jane@example.com", "jane", "example.com"]
print_r($matches[1]);
// ["bob@test.org", "bob", "test.org"]

The PREG_SET_ORDER flag flips the array shape so each element is one complete match with its groups — almost always what you want when iterating results in a foreach.
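
As a minimal sketch of that foreach pattern, the loop below destructures each match into its parts. The variable names ($full, $user, $domain, $pairs) are illustrative only, and the array-destructuring syntax assumes PHP 7.1 or later.

```php
<?php
// Sample input reused from the example above.
$text = "jane@example.com and bob@test.org";

// With PREG_SET_ORDER, each element of $matches is one complete match:
// [full match, group 1, group 2].
$count = preg_match_all('/(\w+)@(\w+\.\w+)/', $text, $matches, PREG_SET_ORDER);

$pairs = [];
if ($count !== false) {
    foreach ($matches as [$full, $user, $domain]) {
        $pairs[] = "$user at $domain";
    }
}

print_r($pairs);
// ["jane at example.com", "bob at test.org"]
```

With the default PREG_PATTERN_ORDER the same loop would need to index $matches[1][$i] and $matches[2][$i] in parallel, which is why the set order tends to read better here.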

Named Groups

$text = "jane@example.com";
preg_match('/(?<user>\w+)@(?<domain>[\w.]+)/', $text, $m);

echo $m["user"];     // jane     — access by name
echo $m["domain"];   // example.com
echo $m[1];          // jane     — numbered access still works too

Both (?<name>...) and (?P<name>...) syntaxes work. The $matches array contains both string-keyed and integer-keyed entries for each named group, so you can use whichever is clearer.
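
Because every named group appears twice in $matches, it is sometimes convenient to strip the numeric duplicates. The sketch below is one way to do that with array_filter; the $named variable is just an illustrative name.

```php
<?php
$text = "jane@example.com";
$named = [];

if (preg_match('/(?P<user>\w+)@(?P<domain>[\w.]+)/', $text, $m) === 1) {
    // $m holds each group under both its name and its number.
    // Keeping only the string keys leaves a clean associative array.
    $named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r($named);
// ["user" => "jane", "domain" => "example.com"]
```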

Replacing

// preg_replace with backreferences ($1, $2, or ${1} before a digit)
echo preg_replace('/(\w+)@(\w+)/', '$2.$1', 'jane@example');
// "example.jane"

// preg_replace_callback for logic a template string can't express
echo preg_replace_callback('/\d+/', function ($m) {
    return $m[0] * 2;
}, 'a1 b2 c3');
// "a2 b4 c6"

// preg_split — split a string by a pattern
print_r(preg_split('/[\s,]+/', "a, b,c   d"));
// ["a", "b", "c", "d"]

In replacement strings, use ${1} with braces when the backreference is immediately followed by a literal digit. Without the braces, $11 is read as group 11 rather than as group 1 followed by "1".
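
A small sketch of the difference, using a made-up CSS-like string. The goal is to append a literal "0" to the captured number; $wrong and $right are illustrative names.

```php
<?php
// '$10' is read as a reference to group 10. The pattern has no such group,
// so the reference is replaced with an empty string.
$wrong = preg_replace('/(\d+)px/', '$10px', 'width: 12px');

// '${1}0' is group 1 followed by a literal "0".
$right = preg_replace('/(\d+)px/', '${1}0px', 'width: 12px');

echo $wrong, "\n";  // width: px
echo $right, "\n";  // width: 120px
```

Note the single quotes around the replacement strings: inside double quotes PHP itself would try to interpolate ${1} before preg_replace ever sees it.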

Pattern Modifiers

"/hello/i"    // i — case-insensitive
"/^line/m"    // m — multi-line: ^ and $ match at line boundaries
"/a.b/s"      // s — dotall: . matches newlines
"/\d+/u"      // u — UTF-8 mode (use whenever input may be non-ASCII)
"/ \d+ /x"    // x — extended: ignore whitespace, allow # comments
"/cat$/D"     // D — $ matches only at very end, not before trailing \n

// Stack them after the closing delimiter
preg_match("/hello/imu", $text);

The u modifier deserves special attention. Without it, PCRE works byte by byte, so a match can begin or end in the middle of a multi-byte UTF-8 character and leave you with a corrupted string. On the modern web, add u by default unless you specifically need byte-level matching.
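
To show how stacked modifiers change a real pattern, here is a sketch that combines x and D on a date pattern. The group names (year, month, day) and sample strings are illustrative.

```php
<?php
// x lets the pattern span several lines with comments; whitespace is ignored.
// Keep the delimiter character out of the comments, or it ends the pattern early.
$pattern = '/
    ^(?<year>\d{4})   # four-digit year
    -(?<month>\d{2})  # two-digit month
    -(?<day>\d{2})$   # two-digit day
/xD';

$ok = preg_match($pattern, "2024-03-15", $m);
echo $ok, " ", $m["year"], "/", $m["month"], "/", $m["day"], "\n";
// 1 2024/03/15

// D stops $ from matching just before a trailing newline.
$withNewline = preg_match($pattern, "2024-03-15\n");
var_dump($withNewline);
// int(0)
```

Dropping D from the modifier list would make the second call return 1, which is rarely what input validation wants.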

Common Gotchas

The return-value trap

preg_match returns 1, 0, or false. Because both 0 and false are falsy, if (!preg_match($p, $s)) can't distinguish "no match" from "error." Use === 1 to test for a match and check === false separately for errors. Call preg_last_error() after a false return to find the cause.
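
One way to make the three outcomes explicit is a small wrapper. matchesOrFail below is a hypothetical helper, not part of PHP; turning errors into a RuntimeException is an assumption about how you want to surface them.

```php
<?php
/**
 * Hypothetical helper: true on a match, false on no match,
 * and an exception when preg_match itself fails.
 */
function matchesOrFail(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        $code = preg_last_error();
        throw new RuntimeException("preg_match failed with error code $code", $code);
    }

    return $result === 1;
}

var_dump(matchesOrFail('/\d+/', 'abc 123'));  // bool(true)
var_dump(matchesOrFail('/\d+/', 'abc'));      // bool(false)

try {
    // "\xFF" is not valid UTF-8, so matching fails under the u modifier.
    matchesOrFail('/./u', "\xFF");
} catch (RuntimeException $e) {
    var_dump($e->getCode() === PREG_BAD_UTF8_ERROR);  // bool(true)
}
```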

Backtrack limits on large input

PHP caps regex backtracking via pcre.backtrack_limit (default 1,000,000). A catastrophic-backtracking pattern against large input doesn't hang: preg_match returns false, and preg_last_error() reports PREG_BACKTRACK_LIMIT_ERROR. This silent failure is worse than a crash because code that only checks for truthiness carries on as if there had been no match. Test === false and check preg_last_error().
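
The failure can be reproduced on a short string by lowering the limit. This sketch makes two assumptions so the result is predictable: the PCRE JIT is switched off (it handles runaway patterns differently), and the limit is set far below the default. The pattern is a classic heavy-backtracking shape chosen for illustration.

```php
<?php
// Assumptions for this sketch: no JIT, and a deliberately low limit.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// Nested repetition that backtracks heavily because no "!" or "?" follows.
$result = preg_match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');

if ($result === false) {
    $status = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
        ? 'backtrack limit exhausted'
        : 'other regex error';
} else {
    $status = $result === 1 ? 'match' : 'no match';
}

echo $status, "\n";
```

Under those assumptions the script should report that the backtrack limit was exhausted. A plain if (!preg_match(...)) would have treated the same call as "no match".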

Forgetting the u modifier

Matching . against "café" without u treats the é as two bytes, so a substring extraction can split it and produce mojibake. Add u whenever input may contain non-ASCII characters.
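
The effect is easy to see by comparing match lengths. The sketch below writes the word with a \u{E9} escape so it does not depend on how the source file is saved; $bytes and $chars are illustrative names.

```php
<?php
$word = "caf\u{E9}";  // "café"; the é takes two bytes in UTF-8

preg_match('/caf./', $word, $bytes);   // without u, . matches one byte
preg_match('/caf./u', $word, $chars);  // with u, . matches one code point

var_dump(strlen($bytes[0]));    // int(4), the match stops halfway through é
var_dump(strlen($chars[0]));    // int(5), the whole é is included
var_dump($chars[0] === $word);  // bool(true)
```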

Escaping the delimiter inside the pattern

If your pattern contains the delimiter character as a literal, you must escape it — or better, choose a delimiter that doesn't appear in the pattern. preg_quote($string, '/') escapes both regex metacharacters and the given delimiter when you build a pattern from a variable.
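
A short sketch of building a pattern from a variable with preg_quote. The $needle value stands in for arbitrary user-supplied text and is made up for this example.

```php
<?php
// Hypothetical user-supplied text that should be matched literally.
$needle = 'price/unit (USD)';

// The second argument tells preg_quote to escape the delimiter as well.
$pattern = '/' . preg_quote($needle, '/') . '/i';

echo $pattern, "\n";
// /price\/unit \(USD\)/i

$found = preg_match($pattern, 'Price/Unit (usd) is listed');
var_dump($found);
// int(1)
```

If you switch delimiters, pass the new one to preg_quote too; otherwise a literal delimiter inside $needle ends the pattern early.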

Try It Live

The regex tester lets you prototype the bare pattern — in PHP you'd wrap it in delimiters and add modifiers. The regex explainer breaks any pattern down token by token. For the same depth in other languages, see the Python and JavaScript regex guides.

Frequently Asked Questions

What are delimiters in PHP regex patterns?

PHP regex patterns must be wrapped in matching delimiter characters — the pattern string itself includes them. The conventional delimiter is the forward slash: preg_match("/\d+/", $text). The pattern modifiers (i, m, s, u, etc.) go after the closing delimiter: "/hello/i". When your pattern contains many literal slashes (like matching a URL path), pick a different delimiter to avoid escaping them all — common alternatives are #, ~, or @: "#https?://#" is cleaner than "/https?:\/\//". Brackets can also pair as delimiters: (...), {...}, [...], <...>.

What is the difference between preg_match and preg_match_all?

preg_match finds the first match and stops — it returns 1 if a match is found, 0 if not, or false on error. The matches are written into the third argument (passed by reference). preg_match_all finds every match in the string and returns the count. Their $matches array shapes differ: preg_match gives a flat array where index 0 is the full match and 1+ are capture groups; preg_match_all gives an array of arrays. Use preg_match for "does this match / extract the first" and preg_match_all for "give me every occurrence."

Why should I use the u modifier in PHP regex?

The u modifier turns on UTF-8 mode, which makes PCRE treat the pattern and subject as UTF-8 text rather than a sequence of bytes. Without it, . matches a single byte, so a multi-byte character (accented letter, emoji, CJK) is matched as several separate "characters" and can be split mid-character, corrupting the output. With u, . matches a whole code point, \w and \d can be made Unicode-aware, and Unicode property escapes like \p{L} (any letter) match complete multi-byte characters. Always add u when your input might contain non-ASCII text, which on the modern web is almost always.

How do I use named capture groups in PHP?

Define them with (?<name>...) or the equivalent (?P<name>...). After a match, the $matches array contains both the numbered index and the name as a string key, so $matches["user"] and $matches[1] both work for the first named group. preg_replace replacement strings do not accept group names, so refer to a named group there by its number: $1, or ${1} when a digit follows. Named groups make the $matches array self-documenting, which is especially valuable in PHP where the array is the only way to read results.

Why does preg_match return false instead of 0?

preg_match returns three possible values: 1 (matched), 0 (no match), or false (an error occurred — like a malformed pattern or a backtrack-limit overflow). Because 0 and false are both falsy, a careless if (!preg_match(...)) treats "no match" and "error" the same, hiding real bugs. Use strict comparison — if (preg_match(...) === 1) for matched, or check === false explicitly for errors. After a false return, preg_last_error() tells you what went wrong (commonly PREG_BACKTRACK_LIMIT_ERROR on a catastrophic-backtracking pattern against large input).