Rust Regex Guide

Rust's regex support lives in the regex crate, not the standard library. Its engine is built for guaranteed linear-time matching, which means no backreferences and no lookaround: the same trade-off that Google's RE2 and Go's regexp package make. The basics are clean, but a few Rust-specific habits (compiling once, handling the Result from Regex::new, Unicode-by-default) matter for correctness and speed. This guide is a practical reference. Pair it with the regex tester and the explainer to confirm any pattern you write.

Add the Crate

Rust's regex engine is not in the standard library. Add it to your project:

cargo add regex
# or add to Cargo.toml manually:
# [dependencies]
# regex = "1"

The crate is maintained by the Rust project itself (rust-lang/regex) and is the de facto standard. Alternatives exist for special cases, such as fancy-regex for backreferences and lookaround, but for general-purpose matching it is the default choice.

No Backtracking — and Why That Matters

The regex crate uses finite-automata matching, the same family as Google's RE2 and Go's regexp package. The headline consequence: there are no backreferences and no lookaround assertions. In exchange, matching time is guaranteed to grow linearly with the size of the input for a given pattern. No input can cause exponential blow-up, so the entire class of catastrophic-backtracking and ReDoS bugs simply does not exist.

use regex::Regex;

// This compiles and runs in linear time:
let re = Regex::new(r"\b\w+@\w+\.\w+\b").unwrap();

// This FAILS to compile the pattern — backreference \1 is not supported:
let bad = Regex::new(r"(\w+) \1");  // Err(...) at runtime

If you truly need backreferences or lookaround, reach for the fancy-regex crate, which layers a backtracking engine on top of regex for exactly those features. For everything else, the standard crate is faster and safer.

The Core Methods

A compiled Regex exposes a small, predictable surface:

use regex::Regex;

let re = Regex::new(r"\d+").unwrap();

re.is_match("abc 123")          // true  — does it match anywhere?
re.find("abc 123")              // Some(Match { 4..7 }) — first match
re.find_iter("a1 b22 c333")     // iterator of Matches
re.captures("x=42")             // Option<Captures> — groups for first match
re.captures_iter("x=1 y=2")     // iterator of Captures
re.replace("a1 b2", "#")        // Cow<str> — replace first
re.replace_all("a1 b2", "#")    // Cow<str> — replace all
re.split("a1b22c")              // iterator of &str pieces

Note that find and captures return Option, and the iterator methods are lazy — they do not allocate a vector unless you collect one. On large inputs, prefer find_iter/captures_iter over collecting everything up front.

Raw Strings

Like Python, Rust has raw string literals so you do not have to double every backslash. Use r"...", and switch to r#"..."# when the pattern itself contains a double quote:

let re = Regex::new(r"\b\d{4}-\d{2}-\d{2}\b").unwrap();   // raw string
let re = Regex::new(r#""(\w+)""#).unwrap();                // pattern contains ", no escaping needed

Without the r prefix you would write "\\b\\d{4}", which is noisier and error-prone. Make raw strings your default for every pattern.

Compile Once with LazyLock

Building a Regex parses and compiles the pattern, which is far more expensive than matching against it. Never construct a regex inside a loop or a hot function. Compile it once into a static and reuse it:

use regex::Regex;
use std::sync::LazyLock;   // stable since Rust 1.80

static EMAIL: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}").unwrap()
});

fn is_email(s: &str) -> bool {
    EMAIL.is_match(s)
}

On Rust versions before 1.80, use once_cell::sync::Lazy from the once_cell crate, which works the same way. Because Regex is Sync, this single instance is safe to share across threads without a mutex. Compiling once is the most important performance habit in the crate: benchmarks that "show regex is slow" almost always compile inside the timing loop by mistake.

Named and Numbered Groups

The crate supports both (?P<name>...) and the shorter (?<name>...) syntax. Access captures by index or by name:

use regex::Regex;

let re = Regex::new(r"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})").unwrap();
let caps = re.captures("2026-06-10").unwrap();

&caps["year"]            // "2026" — panics if the group did not participate
caps.name("month")       // Some(Match "06") — the safe, Option-returning form
&caps[0]                 // "2026-06-10" — the whole match
&caps[1]                 // "2026" — by number

Indexing with caps["name"] panics if the group is missing; caps.name("name") returns an Option and is the safer choice when a group is optional.

Replacement and Expansion

replace and replace_all support $name / ${name} and $1 expansion in the replacement string:

let re = Regex::new(r"(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})").unwrap();

re.replace_all("2026-06-10", "$d/$m/$y");      // "10/06/2026"
re.replace_all("2026-06-10", "${y}_${m}");     // "2026_06"; a bare $y_ would name a group called y_

A literal $ in the output is written $$. When the next character after $name could be read as part of the name, use the brace form ${name}. For computed replacements, pass a closure that receives the Captures:

let re = Regex::new(r"\b\w+\b").unwrap();
let shouted = re.replace_all("hello world", |caps: &regex::Captures| {
    caps[0].to_uppercase()
});
// "HELLO WORLD"

Flags and RegexBuilder

Inline flags work at the start of a pattern, and there is a builder for setting them programmatically:

Regex::new(r"(?i)hello").unwrap();                 // case-insensitive
Regex::new(r"(?im)^\d+").unwrap();                 // case-insensitive + multiline
Regex::new(r"(?x) \d{4} - \d{2} # ignored").unwrap(); // verbose (whitespace ignored)

use regex::RegexBuilder;
let re = RegexBuilder::new(r"^\d+$")
    .case_insensitive(true)
    .multi_line(true)
    .build()
    .unwrap();

  • (?i): case-insensitive.
  • (?m): ^ and $ match at line boundaries.
  • (?s): . matches newlines.
  • (?x): verbose mode; insignificant whitespace and # comments are ignored.
  • (?u) / (?-u): enable or disable Unicode mode (on by default).

Bytes vs &str

The default regex::Regex matches against &str and assumes valid UTF-8. When you need to match arbitrary bytes — log files with invalid UTF-8, binary protocols — use regex::bytes::Regex, whose API is the same but operates on &[u8]:

use regex::bytes::Regex;

let re = Regex::new(r"(?-u)\x00+").unwrap();   // disable Unicode for byte matching
let data: &[u8] = b"a\x00\x00b";
re.is_match(data);                              // true

Note the (?-u) flag: byte regexes often disable Unicode mode so that . and escapes such as \xFF match single raw bytes rather than whole UTF-8-encoded code points.

Common Gotchas

  • Compiling in a loop. Building a Regex every iteration dwarfs the cost of matching. Hoist it into a LazyLock static.
  • Expecting backreferences or lookaround. They are unsupported by design. Restructure the pattern, or switch to fancy-regex.
  • unwrap() on a user-supplied pattern. An invalid pattern from input will panic. Propagate the Result with ? instead.
  • Indexing caps["name"] on an optional group. It panics if the group did not match. Use caps.name("name") for the Option.
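  • Forgetting $$ for a literal dollar sign in replacement strings. A $ followed by a name or number is read as a group reference, and a reference to a group that does not exist expands to the empty string.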
  • Assuming ASCII semantics. \w and \d are Unicode-aware by default; add (?-u) or use ASCII classes like [0-9] if you need ASCII-only behaviour.

Try Patterns Live

The regex tester uses JavaScript syntax, but the character-class and quantifier syntax overlaps almost completely with the Rust crate, so most patterns transfer directly — just remember that the crate has no backreferences or lookaround. Check the differences in the regex cheat sheet, and use the regex explainer for a token-by-token breakdown of any unfamiliar pattern.

Frequently Asked Questions

Does Rust's regex crate support backreferences and lookahead?

No. The regex crate uses finite-automata matching (the same approach as Google's RE2 and Go's regexp package) to guarantee linear-time matching with respect to the input size, which structurally rules out backreferences and lookaround assertions. This is a deliberate design choice that prevents catastrophic backtracking and ReDoS. If you genuinely need backreferences or lookaround, use the fancy-regex crate, which wraps the regex crate and falls back to a backtracking engine for those features. For the vast majority of patterns, the standard regex crate is the right choice.

Why is the regex crate not in the Rust standard library?

Rust keeps its standard library small and pushes most functionality to the crates ecosystem. The regex crate is maintained by the Rust project itself (rust-lang/regex) and is the de facto standard, but you add it explicitly with cargo add regex or by listing regex in Cargo.toml. This keeps the standard library lean and lets the regex crate evolve on its own release cadence rather than being tied to Rust's six-week train.

How do I compile a Rust regex only once?

Building a Regex is relatively expensive, so never call Regex::new inside a loop or a frequently called function. Compile it once and reuse it. The idiomatic way in modern Rust is std::sync::LazyLock (stable since Rust 1.80) to create a lazily-initialized static, or the once_cell crate on older versions. Because Regex is Sync, a single compiled instance can be shared safely across threads. Compiling once and matching many times is the single most important performance habit with the regex crate.

Are Rust regexes Unicode-aware by default?

Yes. By default the regex crate matches on Unicode scalar values and the character classes are Unicode-aware — \w matches Unicode word characters, \d matches Unicode decimal digits, and you can use Unicode property escapes like \p{L}. If you want to match on raw bytes instead (for example, on data that is not valid UTF-8), use the regex::bytes module, whose Regex operates on &[u8] rather than &str. You can also disable Unicode mode per-pattern with the (?-u) flag.

How do I handle the Result returned by Regex::new?

Regex::new returns a Result because the pattern can be invalid at runtime. For a hard-coded pattern you trust, .unwrap() is acceptable inside a LazyLock initializer because a bad literal pattern is a programmer error you want to fail fast on. For a pattern that comes from user input or configuration, propagate the error with the ? operator or match on it, because an invalid pattern should not panic your program. The rule of thumb: unwrap trusted literals, handle dynamic patterns.