Java Regex Guide

Java's java.util.regex package splits regex into two classes — Pattern (the compiled regex) and Matcher (the engine applied to one input). Once you internalize that split and the double-backslash escaping that Java string literals force on you, the API is powerful and complete. This guide is a practical reference for the common operations and the Java-specific traps. Pair it with the regex tester and the explainer to confirm any pattern.

Pattern and Matcher: The Two-Class Split

Java separates the compiled regex (Pattern) from the act of applying it to an input (Matcher). This split is the core thing to understand:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

// Compile the pattern once (expensive — do it once and reuse)
Pattern pattern = Pattern.compile("(\\w+)@(\\w+)");

// Create a Matcher for each input (cheap, but stateful & not thread-safe)
Matcher matcher = pattern.matcher("jane@example bob@test");

while (matcher.find()) {
    System.out.println(matcher.group());      // jane@example, then bob@test
    System.out.println(matcher.group(1));     // jane, then bob
    System.out.println(matcher.group(2));     // example, then test
}

A Pattern is immutable and thread-safe — share one across threads freely. A Matcher holds the current match position and is not thread-safe — create a fresh one per input. This is why the idiom is "static final Pattern, local Matcher."
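As a minimal sketch of that idiom, the pattern below lives in a static final field and each call creates its own Matcher. The class name EmailScanner and the method localParts are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class EmailScanner {
    // Compiled once; Pattern is immutable, so sharing it across threads is safe.
    private static final Pattern ADDRESS = Pattern.compile("(\\w+)@(\\w+)");

    // Returns the part before '@' for every match in the input.
    static List<String> localParts(String input) {
        // A new Matcher per call keeps the mutable match state local to the caller.
        Matcher matcher = ADDRESS.matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(localParts("jane@example bob@test")); // [jane, bob]
    }
}
```

Because the Matcher never escapes the method, several threads can call localParts at the same time without sharing any mutable state.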

The Double-Backslash Trap

Java has no raw string literal, so the compiler interprets backslash escapes before the regex engine ever sees your pattern. Every backslash in a regex must be doubled in the Java source:

// What you want the regex engine to receive:  \d+\.\d+
// What you must write in Java source:
Pattern p = Pattern.compile("\\d+\\.\\d+");

// A word boundary \b becomes:
Pattern word = Pattern.compile("\\bhello\\b");

// A literal backslash in the input (rare) needs FOUR backslashes:
Pattern backslash = Pattern.compile("\\\\");   // matches a single \

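Missing doubling is one of the most common Java regex mistakes. If a pattern "isn't matching," first check that every backslash is doubled. Some slips fail loudly and some silently: "\d" is a compile error (illegal escape character), while "\b" compiles but is a backspace character rather than a word boundary. Text blocks ("""...""", standard since Java 15) help with patterns that contain quotes but still process backslash escapes, so they don't eliminate the doubling.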
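One way to see the two layers of escaping is to look at the string the regex engine actually receives. The sketch below uses Pattern.pattern() for that and checks a few matches; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class EscapeDemo {
    // The Java source "\\d+\\.\\d+" is an 8-character string at runtime.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    // Four backslashes in source become two in the string,
    // which the regex engine reads as one literal backslash.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static boolean containsBackslash(String s) {
        return BACKSLASH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(DECIMAL.pattern());            // \d+\.\d+
        System.out.println(isDecimal("3.14"));            // true
        System.out.println(isDecimal("3x14"));            // false: the escaped dot is literal
        System.out.println(containsBackslash("C:\\tmp")); // true: the input is C:\tmp
    }
}
```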

matches vs find vs lookingAt

Three methods, three behaviors — choosing the wrong one is a common bug:

Pattern p = Pattern.compile("\\d+");

p.matcher("abc123").matches();    // false: needs the WHOLE string to match
p.matcher("abc123").find();       // true: finds "123" anywhere
p.matcher("123abc").lookingAt();  // true: matches at the START, not the whole string
p.matcher("abc123").lookingAt();  // false: input doesn't start with a match

  • matches(): full-string match. Use for validation ("is this entire string a valid email?").
  • find(): search anywhere, callable in a loop for all matches. Use for extraction.
  • lookingAt(): anchored at start, not end. Rarely the one you want.
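In practice the three methods tend to end up behind small helpers with distinct jobs. The following is a sketch under that assumption; the helper names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class DigitChecks {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Validation: true only if the entire string is digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Extraction: the first run of digits anywhere in the string, if any.
    static Optional<String> firstNumber(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Prefix check: true if the string begins with digits, whatever follows.
    static boolean startsWithDigits(String s) {
        return DIGITS.matcher(s).lookingAt();
    }
}
```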

Named Groups

Pattern p = Pattern.compile("(?<user>\\w+)@(?<domain>[\\w.]+)");
Matcher m = p.matcher("jane@example.com");

if (m.find()) {
    System.out.println(m.group("user"));     // jane
    System.out.println(m.group("domain"));   // example.com
}

// Named backreference in a replacement
"jane@example.com".replaceAll(
    "(?<user>\\w+)@(?<domain>[\\w.]+)",
    "${user} at ${domain}"
);
// "jane at example.com"

Named groups (Java 7+) make multi-capture patterns readable. A group name must start with an ASCII letter and may contain only ASCII letters and digits, so underscores and hyphens are not allowed.
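As a further sketch, named groups keep a multi-field parse readable. The yyyy-MM-dd input shape and the names IsoDate, parse, and toDayFirst are assumptions made for illustration; the pattern checks only the shape of the input, not whether the date is valid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class IsoDate {
    // Group names use only ASCII letters and digits and start with a letter.
    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns {year, month, day}, or null if the whole input is not shaped like yyyy-MM-dd.
    static int[] parse(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group("year")),
            Integer.parseInt(m.group("month")),
            Integer.parseInt(m.group("day"))
        };
    }

    // Reorders the fields using named references in the replacement string.
    static String toDayFirst(String input) {
        return DATE.matcher(input).replaceAll("${day}/${month}/${year}");
    }
}
```

Here IsoDate.toDayFirst("due 2024-03-15") returns "due 15/03/2024", and reordering the groups inside the pattern would not change either method.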

Replacing

// String convenience method (recompiles the pattern each call — avoid in loops)
"jane@example".replaceAll("(\\w+)@(\\w+)", "$2.$1");
// "example.jane"

// Reusable Pattern + Matcher
Pattern p = Pattern.compile("(\\w+)@(\\w+)");
Matcher m = p.matcher("jane@example bob@test");
String result = m.replaceAll("$2.$1");
// "example.jane test.bob"

// Programmatic replacement (Java 9+) for logic templates can't express
String out = p.matcher("jane@example").replaceAll(
    mr -> mr.group(1).toUpperCase() + "_AT_" + mr.group(2)
);
// "JANE_AT_example"

Backreferences in the replacement string use $1, $2 for numbered groups and ${name} for named groups. Because $ and \ are special there, a literal dollar sign must be written in Java source as "\\$" and a literal backslash as "\\\\".
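The escaping rules for replacement strings can be sketched as follows. Matcher.quoteReplacement is a standard java.util.regex method that this guide has not mentioned so far; the class and method names around it are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ReplacementEscapes {
    private static final Pattern NUMBER = Pattern.compile("(\\d+)");
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{price\\}");

    // "\\$" is a literal dollar sign; the "$1" after it is still a group reference.
    static String addDollarSign(String input) {
        return NUMBER.matcher(input).replaceAll("\\$$1");
    }

    // quoteReplacement escapes dollar signs and backslashes so the text is inserted verbatim.
    static String fillPrice(String template, String literalText) {
        return PLACEHOLDER.matcher(template)
                          .replaceAll(Matcher.quoteReplacement(literalText));
    }

    public static void main(String[] args) {
        System.out.println(addDollarSign("costs 5 or 10"));        // costs $5 or $10
        System.out.println(fillPrice("Total: {price}", "$1.00")); // Total: $1.00
    }
}
```

Quoting matters most when the replacement text comes from user input or configuration, where a stray $ would otherwise be read as a group reference.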

Flags

// Pass flags to compile, combined with bitwise OR
Pattern p = Pattern.compile("hello",
    Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

// Common flags:
Pattern.CASE_INSENSITIVE   // (?i) — ASCII case folding
Pattern.UNICODE_CASE       // (?u) — add for non-ASCII case folding
Pattern.MULTILINE          // (?m) — ^ and $ match at line boundaries
Pattern.DOTALL             // (?s) — . matches newlines
Pattern.COMMENTS           // (?x) — ignore whitespace, allow # comments
Pattern.UNICODE_CHARACTER_CLASS  // (?U) — \w \d \s match Unicode

// Or inline at the pattern start:
Pattern.compile("(?im)^hello");

For correct case-insensitive matching of international text, you need both CASE_INSENSITIVE and UNICODE_CASE — the former alone only folds ASCII A-Z. Similarly, \w and \d only match ASCII unless you add UNICODE_CHARACTER_CLASS.
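The effect of the Unicode flags is easiest to see side by side. This sketch uses Unicode escapes for the accented letters (e-acute, lowercase and uppercase) so it does not depend on source-file encoding; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class UnicodeFlags {
    // Case folding limited to ASCII letters.
    static final Pattern ASCII_FOLD =
        Pattern.compile("\u00e9t\u00e9", Pattern.CASE_INSENSITIVE);

    // Case folding that also covers non-ASCII letters.
    static final Pattern UNICODE_FOLD =
        Pattern.compile("\u00e9t\u00e9", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Default word class: ASCII letters, digits, and underscore.
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Word class extended to Unicode letters and digits.
    static final Pattern UNICODE_WORD =
        Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String upper = "\u00c9T\u00c9";   // uppercase form of the pattern text
        String cafe = "caf\u00e9";
        System.out.println(ASCII_FOLD.matcher(upper).matches());   // false
        System.out.println(UNICODE_FOLD.matcher(upper).matches()); // true
        System.out.println(ASCII_WORD.matcher(cafe).matches());    // false
        System.out.println(UNICODE_WORD.matcher(cafe).matches());  // true
    }
}
```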

Common Gotchas

String.matches() is whole-string and recompiles

"abc123".matches("\\d+") returns false — String.matches anchors the whole string like Matcher.matches(), not find(). It also recompiles the pattern on every call, so never use it in a loop. For repeated use, compile a Pattern once.

A Matcher keeps its position after a find() loop

After a find() loop runs to completion, the matcher is exhausted: its search position sits past the last match. To search the same input again, call matcher.reset() or create a new matcher. Reusing an exhausted matcher silently finds nothing.
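A short sketch of rewinding a matcher, including the reset(CharSequence) overload that points the same matcher at a new input. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class MatcherReuse {
    static final Pattern WORD = Pattern.compile("\\w+");

    // Counts matches from the matcher's current position to the end of the input.
    static int countRemaining(Matcher m) {
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Matcher m = WORD.matcher("one two three");
        System.out.println(countRemaining(m)); // 3
        m.reset();                             // rewind to the start of the same input
        System.out.println(countRemaining(m)); // 3
        m.reset("four five");                  // reuse the matcher on a new input
        System.out.println(countRemaining(m)); // 2
    }
}
```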

group() before a successful match throws

Calling matcher.group() before a successful find() or matches() throws IllegalStateException. Always check the boolean return of find()/matches() before reading groups.
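One way to make the check hard to forget is to wrap it in a helper that returns an Optional. The sketch below also demonstrates the failure mode itself; the class, method names, and key=value input shape are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SafeGroups {
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\w+)");

    // Reads group 2 only after find() has reported success.
    static Optional<String> valueOf(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(2));
    }

    // Shows the failure mode: group() with no preceding successful match.
    static boolean groupWithoutMatchThrows(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        try {
            m.group();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

Note that groupWithoutMatchThrows returns true even for an input that would match, such as "color=red", because no find() or matches() call has been made yet.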

Catastrophic backtracking is possible

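Unlike Go's RE2-style engine, which guarantees linear-time matching, Java's engine backtracks and can take exponential time on pathological patterns such as (a+)+$ against a long non-matching input. For untrusted input, prefer possessive quantifiers (a++) or atomic groups ((?>a+)) to cut off runaway backtracking. java.util.regex has no built-in timeout, so any time limit has to be enforced by your own code around the match.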
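A minimal sketch of the possessive and atomic rewrites of (a+)+$ follows. The risky pattern is kept only as a string for comparison and is never executed; the class name, method names, and input length are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class BacktrackingSafe {
    // Risky shape from the text, kept as a plain string and never run here.
    static final String RISKY = "(a+)+$";

    // Possessive inner quantifier: once a++ has consumed a run, it gives nothing back.
    static final Pattern POSSESSIVE = Pattern.compile("(?:a++)+$");

    // Atomic group: the same idea, written as (?>...).
    static final Pattern ATOMIC = Pattern.compile("(?>a+)+$");

    // Builds n copies of 'a' followed by a character that forces the match to fail.
    static String hostileInput(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    static boolean endsWithRunOfA(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String hostile = hostileInput(40);
        System.out.println(endsWithRunOfA(POSSESSIVE, hostile)); // false
        System.out.println(endsWithRunOfA(ATOMIC, hostile));     // false
        System.out.println(endsWithRunOfA(POSSESSIVE, "baaa"));  // true
    }
}
```

Both rewrites accept and reject the same inputs as the original here, but the inner run of a's can no longer be split up in different ways during backtracking.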

Try It Live

The regex tester lets you prototype a pattern interactively — just remember Java needs every backslash doubled in the source string (the tester shows the bare pattern). The regex explainer breaks any pattern down token by token. For the same depth in other languages, see the Python and JavaScript regex guides.

Frequently Asked Questions

Why do Java regex patterns need double backslashes?

Java has no raw string literal, so the Java compiler processes backslash escapes in a string before the regex engine sees the pattern. To pass \d to the regex engine, you write "\\d" in your source: the compiler turns \\ into a single backslash, and the regex engine receives \d. A regex like \d+\.\d+ therefore becomes the Java string "\\d+\\.\\d+". Forgetting this is one of the most common Java regex mistakes. Text blocks (triple-quoted, standard since Java 15) still process escapes, so they don't remove the doubling, but they help with patterns containing quotes.

What is the difference between matches, find, and lookingAt in Java?

matcher.matches() requires the entire input to match the pattern — it returns true only for a full-string match. matcher.find() scans for the next match anywhere in the input and can be called repeatedly in a loop to iterate all matches. matcher.lookingAt() requires a match at the start of the input but not the entire input (like an anchored-at-start search). Most code wants find() for searching and matches() for validation. A frequent bug is using matches() expecting it to find a substring — it won't, because it demands the whole string match.

How do I use named capture groups in Java?

Define them with (?<name>...) and access them with matcher.group("name") after a successful find() or matches(). Named groups were added in Java 7. In the replacement string of replaceAll, reference a named group with ${name}. Named groups make code with many captures far more readable than numbered group(1), group(2) access, and they survive refactoring that reorders the pattern. Group names must be alphanumeric and start with a letter.

Should I reuse a compiled Pattern in Java?

Yes: compile once, reuse many times. Pattern.compile() parses the pattern and builds its internal matching structure, which is the expensive step; creating a Matcher from an existing Pattern is cheap. Store the compiled Pattern as a static final field if the pattern is constant, then call pattern.matcher(input) per use. A Pattern is immutable and thread-safe, so a static instance is safe to share across threads; a Matcher is stateful and is NOT thread-safe, so create a fresh one per input. Convenience methods like String.matches() recompile the pattern on every call, so avoid them in loops.

How do I make a Java regex case-insensitive?

Pass the flag to Pattern.compile: Pattern.compile("hello", Pattern.CASE_INSENSITIVE). Combine flags with the bitwise OR operator: Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL. You can also embed flags inline at the start of the pattern with (?i), or scope them to part of the pattern with (?i:hello). For correct case-insensitive matching of non-ASCII text (accented letters, other scripts), add Pattern.UNICODE_CASE alongside CASE_INSENSITIVE, since the default only folds ASCII letters.
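The difference between a leading (?i) and a scoped (?i:...) can be sketched like this; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class InlineFlags {
    // (?i) at the start applies to the whole pattern.
    static final Pattern WHOLE = Pattern.compile("(?i)hello world");

    // (?i:...) limits case-insensitivity to the enclosed part.
    static final Pattern SCOPED = Pattern.compile("(?i:hello) world");

    public static void main(String[] args) {
        System.out.println(WHOLE.matcher("HELLO WORLD").matches());  // true
        System.out.println(SCOPED.matcher("HELLO world").matches()); // true
        System.out.println(SCOPED.matcher("HELLO WORLD").matches()); // false
    }
}
```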