Ruby Regex Guide
Ruby has regex built into the language with first-class literal syntax (/pattern/) and a powerful Onigmo backtracking engine. Most of it feels familiar — until two Ruby-specific surprises bite: ^ and $ always match line boundaries, not string boundaries, and the /m flag means "dot matches newline," not "multiline" as it does elsewhere. Both have caused real security bugs. This guide covers the matching methods, named captures, substitution, and the traps. Pair it with the regex tester and the explainer to confirm any pattern you write.
Regex Literals
Ruby has dedicated syntax for regular expressions, so you rarely construct them from strings. The forward-slash literal is by far the most common:
re = /\d{4}-\d{2}-\d{2}/ # the usual form
re = %r{https?://[^\s]+} # %r when the pattern contains slashes
re = Regexp.new("\\d+") # from a string (note the doubled backslash)
Because the literal form does not go through string escaping, you write \d directly — there is no "raw string" concern as in Python. Reach for %r{...} when the pattern itself contains / (URLs, paths) so you do not have to escape every slash. Only use Regexp.new when the pattern is built dynamically at runtime.
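When a pattern really is built at runtime, it usually needs escaping. The sketch below is one way to do that, assuming a hypothetical helper named word_pattern and an illustrative KEYWORDS constant; Regexp.escape and Regexp.union are standard library calls.

```ruby
# Hypothetical helper: build a whole-word, case-insensitive matcher from arbitrary text.
def word_pattern(term)
  # Regexp.escape neutralises metacharacters such as . + ( ) in the input.
  Regexp.new("\\b#{Regexp.escape(term)}\\b", Regexp::IGNORECASE)
end

# Regexp.union builds an alternation and escapes string arguments itself.
KEYWORDS = Regexp.union("a.b", "c+d")

puts word_pattern("ruby").match?("I like Ruby")   # true
puts KEYWORDS.match?("axb")                       # false: the dot is literal
puts KEYWORDS.match?("a.b")                       # true
```

Without the escape step, a term such as "a.c" would also match "abc", which is rarely what a caller intends.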
The ^ and $ Trap
This is the single most important thing to know about Ruby regex. ^ and $ always match at line boundaries, not at the boundaries of the whole string — and there is no flag to change that:
"first\nsecond" =~ /^second/ # 6 — ^ matches at the start of the 2nd line!
"first\nsecond" =~ /\Asecond/ # nil — \A is the true start of the string
This breaks naive validation in a way attackers exploit. A check like /^https:/ can be slipped past with an input such as "evil\nhttps://ok", because ^ happily matches at the start of the second line. Always anchor validations to the whole string:
"evil\nhttps://x" =~ /^https:/ # matches — BAD for validation
"evil\nhttps://x" =~ /\Ahttps:/ # nil — GOOD, rejects the input
- \A: start of the entire string.
- \z: end of the entire string.
- \Z: end of the string, but allows a single trailing newline.
Rule: use \A and \z for anything that validates a complete value. Reserve ^ and $ for genuinely line-oriented matching.
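As a rough illustration of the rule, here is a minimal validator sketch. The constant HTTPS_URL and the method https_url? are made-up names for this example, and the pattern is deliberately simplistic rather than a complete URL check.

```ruby
# Hypothetical validator: accept only a bare https URL with no whitespace.
HTTPS_URL = %r{\Ahttps://\S+\z}

def https_url?(value)
  HTTPS_URL.match?(value)
end

puts https_url?("https://example.com")          # true
puts https_url?("evil\nhttps://example.com")    # false: \A rejects the leading line
puts https_url?("https://example.com\n")        # false: \z rejects the trailing newline

# The same pattern ending in \Z tolerates one trailing newline.
puts "https://example.com\n".match?(%r{\Ahttps://\S+\Z})   # true
```

The last line shows why \z is usually the safer choice for validation: \Z quietly accepts a value that still carries a newline.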
The /m Flag Means Dotall
Ruby's /m modifier does not do what it does in Python, Java, or JavaScript. In Ruby it makes the dot match newlines — what other engines call "dotall" or "single-line" mode. It has no effect on ^ and $, which already match line boundaries unconditionally:
"a\nb" =~ /a.b/ # nil — . does not match the newline by default
"a\nb" =~ /a.b/m # 0 — /m lets . match the newline
If you came from another language expecting /m to change anchor behaviour, you will be confused. In Ruby, anchors are already line-based; /m only changes the dot. This is one of the most common cross-language regex mistakes.
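A small sketch may make the split between the dot and the anchors easier to see. TEXT and the two helper methods are illustrative names, not part of any library.

```ruby
TEXT = "BEGIN\nline one\nline two\nEND\nafter"

# Without /m the dot stops at each newline, so the block is never spanned.
def block_without_m(text)
  text[/BEGIN(.*?)END/, 1]
end

# With /m the dot crosses newlines.
def block_with_m(text)
  text[/BEGIN(.*?)END/m, 1]
end

p block_without_m(TEXT)        # nil
p block_with_m(TEXT)           # "\nline one\nline two\n"

# The anchors are line-based with or without /m.
p TEXT.scan(/^line \w+$/)      # ["line one", "line two"]
p TEXT.scan(/^line \w+$/m)     # ["line one", "line two"]
```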
Matching Methods
Ruby spreads regex operations across String and Regexp methods plus a couple of operators:
str.match?(/\d+/) # true/false, NO global side effects (Ruby 2.4+) — preferred
str =~ /\d+/ # index of first match or nil; sets $~, $1, ...
str.match(/(\d+)/) # MatchData or nil
str.scan(/\d+/) # array of all matches: ["1", "22", "333"]
str[/\d+/] # the matched substring (or nil) — handy one-liner
str.split(/\s*,\s*/) # split on a pattern
str.gsub(/\d/, "#") # replace all; sub replaces the first only
Prefer match? for boolean checks: it is faster and does not touch the global match variables, which keeps code free of hidden state:
if line.match?(/\AERROR/)
handle_error(line)
end
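The difference in side effects can be observed directly. In this sketch the two helper methods are hypothetical; each method call starts with its own empty $~, which is what makes the comparison visible.

```ruby
# Each method invocation has its own $~, so both helpers start with no match data.
def last_match_after_predicate(str)
  str.match?(/(\d+)/)
  $~            # still nil: match? never records the match
end

def last_match_after_operator(str)
  str =~ /(\d+)/
  $~            # a MatchData when the pattern matched
end

p last_match_after_predicate("order 42")      # nil
p last_match_after_operator("order 42")[1]    # "42"

# scan returns arrays of groups when the pattern contains captures.
p "a=1, b=22".scan(/(\w)=(\d+)/)              # [["a", "1"], ["b", "22"]]
```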
Named Captures
Ruby uses (?<name>...) for named groups. Access them through MatchData with a symbol or string key:
m = "2026-06-10".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
m[:year] # "2026"
m[:month] # "06"
m[0] # "2026-06-10" — the whole match
m.named_captures # {"year"=>"2026", "month"=>"06", "day"=>"10"}
Ruby has a special trick: when a regex literal appears on the left of =~ and contains named captures, each becomes a local variable automatically:
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2026-06-10"
puts year # "2026" — year is now a local variable
puts month # "06"
end
This only works with a literal on the left of =~ known at parse time — not with String#match and not when the regex is stored in a variable. It is convenient but surprising, so use it deliberately.
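When the regex lives in a constant or variable, the auto-assigned locals are not available, so the captures have to be read from MatchData. A minimal sketch, assuming a hypothetical parse_date helper and a DATE constant invented for this example:

```ruby
DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# Hypothetical helper: returns a hash of integers, or nil when nothing matches.
def parse_date(str)
  m = DATE.match(str)          # regex held in a constant: no auto-assigned locals
  return nil unless m
  m.named_captures.transform_values(&:to_i)
end

p parse_date("2026-06-10")["month"]   # 6
p parse_date("not a date")            # nil
```

The keys of named_captures are strings, so the result is indexed with "month" rather than :month.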
Substitution with gsub and sub
gsub replaces every match, sub replaces the first. Backreferences in the replacement string use \1 or \k<name>:
"2026-06-10".gsub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
# "10/06/2026"
"2026-06-10".gsub(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '\k<d>/\k<m>/\k<y>')
# "10/06/2026"
Use single-quoted replacement strings so \1 is not consumed by Ruby's string escaping (in a double-quoted string you would need "\\1"). For computed replacements, pass a block — its value replaces each match:
"hello world".gsub(/\w+/) { |word| word.capitalize }
# "Hello World"
gsub also accepts a hash for keyed replacement, which is perfect for escaping:
escapes = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }
html.gsub(/[&<>]/, escapes)
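The block form mentioned above can also read captures, since Regexp.last_match is set for each match while the block runs. The following is a sketch only; humanize_dates is a hypothetical helper name.

```ruby
# Hypothetical helper: rewrite ISO dates as day/month/year, dropping leading zeros.
def humanize_dates(text)
  text.gsub(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/) do
    match = Regexp.last_match    # MatchData for the current match
    "#{match[:d].to_i}/#{match[:m].to_i}/#{match[:y]}"
  end
end

puts humanize_dates("due 2026-06-10, sent 2026-01-05")
# due 10/6/2026, sent 5/1/2026
```

A replacement string could not do this, because dropping the leading zeros requires computing on the captured text.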
Flags and Inline Modifiers
Modifiers go after the closing slash of a literal:
/hello/i # case-insensitive
/a.b/m # dot matches newline (dotall)
/ \d{4} /x # extended/verbose: ignore whitespace, allow # comments
/pattern/o # interpolate any #{...} only once, on first use
- i: case-insensitive.
- m: dot matches newline (NOT a multiline anchor switch).
- x: extended mode. Unescaped whitespace and # comments are ignored, so you can lay a pattern out across lines.
- o: perform #{} interpolation only once, even in a loop.
Inline forms work mid-pattern too: (?i:Hello) scopes case-insensitivity to one section, and (?i) / (?-i) turn it on and off from that point.
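Extended mode and a scoped inline modifier combine naturally. The sketch below uses an illustrative SEMVER constant and semver_parts helper, and it is a simplified version pattern rather than a full implementation of any versioning specification.

```ruby
# Extended mode: whitespace and # comments inside the literal are ignored.
SEMVER = /
  \A
  (?<major>\d+) \.            # major version
  (?<minor>\d+) \.            # minor version
  (?<patch>\d+)               # patch version
  (?i: -(?<tag>rc|beta) )?    # optional tag, case-insensitive in this group only
  \z
/x

# Hypothetical helper: returns the four parts, or nil when the string does not match.
def semver_parts(str)
  m = SEMVER.match(str)
  m && [m[:major], m[:minor], m[:patch], m[:tag]]
end

p semver_parts("1.20.3")        # ["1", "20", "3", nil]
p semver_parts("2.0.0-RC")      # ["2", "0", "0", "RC"]
p semver_parts("2.0")           # nil
```

One caveat when writing comments inside a /x literal: a forward slash in a comment still ends the literal, so keep comments free of that character or switch to %r{...}x.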
The Onigmo Engine
Ruby's regex is powered by Onigmo, a backtracking engine, so unlike Go or Rust it supports the full advanced feature set:
- Backreferences: \1 or \k<name> inside the pattern.
- All four lookarounds: (?=...), (?!...), (?<=...), (?<!...).
- Atomic groups (?>...) and possessive quantifiers *+, ++ for controlling backtracking.
- Subexpression calls \g<name> for recursive patterns, e.g. matching balanced brackets.
- Unicode property escapes like \p{L} for any letter, \p{Han} for a script.
The flip side of a backtracking engine is the risk of catastrophic backtracking on patterns like /(a+)+b/. Avoid nested quantifiers on untrusted input, and use atomic groups or possessive quantifiers to bound the work. For deeper coverage of assertions, see the lookahead and lookbehind guide.
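Two of the features listed above are easier to judge from a small example. This is a sketch under simple assumptions: BALANCED and balanced? are illustrative names, and the pattern handles round brackets only.

```ruby
# Recursive subexpression call: \g<group> re-enters the named group.
BALANCED = /\A(?<group>\((?:[^()]|\g<group>)*\))\z/

def balanced?(str)
  BALANCED.match?(str)
end

p balanced?("(a(b)c)")     # true
p balanced?("(a(b)c")      # false

# An atomic group never gives characters back once it has matched them.
p "aaab".match?(/a+ab/)        # true: a+ backtracks and releases one "a"
p "aaab".match?(/(?>a+)ab/)    # false: the atomic group keeps every "a"
```

The second pair shows the trade-off: an atomic group bounds the backtracking work, but it can also make a pattern reject input that the plain quantifier would accept.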
Common Gotchas
- Using ^/$ to validate a whole value. They are line anchors in Ruby. Use \A and \z; this is a security issue, not a style nit.
- Expecting /m to change anchors. In Ruby /m only makes . match newlines.
- Relying on $1 after match?. match? deliberately does not set the global match variables; use =~ or match when you need captures.
- Double-quoted replacement strings. "\1" is consumed by string escaping; use single quotes '\1' or "\\1" in gsub.
- Expecting auto-assigned named-capture locals from String#match. That magic only happens with a literal on the left of =~.
- Catastrophic backtracking on nested quantifiers with hostile input. Prefer atomic groups or possessive quantifiers.
Try Patterns Live
The regex tester uses JavaScript syntax, where ^ and $ default to string boundaries and m is the multiline anchor flag — the opposite of Ruby's conventions for those two points. The character classes, quantifiers, and groups are otherwise the same. Check the differences in the regex cheat sheet, and use the regex explainer for a token-by-token breakdown of any unfamiliar pattern.
Frequently Asked Questions
Why do ^ and $ match line boundaries instead of string boundaries in Ruby?
In Ruby, ^ and $ always anchor to the start and end of a line, never the whole string, and there is no flag to change that. This is different from most languages, where ^ and $ default to string boundaries and a multiline flag switches them to line boundaries. To anchor to the whole string in Ruby you must use \A for the very start and \z for the very end (or \Z, which also allows a trailing newline). This matters for security: a validation like /^https:/ can be bypassed by an input containing a newline followed by a malicious second line, because ^ matches at the start of that second line. Always use \A and \z for validation.
What does the /m flag do in Ruby regex?
Ruby's own terminology calls /m "multiline mode" (the corresponding constant is Regexp::MULTILINE), but its actual effect is to make the dot (.) match newline characters, which other languages call "dotall" or "single-line" mode. It does NOT change how ^ and $ behave, because those already match at line boundaries unconditionally. So if you come from Python, Java, or JavaScript, do not reach for /m expecting ^ and $ to change; reach for it when you want . to span newlines. This naming difference is one of the most common cross-language regex mistakes.
What is the difference between match? and =~ in Ruby?
String#match? and Regexp#match? (added in Ruby 2.4) return a plain boolean and, crucially, do not set the global match variables like $~, $1, or $&. That makes them faster and side-effect-free, so they are the right choice whenever you only need a yes/no answer. The =~ operator returns the integer index of the match or nil, and it does populate the global match data as a side effect. Use match? for conditionals (if str.match?(/\d/)), use =~ or match when you actually need the captured groups, and prefer the named methods over the global variables for readable code.
How do named captures become local variables in Ruby?
When you use the =~ operator with a regex literal on the left-hand side and the pattern contains named captures, Ruby automatically assigns each capture to a local variable of the same name. For example: if /(?
How do I do a case-insensitive match in Ruby?
Append the i modifier to the regex literal: /hello/i matches regardless of case. You can combine modifiers, so /hello/im is both case-insensitive and dot-matches-newline. With Regexp.new you pass options as the second argument, for example Regexp.new("hello", Regexp::IGNORECASE). Ruby also supports inline modifiers scoped to part of a pattern with the (?i:...) syntax and the (?i) on/(?-i) off switches, which is useful when only one section of the pattern should ignore case.