How to Parse a User Agent String

A user agent string packs the browser, rendering engine, operating system, and device into one dense, oddly formatted line. Parsing it reliably is harder than it looks, because the format is inconsistent and easy to fake. This guide walks through the anatomy of a real UA string, shows how to extract browser, OS, device, and engine in JavaScript and Python with maintained libraries, covers the modern User-Agent Client Hints replacement, and explains why hand-rolling a comprehensive parser is a mistake. Paste any string into the user agent parser tool to see the breakdown instantly.

Anatomy of a User Agent String

A user agent (UA) string is a single HTTP header the browser sends with every request, identifying itself. Here is a typical modern Chrome-on-Windows string:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36

Reading it token by token:

  • Mozilla/5.0 — a meaningless compatibility prefix that every major browser carries (see below).
  • (Windows NT 10.0; Win64; x64) — the platform comment: operating system (Windows NT 10.0 = Windows 10/11), 64-bit OS, 64-bit CPU architecture.
  • AppleWebKit/537.36 — the rendering engine lineage. Chromium forked from WebKit, so it still reports it.
  • (KHTML, like Gecko) — more compatibility cruft: WebKit derives from KDE's KHTML and claims to be "like Gecko" (Firefox's engine) so old sniffers serve it modern content.
  • Chrome/124.0.0.0 — the actual browser and version. This is usually the token you care about.
  • Safari/537.36 — another compatibility token; Chromium reports Safari because it shares the WebKit heritage.

From a string like this, the things you typically want to extract are: browser name and version, OS name and version, device type (desktop, mobile, or tablet), and the rendering engine (Blink, WebKit, Gecko). The format is a loose convention rather than a strict grammar, which is precisely what makes parsing hard.
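To make the token-by-token reading concrete, here is a minimal Python sketch that splits a UA string into product/version pairs and parenthesised comments. The names TOKEN_RE and tokenize_ua are illustrative and not from any library, and the pattern assumes comments never nest. It only exposes the structure; working out what the tokens mean is still the job of a parser library.

```python
import re

# Either one parenthesised comment, or one "Product/version" token.
TOKEN_RE = re.compile(r"\(([^)]*)\)|([^\s/()]+)/(\S+)")


def tokenize_ua(ua: str) -> list[tuple[str, str]]:
    """Split a UA string into ('comment', text) and (product, version) pairs."""
    tokens = []
    for comment, product, version in TOKEN_RE.findall(ua):
        if product:
            tokens.append((product, version))
        else:
            tokens.append(("comment", comment))
    return tokens


ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

# Print one token per line, in the order they appear in the string.
for name, value in tokenize_ua(ua):
    print(f"{name:12} {value}")
```

Note that nothing in the output marks Chrome as the "real" browser: every product token looks equally authoritative, which is why the order-and-precedence knowledge has to come from somewhere else.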

The "Mozilla/5.0" Mess

Nearly every UA string begins with Mozilla/5.0, and it tells you nothing about the actual browser. The reason is historical. In the mid-1990s, Netscape Navigator (codename "Mozilla") supported HTML frames, and servers sniffed the UA for "Mozilla" to decide whether to send frames-capable pages. When competing browsers added frame support, they each put "Mozilla" in their own UA so they would receive the same rich content. Every browser since has kept the prefix for compatibility, then layered its own identifying tokens after it.

The lasting consequence is that you cannot identify a browser from the front of the string. You have to read the tokens near the end, and you have to know that a Safari token does not mean Safari, that "like Gecko" does not mean Gecko, and that a Chrome token might actually belong to Edge, Opera, or Brave. This accumulated ambiguity is the single biggest reason to reach for a library instead of a regex.

Don't Hand-Roll a Comprehensive Parser

The most important advice in this guide, up front: do not write your own general-purpose UA parser. Real-world UA strings number in the tens of thousands of distinct shapes, the format is inconsistent across vendors, browsers deliberately impersonate one another, and new devices and versions appear constantly. A regex you write today silently breaks tomorrow.

Instead, use a maintained library backed by a community-curated database. The ua-parser family of libraries, which has implementations in many languages, shares one underlying regex database: the uap-core project's regexes.yaml, which gets updated as the browser landscape shifts. Some popular libraries, ua-parser-js among them, maintain their own pattern set instead. Hand-rolled regex is acceptable only for a single narrow, known case in a controlled environment (for example, "is this our internal kiosk browser?"), never for analytics or feature decisions across the open web.

JavaScript: ua-parser-js

The de-facto JavaScript library is ua-parser-js. It returns a structured object with browser, OS, device, engine, and CPU:

// npm install ua-parser-js
import { UAParser } from 'ua-parser-js';

const ua = 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1';

const result = new UAParser(ua).getResult();
console.log(result);
// {
//   browser: { name: 'Mobile Safari', version: '17.4', major: '17' },
//   os:      { name: 'iOS', version: '17.4' },
//   device:  { vendor: 'Apple', model: 'iPhone', type: 'mobile' },
//   engine:  { name: 'WebKit', version: '605.1.15' },
//   cpu:     { architecture: undefined }
// }

// On the client, omit the argument to parse the current visitor:
const me = new UAParser().getResult();

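The device.type field is undefined for desktop browsers, which simply have no type token to report. Otherwise it holds a category such as mobile or tablet; the library also defines further values, including smarttv, wearable, console, and embedded.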

The modern path: User-Agent Client Hints

Chromium has reduced (frozen) much of the legacy UA string to limit passive fingerprinting. The replacement is User-Agent Client Hints (UA-CH), exposed as navigator.userAgentData. Low-entropy fields are available synchronously; high-entropy ones require an explicit async request:

if (navigator.userAgentData) {
  // Low-entropy, available immediately
  console.log(navigator.userAgentData.brands);
  console.log(navigator.userAgentData.mobile);    // boolean
  console.log(navigator.userAgentData.platform);  // 'Windows', 'macOS', ...

  // High-entropy, must be requested explicitly.
  // Note: top-level await works only in an ES module; otherwise wrap this in an async function.
  const hints = await navigator.userAgentData.getHighEntropyValues([
    'platformVersion', 'model', 'architecture', 'fullVersionList',
  ]);
  console.log(hints.platformVersion, hints.model);
}

The catch: navigator.userAgentData is Chromium-only. It does not exist in Safari or Firefox. So even as Client Hints become the future, you still need the legacy UA string and a parser library as a cross-browser fallback. The practical pattern is: use Client Hints when present, fall back to parsing navigator.userAgent otherwise.
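The same "hints when present, legacy string otherwise" pattern applies on the server, where Chromium browsers send the low-entropy hints over HTTPS as the Sec-CH-UA, Sec-CH-UA-Mobile, and Sec-CH-UA-Platform request headers. The sketch below is one way that could look in Python. The helper names parse_brands and describe_client are hypothetical, the headers are assumed to arrive in a plain dict with exactly this casing, and the regex is a simplification rather than a full structured-header parser.

```python
import re

# Matches one brand entry in a Sec-CH-UA value, e.g. "Chromium";v="124"
BRAND_RE = re.compile(r'"([^"]*)"\s*;\s*v="([^"]*)"')


def parse_brands(sec_ch_ua: str) -> dict[str, str]:
    """Turn a Sec-CH-UA header value into {brand: major_version}."""
    return dict(BRAND_RE.findall(sec_ch_ua))


def describe_client(headers: dict[str, str]) -> dict:
    """Prefer Client Hints headers; fall back to the raw User-Agent string."""
    sec_ch_ua = headers.get("Sec-CH-UA")
    if sec_ch_ua:
        return {
            "source": "client-hints",
            "brands": parse_brands(sec_ch_ua),
            # Sec-CH-UA-Mobile is a structured-header boolean: ?1 or ?0
            "mobile": headers.get("Sec-CH-UA-Mobile") == "?1",
            # The platform value is a quoted string, so strip the quotes.
            "platform": headers.get("Sec-CH-UA-Platform", "").strip('"'),
        }
    # No hints (Safari, Firefox, older clients): pass the legacy
    # string on to a UA parser library instead.
    return {"source": "user-agent", "raw": headers.get("User-Agent", "")}
```

The brands list deliberately contains a junk entry alongside the real ones, so treat it as a set to look names up in rather than relying on position.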

Python: the user-agents library

In Python, the user-agents package gives you a friendly object API. It is built on ua-parser, which uses the same shared uap-core regex database:

# pip install user-agents
from user_agents import parse

ua_string = ('Mozilla/5.0 (Linux; Android 14; Pixel 8) '
             'AppleWebKit/537.36 (KHTML, like Gecko) '
             'Chrome/124.0.0.0 Mobile Safari/537.36')

ua = parse(ua_string)

print(ua.browser.family, ua.browser.version_string)  # Chrome Mobile 124.0.0
print(ua.os.family, ua.os.version_string)             # Android 14
print(ua.device.family)                               # Pixel 8

print(ua.is_mobile)   # True
print(ua.is_tablet)   # False
print(ua.is_pc)       # False
print(ua.is_bot)      # False
print(str(ua))        # human-readable summary

The boolean helpers (is_mobile, is_tablet, is_pc, is_bot, is_touch_capable) are convenient for classifying traffic without inspecting the version tokens yourself. The bot flag covers well-known crawlers that self-identify, but remember that nothing forces a bot to be honest.

Server-Side and Log Analysis at Scale

Parsing UA strings out of access logs is a common batch job — counting browsers, OSes, and crawlers across millions of lines. A few practical notes:

  • Use the shared database. The uap-core regexes.yaml database is the source of truth for ua-parser implementations in JavaScript, Python, Ruby, Java, Go, and more, so your offline batch job and your front-end can agree on classifications as long as both use a uap-core-based parser.
  • Cache parsed results. Logs contain the same UA string thousands of times. Memoize on the raw string (a dictionary keyed by the UA) so each distinct string is parsed once — parsing is regex-heavy and this is usually the biggest single speedup.
  • Extract the field first. If you are pulling the UA out of a combined-log-format line, isolate the quoted UA field before parsing it. The same field-extraction technique applies to any log column — see the related log-parsing guide.
  • Update the database periodically. A parser is only as current as its regex database. Pin and bump it on a schedule so new browser versions classify correctly.
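The "extract the field first" and "cache parsed results" notes can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not a production job: classify is a hypothetical stand-in for a real parser call (in practice you would call your parser library there), and the regex assumes standard combined log format with no escaped quotes inside the UA field.

```python
import re
from collections import Counter
from functools import lru_cache
from typing import Iterable, Optional

# Combined log format ends with: "referer" "user-agent"
UA_FIELD_RE = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def extract_ua(log_line: str) -> Optional[str]:
    """Return the quoted User-Agent field from a combined-log-format line."""
    match = UA_FIELD_RE.search(log_line)
    return match.group(1) if match else None


@lru_cache(maxsize=None)
def classify(ua: str) -> str:
    """Hypothetical stand-in for a real parser call.

    lru_cache memoizes on the raw string, so each distinct UA is
    classified once no matter how many log lines contain it.
    """
    return "bot" if "bot" in ua.lower() else "browser"


def count_clients(lines: Iterable[str]) -> Counter:
    """Count classifications across log lines, skipping unparseable ones."""
    counts: Counter = Counter()
    for line in lines:
        ua = extract_ua(line)
        if ua is not None:
            counts[classify(ua)] += 1
    return counts
```

Passing an open file object to count_clients streams the log line by line, and classify.cache_info() reports how many distinct UA strings actually reached the parser.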

A Minimal Manual Regex (and Why It Breaks)

For a single narrow case, a regex is fine. Here is one that pulls the Chrome version out of a string:

const ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';

const match = ua.match(/Chrome\/(\d+)/);
const chromeMajor = match ? Number(match[1]) : null;
console.log(chromeMajor);  // 124

This works — until you point it at Edge, Opera, or Brave, all of which are Chromium-based and also contain Chrome/ in their UA:

// Microsoft Edge — also matches Chrome/124!
// ...Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0

// Opera — also matches Chrome/124!
// ...Chrome/124.0.0.0 Safari/537.36 OPR/110.0.0.0

Your regex confidently reports all three as "Chrome 124." A correct parser checks the most specific tokens first — Edg/, OPR/, SamsungBrowser/ — and only falls back to Chrome when none of them appear. Keeping that precedence list accurate across every Chromium fork is exactly the maintenance burden a library exists to absorb. The manual regex is a fine illustration of why naive parsing fails, not a foundation to build on.
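The "most specific token first" rule can be sketched as an ordered list of patterns. The Python below is a minimal illustration, assuming only the tokens named in this guide; CHROMIUM_PRECEDENCE and identify_chromium are made-up names, and the list is deliberately incomplete, which is the point.

```python
import re

# Most specific tokens first; plain Chrome is the last-resort fallback.
# Illustrative only: real Chromium forks number far more than these.
CHROMIUM_PRECEDENCE = [
    ("Edge", re.compile(r"\bEdg/(\d+)")),
    ("Opera", re.compile(r"\bOPR/(\d+)")),
    ("Samsung Internet", re.compile(r"\bSamsungBrowser/(\d+)")),
    ("Chrome", re.compile(r"\bChrome/(\d+)")),
]


def identify_chromium(ua: str):
    """Return (browser_name, major_version), or None if no token matches."""
    for name, pattern in CHROMIUM_PRECEDENCE:
        match = pattern.search(ua)
        if match:
            return name, int(match.group(1))
    return None
```

Even this version still reports Brave as Chrome, because Brave sends no distinguishing token, and every new fork means another entry that someone has to remember to add above the Chrome line.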

Bots, Spoofing, and Security

The UA string is sent by the client and the client can set it to anything. Browser dev tools have a one-click device emulator that rewrites it; command-line HTTP clients let you pass --user-agent; bots impersonate real browsers as a matter of course. Therefore:

  • Never use the UA for security or trust decisions. Not for authentication, not for access control, not for rate-limiting that matters. It is an unverified, attacker-controlled value.
  • Bot detection from UA alone is weak. The is_bot helpers catch crawlers that self-identify (Googlebot, Bingbot), which is genuinely useful for analytics hygiene, but malicious bots simply send a normal browser UA. Real bot mitigation relies on behavior, rate, and network signals, not the UA.
  • Prefer feature detection over UA sniffing. If you want to know whether a capability exists, test for the API or behavior directly ('IntersectionObserver' in window) rather than inferring it from the browser name. Feature detection is accurate, forward-compatible, and immune to spoofing.

The honest summary: UA parsing is good for analytics, debugging, and choosing a graceful fallback. It is bad for anything that needs to be trustworthy.

Pitfalls to Know

UA freezing and reduction

Chrome's "reduced UA" zeroes the minor version numbers (hence Chrome/124.0.0.0) and pins the OS version and device model to coarse, fixed values to cut down on fingerprinting. Code that relied on a precise platform version from the legacy string will silently get a generic value; request that detail via Client Hints instead.
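On the server side, requesting that detail means opting in with an Accept-CH response header; supporting browsers then include the listed hints on later requests. A small Python sketch of building those headers follows. The function name client_hint_response_headers is hypothetical, and how you attach the headers depends on your web framework.

```python
# High-entropy hints a server can ask for by listing them in Accept-CH.
HIGH_ENTROPY_HINTS = [
    "Sec-CH-UA-Platform-Version",
    "Sec-CH-UA-Model",
    "Sec-CH-UA-Full-Version-List",
]


def client_hint_response_headers(hints=HIGH_ENTROPY_HINTS) -> dict[str, str]:
    """Build response headers that opt in to the given Client Hints."""
    joined = ", ".join(hints)
    return {
        "Accept-CH": joined,
        # Responses may now differ by these request headers, so tell caches.
        "Vary": joined,
    }
```

Because the hints only arrive after the browser has seen Accept-CH, the very first request from a new visitor will not carry them, so the code still needs a sensible default.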

Chromium browsers masquerading as Chrome

Edge (Edg/), Opera (OPR/), Samsung Internet (SamsungBrowser/), and Brave (no distinguishing token at all) all carry Chrome/. A correct parser checks specific tokens first; Brave in particular is indistinguishable from Chrome by design.

iPadOS reports as desktop Safari

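Since iPadOS 13, iPads send a UA that looks like desktop Safari on macOS to request desktop sites by default. A naive parser classifies an iPad as a Mac. The UA string alone cannot tell them apart, and Client Hints do not help here because Safari does not implement them. The usual workaround is a client-side touch-capability check, such as testing whether navigator.maxTouchPoints is greater than 1.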

In-app browsers

Web views inside apps (Facebook, Instagram, WeChat, etc.) append their own tokens — FBAN, Instagram, MicroMessenger — and behave differently from the standalone browser. If your analytics or feature support depends on it, detect these explicitly.
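Explicit detection can be as simple as a substring lookup. The Python sketch below uses only the three tokens named above; IN_APP_TOKENS and detect_in_app_browser are illustrative names, and a real list would need many more entries and regular upkeep.

```python
# Token -> app name, limited to the tokens mentioned in this guide.
IN_APP_TOKENS = {
    "FBAN": "Facebook",
    "Instagram": "Instagram",
    "MicroMessenger": "WeChat",
}


def detect_in_app_browser(ua: str):
    """Return the embedding app's name, or None for a standalone browser."""
    for token, app in IN_APP_TOKENS.items():
        if token in ua:
            return app
    return None
```

Running this check before the general browser classification lets you report, say, "Instagram in-app" instead of lumping that traffic in with standalone Safari or Chrome.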

Maintenance burden

Whatever you use, it is only as accurate as its database. Browsers ship new versions constantly and occasionally change their UA format outright. Pin your parser, bump it on a schedule, and never assume last year's classifications still hold.

Try It Live

The user agent parser tool breaks any UA string into browser, OS, device, and engine right in your browser — paste a string from your logs, or let it auto-fill with your own visitor UA, and see the parsed fields instantly. To build the narrow manual-parsing cases discussed above, test your patterns in the regex tester first so you know exactly which tokens they match before you ship them.

Frequently Asked Questions

What is the best way to parse a user agent string?

Use a maintained, database-backed library rather than writing your own regex. In JavaScript the de-facto choice is ua-parser-js; in Python it is the user-agents package (built on ua-parser). The ua-parser family is backed by the shared uap-core regex database, while ua-parser-js ships its own maintained pattern set; both are updated as new browsers and devices appear. Hand-rolled parsing is fine only for one narrow, known case, such as detecting a single browser in a controlled environment, because the full space of real-world UA strings is enormous, inconsistent, and constantly changing.

Why does every user agent string start with "Mozilla/5.0"?

It is a historical compatibility artifact. In the 1990s servers sniffed for "Mozilla" to decide whether to send frames-capable HTML, so every competing browser added "Mozilla" to its UA to receive the richer content. Each new browser then layered its own tokens on top while keeping the prefix for compatibility, which is why Chrome, Safari, Edge, and others all still begin with Mozilla/5.0 and contain tokens like AppleWebKit, "(KHTML, like Gecko)", and Safari that have little to do with the actual browser. The prefix is now meaningless for identification; you have to read the later tokens.

What is the difference between the user agent string and User-Agent Client Hints?

The user agent string is a single fixed header sent on every request. User-Agent Client Hints (UA-CH) split that information into discrete, structured pieces a server or script can request individually, for example navigator.userAgentData.getHighEntropyValues(["platformVersion", "model"]) in JavaScript. Chromium has reduced (frozen) much of the legacy UA string to limit passive fingerprinting, pushing sites toward Client Hints. The catch: navigator.userAgentData is Chromium-only and is absent in Safari and Firefox, so you still need the legacy string and a parser library as a fallback.

Can I trust the user agent string for security decisions?

No. The user agent string is trivially spoofed — any client can send any value, and browser dev tools, command-line HTTP clients, and bots routinely do. Never use it for authentication, access control, or any trust decision. It is fine for analytics, debugging, and choosing a fallback when feature detection is not possible, but treat it as an unverified hint. For deciding whether a capability exists, prefer feature detection (testing for the API or behavior directly) over UA sniffing.

Why do Edge, Brave, and Opera all contain "Chrome" in their user agent?

They are all built on Chromium, the open-source codebase behind Google Chrome, and they keep "Chrome" in the UA so that sites which sniff for Chrome serve them the same content. Edge appends Edg/, Opera appends OPR/, and Brave deliberately mimics Chrome with no distinguishing token at all. This is exactly why a naive regex that matches Chrome/ misidentifies all of them as Chrome. A correct parser checks for the more specific tokens first and falls back to Chrome only if none match.