How to Generate a JSON Schema

A JSON Schema describes the shape and constraints of your JSON so you can validate it, document it, and generate forms or types from it. There are two ways to get one — infer it from a sample for a fast start, or hand-write it for precision — and this guide covers both, the core keywords you'll use, the tooling in JavaScript and Python, and how to validate against the result with Ajv and jsonschema. Generate a starting schema instantly with the JSON Schema generator, which infers Draft 2020-12 from a sample object including oneOf for mixed-type unions.

What JSON Schema Is

A JSON Schema is a JSON document that describes the structure and constraints of other JSON documents. It is data about data: instead of holding values, it holds rules — which keys must be present, what type each value has, what range or pattern is allowed.

That single artifact does four jobs at once:

Validation — reject malformed payloads at the edge of your system before bad data spreads.
Documentation — a precise, machine-readable contract that humans and tools can both read.
Form generation — libraries like JSON Forms or react-jsonschema-form render an entire UI from a schema.
API contracts — OpenAPI describes every request and response body with a JSON Schema dialect, so the schema is the source of truth for your API.

Because it is just JSON, a schema is easy to store, diff, and version alongside your code.

Worked Example

Start with a sample object — a user record:

{
  "id": 42,
  "name": "Ada Lovelace",
  "email": "ada@example.com",
  "active": true,
  "roles": ["admin", "editor"],
  "profile": {
    "bio": "Mathematician",
    "joined": "1843-01-01"
  }
}

A JSON Schema that describes it (Draft 2020-12):

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "email": { "type": "string", "format": "email" },
    "active": { "type": "boolean" },
    "roles": {
      "type": "array",
      "items": { "type": "string" }
    },
    "profile": {
      "type": "object",
      "properties": {
        "bio": { "type": "string" },
        "joined": { "type": "string", "format": "date" }
      },
      "required": ["bio", "joined"]
    }
  },
  "required": ["id", "name", "email", "active", "roles", "profile"]
}

Notice the shape: every object gets type, a properties map, and a required list; every array gets an items subschema; nested objects recurse the same way. This is the skeleton of almost every schema you will write.
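
To make that recursion concrete, here is a minimal sketch of how an inference tool might walk a sample. The function name infer_schema is hypothetical, it looks only at the first element of each array, and it is not how any particular library is implemented.

```python
def infer_schema(value):
    """Infer a minimal schema fragment from one sample value (illustrative only)."""
    # bool must be tested before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Naive assumption: the first element is representative of all items
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            # Every key in a single sample looks required
            "required": list(value.keys()),
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


sample = {
    "id": 42,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "active": True,
    "roles": ["admin", "editor"],
    "profile": {"bio": "Mathematician", "joined": "1843-01-01"},
}
schema = infer_schema(sample)
```

The result has the same type, properties, required, and items skeleton as the schema above, but none of the format annotations: those come from knowing the domain, not from the sample.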

The Core Keywords

You can describe most real-world JSON with a small vocabulary. The essentials:

type — the JSON type: "object", "array", "string", "number", "integer", "boolean", or "null". Can be an array of types for unions.
properties — a map of key name to subschema, for objects.
required — an array of property names that must be present. Properties not listed here are optional.
items — the subschema every element of an array must match.
enum — a fixed list of allowed values: { "enum": ["red", "green", "blue"] }.
const — a single required value: { "const": "v1" }.
additionalProperties — whether keys not named in properties are allowed. Set false to forbid extras.

String constraints — minLength, maxLength, pattern (a regex), and format (such as email, date-time, uri, uuid):

{
  "type": "string",
  "minLength": 3,
  "maxLength": 20,
  "pattern": "^[a-z0-9_]+$"
}

Number constraints — minimum, maximum, exclusiveMinimum, exclusiveMaximum, and multipleOf:

{
  "type": "integer",
  "minimum": 1,
  "maximum": 100
}

Two Ways to Get a Schema

There are two routes, and they trade speed for precision.

(a) Infer it from a sample. Feed a real JSON document to an inference tool and get a schema that matches it. This is the fastest way to a working draft — seconds of work — and it is exactly what the Janeer JSON Schema generator does. Use it whenever you have example data and want a head start.

(b) Hand-write it. Author the schema from your knowledge of the domain. More work, but the only way to express rules that no sample can reveal: which fields are optional, the full set of enum values, sensible min/max bounds, and cross-field constraints.

The realistic workflow is both: infer a draft, then tighten it. Inference has hard limits you must correct by hand:

Required vs optional — a single sample makes every present field look required. The tool cannot know what is optional unless you give it samples where the field is absent.
Enums — if the sample has "status": "active", inference produces { "type": "string" }, not an enum. It cannot guess the other allowed values.
Bounds — no inferred minimum, maximum, minLength, or pattern. One example tells the tool nothing about the valid range.

Treat an inferred schema as a scaffold, never a finished contract.
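
As a rough illustration of the tightening step, the sketch below patches an inferred draft with rules that only domain knowledge can supply. The field names, the enum values, the age bounds, and the tighten helper are all hypothetical.

```python
import copy

# What a single-sample inference pass might produce
inferred = {
    "type": "object",
    "properties": {
        "status": {"type": "string"},
        "age": {"type": "integer"},
        "nickname": {"type": "string"},
    },
    "required": ["status", "age", "nickname"],
}


def tighten(schema):
    """Return a copy of the inferred schema with hand-written domain rules applied."""
    tightened = copy.deepcopy(schema)  # leave the inferred draft untouched
    props = tightened["properties"]

    # Enum: the sample showed one value; the full set is an assumption for this example
    props["status"] = {"enum": ["active", "suspended", "closed"]}
    # Bounds: no sample can reveal the valid range
    props["age"].update({"minimum": 0, "maximum": 150})
    # Required vs optional: nickname is optional in this imagined domain
    tightened["required"] = [key for key in tightened["required"] if key != "nickname"]
    # Reject keys the schema does not declare
    tightened["additionalProperties"] = False
    return tightened


final_schema = tighten(inferred)
```

Keeping the hand edits in a function like this lets you regenerate the inferred draft from fresh samples and reapply the same rules, instead of editing the generated JSON in place.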

Inferring a Schema: Tooling

JavaScript

Popular inference libraries include to-json-schema and genson-js; quicktype can emit JSON Schema as one of its output targets too (it is best known for generating types). A minimal example with to-json-schema:

// npm install to-json-schema
import toJsonSchema from 'to-json-schema';

const sample = { id: 42, name: 'Ada', roles: ['admin'] };
const schema = toJsonSchema(sample, {
  required: true,        // list present keys in `required`
  arrays: { mode: 'first' },
});

console.log(JSON.stringify(schema, null, 2));

The Janeer JSON Schema generator infers Draft 2020-12 entirely in your browser — no upload, no install — and crucially emits oneOf when a field holds more than one type across the sample, which most one-line inference helpers flatten away.

Python

genson is a popular inference library for Python. Its SchemaBuilder lets you add multiple samples and merges them, so a field missing from any sample drops out of required:

# pip install genson
from genson import SchemaBuilder

builder = SchemaBuilder()
builder.add_object({"id": 1, "name": "Ada", "active": True})
builder.add_object({"id": 2, "name": "Grace"})  # no `active` key

schema = builder.to_schema()
# `active` is now optional because it was absent from one sample
print(schema)
# {'$schema': 'http://json-schema.org/schema#',
#  'type': 'object',
#  'properties': {'id': {'type': 'integer'},
#                 'name': {'type': 'string'},
#                 'active': {'type': 'boolean'}},
#  'required': ['id', 'name']}

Feeding genson several representative documents is the single most effective way to get a usable required list without editing by hand.
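
The merge rule for required is easy to picture as a set intersection over the keys of every sample. The sketch below is not genson's implementation; merged_required and optional_keys are hypothetical helpers that only look at top-level keys.

```python
def merged_required(samples):
    """Keys present in every sample; anything absent from one sample is optional."""
    key_sets = [set(sample) for sample in samples]
    if not key_sets:
        return []
    return sorted(set.intersection(*key_sets))


def optional_keys(samples):
    """Keys seen in at least one sample but not in all of them."""
    all_keys = set().union(*(set(sample) for sample in samples))
    return sorted(all_keys - set(merged_required(samples)))


samples = [
    {"id": 1, "name": "Ada", "active": True},
    {"id": 2, "name": "Grace"},  # no `active` key
]

required = merged_required(samples)
optional = optional_keys(samples)
```

With these two samples, required is id and name, and optional is active. A field that happens to appear in every sample you collected still ends up required, so the samples need to cover the absent case.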

Validating Against the Schema

A schema earns its keep at validation time. Here is the payoff in both languages.

JavaScript — Ajv

Ajv compiles the schema once into a fast validator function:

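// npm install ajv ajv-formats
// The 2020 build understands schemas that declare Draft 2020-12
import Ajv2020 from 'ajv/dist/2020';
import addFormats from 'ajv-formats';

const ajv = new Ajv2020({ allErrors: true });  // report every error, not just the first
addFormats(ajv);  // enables `format` keywords like email, date

// `schema` is the worked-example schema above; `data` is the parsed JSON to check
const validate = ajv.compile(schema);
const valid = validate(data);

if (!valid) {
  console.error(validate.errors);
}

The default build (import Ajv from 'ajv') targets draft-07 and does not load the 2020-12 meta-schema, so compiling a schema that declares Draft 2020-12 with it typically fails. Use the default build only for schemas that declare draft-07 or omit $schema.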

Python — jsonschema

# pip install jsonschema
import jsonschema
from jsonschema import Draft202012Validator

# Simple: raises ValidationError (for the most relevant failure) if data is invalid
jsonschema.validate(instance=data, schema=schema)

# Collect every error instead of stopping at the first
validator = Draft202012Validator(schema)
for error in validator.iter_errors(data):
    print(error.message)

Both libraries follow the spec closely, so a document that validates with one generally validates with the other, provided both are pointed at the same draft.

Draft Versions and Reuse

JSON Schema has gone through several drafts, and tools support different ones:

draft-07 (2018): still the most widely supported across tooling and the safest default for maximum compatibility.
2019-09 and 2020-12: newer revisions. 2019-09 introduced $defs (replacing definitions) as the home for reusable subschemas. 2020-12 changed array handling: tuple validation moved from the array form of items to prefixItems, and items now takes a single schema that applies to the elements after any prefixItems.

Always declare the version with $schema so validators know which rules to apply:

{ "$schema": "https://json-schema.org/draft/2020-12/schema" }

Then pick a validator that supports that draft — Ajv needs ajv/dist/2020, and Python jsonschema exposes a per-draft Draft202012Validator. A mismatch between the draft your generator emits and the draft your validator understands is a common, silent source of wrong results.

For reuse, define subschemas once under $defs and reference them with $ref:

{
  "$defs": {
    "address": {
      "type": "object",
      "properties": { "city": { "type": "string" } }
    }
  },
  "type": "object",
  "properties": {
    "home": { "$ref": "#/$defs/address" },
    "work": { "$ref": "#/$defs/address" }
  }
}

Mixed-Type Fields and Unions

Real JSON is not always single-typed. A field might hold a string in some records and a number in others, or be nullable. JSON Schema offers a few ways to express that.

Multiple allowed types — make type an array. This is the standard way to mark a field nullable:

{ "type": ["string", "null"] }

oneOf / anyOf — when the alternatives are whole subschemas, not just primitive types. oneOf requires the value to match exactly one branch; anyOf requires at least one:

{
  "oneOf": [
    { "type": "string" },
    { "type": "object", "properties": { "id": { "type": "integer" } } }
  ]
}

The Janeer JSON Schema generator detects mixed-type fields across your sample and emits oneOf automatically, so a field that is sometimes a string and sometimes an object produces a schema that accepts both — instead of silently picking whichever type it saw first.

Pitfalls

additionalProperties defaults to true

Unless you say otherwise, a schema accepts extra keys it never mentions. An inferred schema is therefore permissive — it validates documents with unexpected fields. To lock an object down to exactly its declared properties, add "additionalProperties": false.

required must be explicit — and inference over-includes

A property is optional unless its name appears in required. Inference from one sample does the opposite of what you usually want: it marks everything required, because every key was present. Feed multiple samples (genson merges them) and then edit the required array by hand.

format is mostly annotation

Keywords like "format": "email" are not enforced by default in every validator; they are treated as annotations unless you opt in. Ajv needs the ajv-formats package to check them, Python jsonschema checks formats only when you pass a format_checker, and some validators ignore unknown formats entirely. If you depend on format checking, confirm your validator actually enforces it and enable it explicitly.

Draft mismatches

If your generator emits 2020-12 but your validator is configured for draft-07, keywords can be interpreted differently or ignored — most visibly around items and prefixItems. Keep the $schema declaration and the validator's draft in agreement.

Try It Live

The JSON Schema generator infers a Draft 2020-12 schema from any sample object in your browser — including oneOf for mixed-type unions — giving you a scaffold to tighten by hand. Run your sample through the JSON formatter first to catch syntax errors and pretty-print it, then generate, then add the required edits, enums, and bounds that only you know. Everything runs client-side, so you can use it with sensitive payloads without sending them anywhere.

Frequently Asked Questions

What is a JSON Schema used for?

A JSON Schema is itself a JSON document that describes the structure and constraints of other JSON — which keys are required, what type each value is, what range or pattern is allowed. The four common uses are validation (reject malformed data before it enters your system), documentation (a precise, machine-readable contract for an API payload), form generation (tools like JSON Forms render a UI directly from a schema), and API contracts — OpenAPI uses a JSON Schema dialect to describe every request and response body. One schema can serve all four at once.

Can I generate a JSON Schema automatically from a sample?

Yes — inference tools read one or more sample documents and emit a schema that matches them. In JavaScript use to-json-schema, genson-js, or quicktype; in Python use genson. The Janeer JSON Schema generator infers Draft 2020-12 in your browser, including oneOf for fields that hold more than one type across the sample. But inference is a starting point, not a finished schema: a single sample cannot tell the tool which fields are optional, what the real enum values are, or any min/max bounds. Always tighten the generated schema by hand.

What is the difference between draft-07 and 2020-12 JSON Schema?

They are different versions (drafts) of the JSON Schema specification. draft-07 (2018) is still the most widely supported in tooling. 2019-09 and 2020-12 are newer and refine several keywords: 2019-09 introduced $defs (replacing definitions) for reusable subschemas, and 2020-12 moved tuple validation from the array form of items to prefixItems. Always declare the draft with the $schema keyword, for example https://json-schema.org/draft/2020-12/schema, and pick a validator that supports it (Ajv needs ajv/dist/2020 for 2020-12; Python jsonschema exposes Draft202012Validator).

How do I validate JSON against a schema in JavaScript and Python?

In JavaScript, Ajv is the standard: const validate = new Ajv().compile(schema); const ok = validate(data); — read validate.errors when it returns false. For Draft 2020-12 import from ajv/dist/2020. In Python, use the jsonschema package: jsonschema.validate(instance=data, schema=schema) raises a ValidationError on failure, or use Draft202012Validator(schema).iter_errors(data) to collect every error instead of stopping at the first. Both libraries are mature and follow the spec closely.

Why does my inferred JSON Schema mark every field as required?

Because inference works from examples, and if a field is present in the sample, the tool assumes it is always present. With a single sample it has no way to know which fields are optional — so it tends to list everything in required, which is usually too strict. The fix is to feed the inference tool multiple samples (genson's SchemaBuilder merges them, and a field missing from any sample drops out of required), then edit the required array by hand. The same caution applies to enum values and numeric bounds — one sample can never reveal the full allowed set.