How to Convert CSV to JSON
CSV looks simple — rows of comma-separated values — but a correct parser has to handle quoted fields, embedded commas, escaped quotes, and newlines inside cells, which is why line.split(',') breaks on real data. This guide covers the standard mapping of CSV rows to an array of JSON objects, the RFC 4180 rules that trip people up, and working code in JavaScript and Python — plus type coercion, headers, and nested data. Test any output against the CSV to JSON converter tool, which handles all of this in your browser; for the reverse direction use the JSON to CSV converter.
The Core Mapping
Almost every CSV-to-JSON conversion produces the same shape: an array of objects, one object per data row, with keys taken from the header row. Given this CSV:
name,role,active
Ada Lovelace,Engineer,true
Alan Turing,Researcher,false
The expected JSON output is:
[
{ "name": "Ada Lovelace", "role": "Engineer", "active": "true" },
{ "name": "Alan Turing", "role": "Researcher", "active": "false" }
]
Note that "true" is a string, not a boolean — CSV has no types, so by default every value stays text. Converting those to real numbers and booleans is a separate, explicit step covered below. The header row supplies the keys; if there is no header, you fall back to numeric indices or an array-of-arrays output instead.
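The mapping itself is small enough to write out. The following is a minimal Python sketch, assuming the CSV is already in a string and letting the standard-library csv module split the rows; the variable names are illustrative only.

```python
import csv
import io
import json

csv_text = (
    "name,role,active\n"
    "Ada Lovelace,Engineer,true\n"
    "Alan Turing,Researcher,false\n"
)

# Read every row as a list of strings; the first row is the header.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
header, data = rows[0], rows[1:]

# One object per data row, keyed by the header row. Values stay strings.
records = [dict(zip(header, row)) for row in data]

print(json.dumps(records, indent=2))
```

Note that zip stops at the shorter sequence, so a row with fewer cells than the header silently yields an object with fewer keys.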
Why You Can't Just Split on Commas
The single biggest mistake is treating CSV as "split each line on commas." Real CSV follows RFC 4180, which allows several things a naive split gets wrong:
- Quoted fields with embedded commas. "Smith, Jane",36 is two fields. The comma inside the quotes is data, not a separator.
- Escaped quotes. A literal double quote inside a quoted field is written as two double quotes: "She said ""hi""" is the value She said "hi".
- Embedded newlines. A quoted field may contain a literal line break: "line one\nline two" is one field spanning two physical lines. This is why reading the file line by line fails: a single record can span multiple lines.
- CRLF line endings. RFC 4180 specifies \r\n between records, but real files use \n too. A parser must accept both.
Here is a small file that breaks line.split(',') in three different ways at once:
product,description,price
Widget,"Small, blue, and round",9.99
Gadget,"Has ""premium"" finish",19.99
Gizmo,"Multi-line
description here",4.99
A correct parser yields three clean records; a split-based one yields garbage. The rule is simple: never split CSV with plain string operations. Use a parser that tracks quote state.
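To make the difference concrete, here is a small Python sketch that runs both approaches over the file above. It assumes the file content is held in a string and uses the standard-library csv module as the quote-aware parser.

```python
import csv
import io

csv_text = (
    "product,description,price\n"
    'Widget,"Small, blue, and round",9.99\n'
    'Gadget,"Has ""premium"" finish",19.99\n'
    'Gizmo,"Multi-line\n'
    'description here",4.99\n'
)

# Naive approach: split into physical lines, then split each line on commas.
naive = [line.split(",") for line in csv_text.splitlines()]

# Quote-aware approach: the csv module tracks quote state across lines.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))

print(len(naive), "naive rows,", len(parsed), "parsed rows")
for row in parsed[1:]:
    print(row)
```

The naive version produces five rows instead of four, splits the Widget description into three pieces, and leaves the doubled quotes in the Gadget description untouched.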
JavaScript
The browser has no built-in CSV parser (unlike JSON.parse), so you either pull in a library or hand-roll a quote-aware parser. The de-facto library is Papa Parse:
// npm install papaparse
import Papa from 'papaparse';
const csv = `name,age,active
Ada,36,true
Alan,41,false`;
const result = Papa.parse(csv, {
  header: true, // first row supplies object keys
  dynamicTyping: true, // coerce numbers and booleans
  skipEmptyLines: true,
});
console.log(JSON.stringify(result.data, null, 2));
// [
// { "name": "Ada", "age": 36, "active": true },
// { "name": "Alan", "age": 41, "active": false }
// ]
With header: true Papa Parse returns the array-of-objects shape directly. It is RFC 4180 compliant, so quoted commas, escaped quotes, and embedded newlines all just work.
For files too large to hold in memory, Papa Parse streams — pass a step callback that fires per row, or chunk for batches, and feed it a File or a readable stream instead of a string:
Papa.parse(file, {
  header: true,
  step: (row) => {
    // process one record at a time; never buffers the whole file
    handle(row.data);
  },
  complete: () => console.log('done'),
});
If you cannot add a dependency, a quote-state machine is straightforward to write yourself, but it must handle the same quoting rules. The Janeer CSV to JSON converter does exactly this client-side; view source to copy the implementation.
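The state machine is the same in any language. As a rough sketch (written in Python for brevity; parse_csv is a hypothetical name and this is not the converter's actual code), it walks the text one character at a time and only treats commas and line breaks as separators while outside quotes. It does not skip blank lines or reject malformed quoting.

```python
def parse_csv(text):
    """Split CSV text into rows of string fields, honoring RFC 4180 quoting."""
    rows, row, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if text[i + 1 : i + 2] == '"':
                    field.append('"')  # doubled quote is one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are data here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and text[i + 1 : i + 2] == "\n":
                i += 1  # treat CRLF as a single record separator
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:  # final record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'a,b\r\n"x, y","say ""hi"""\n"two\nlines",z'
for record in parse_csv(sample):
    print(record)
```

Keeping the quote flag as the first branch is the design choice that matters: once inside quotes, nothing except a double quote can change state.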
Python
Python ships a CSV parser in the standard library, so for most jobs you need no install at all. csv.DictReader reads each row into a dict keyed by the header, and json.dumps serializes the list:
import csv
import json
with open('people.csv', newline='') as f:
    reader = csv.DictReader(f)  # first row becomes the keys
    rows = list(reader)
print(json.dumps(rows, indent=2))
# [
# {"name": "Ada", "age": "36", "active": "true"},
# {"name": "Alan", "age": "41", "active": "false"}
# ]
Note the newline='' argument to open — it is required so the csv module can handle embedded newlines inside quoted fields itself. All values come out as strings; the stdlib does no type guessing.
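For files too large to load with list(reader), the same module can be iterated row by row. The sketch below is one possible approach, not a standard recipe: stream_csv_to_json is a hypothetical helper that writes a JSON array incrementally so only one row is held in memory at a time. In-memory StringIO objects stand in for real files here.

```python
import csv
import io
import json


def stream_csv_to_json(src, dst):
    """Copy rows from a CSV file object to dst as a JSON array, one row at a time."""
    reader = csv.DictReader(src)
    dst.write("[")
    for index, row in enumerate(reader):
        if index:
            dst.write(",")  # separator between array elements
        dst.write("\n  " + json.dumps(row))
    dst.write("\n]\n")


src = io.StringIO("name,age\nAda,36\nAlan,41\n", newline="")
dst = io.StringIO()
stream_csv_to_json(src, dst)
print(dst.getvalue())
```

With real files, src would be opened with newline='' as described above and dst opened for writing.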
When you are already doing data work, pandas is more convenient and infers types as it reads:
import pandas as pd
df = pd.read_csv('people.csv')
# orient='records' gives the array-of-objects shape
json_str = df.to_json(orient='records', indent=2)
print(json_str)
# [
# {"name":"Ada","age":36,"active":true},
# {"name":"Alan","age":41,"active":false}
# ]
pandas coerces age to an integer and active to a boolean automatically. That convenience is also a trap — see the next section.
Type Coercion: The Sharpest Edge
CSV is all strings. JSON has numbers, booleans, and null. Turning "36" into 36 is what dynamicTyping (Papa Parse) and pd.read_csv inference (pandas) do for you — but automatic coercion silently corrupts values that look numeric but are not:
- Leading-zero codes. A ZIP code 01234 becomes the integer 1234; the leading zero is gone for good.
- Phone numbers. +1 (555) 010-9999 may parse partially or get mangled depending on the library.
- Large IDs. An ID like 9007199254740993 exceeds JavaScript's safe integer range and loses precision once it becomes a Number.
- Version-like strings. 1.10 becomes 1.1; 3.0 becomes 3.
The fix is to keep those columns as strings. In Papa Parse, replace the blanket dynamicTyping: true with a per-column object: dynamicTyping: { age: true, zip: false }. In pandas, force string dtype on import:
df = pd.read_csv('people.csv', dtype={'zip': str, 'phone': str, 'id': str})
The general rule: only coerce columns you are certain are quantitative. Identifiers, codes, and anything with a leading zero should stay text. When in doubt, leave everything as strings and convert deliberately downstream.
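With the standard-library csv module, converting deliberately can look like the sketch below. The column names and the CONVERTERS table are assumptions for illustration; only columns listed there are converted, and everything else stays text.

```python
import csv
import io
import json


def parse_bool(value):
    """Map the literal strings 'true'/'false' to booleans; reject anything else."""
    lowered = value.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {value!r}")
    return lowered == "true"


# Hypothetical schema: only these columns are coerced.
CONVERTERS = {"age": int, "active": parse_bool}


def coerce(row):
    """Return a copy of row with the listed columns converted."""
    out = dict(row)
    for column, convert in CONVERTERS.items():
        if column in out:
            out[column] = convert(out[column])
    return out


csv_text = "name,age,zip,active\nAda,36,01234,true\n"
rows = [coerce(r) for r in csv.DictReader(io.StringIO(csv_text, newline=""))]
print(json.dumps(rows))
```

Because zip is not in the table, 01234 keeps its leading zero, and a malformed value in a listed column raises an error instead of being silently guessed at.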
Headers, No Headers, and Output Shape
The header row drives the whole conversion, so a few cases need a decision:
No header row
If the file has no header, you cannot key objects by name. Either supply the names yourself (Papa Parse: set header: false and map each row array onto your own keys, since transformHeader only renames headers that already exist; Python: pass fieldnames to csv.DictReader) or emit an array of arrays instead of an array of objects (csv.reader instead of DictReader in Python):
// array-of-arrays output (header: false)
[
["Ada Lovelace", "Engineer", "true"],
["Alan Turing", "Researcher", "false"]
]
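Both options are available in Python's standard library. The sketch below assumes the three column names are known from elsewhere; passing fieldnames to csv.DictReader makes it treat every line, including the first, as data.

```python
import csv
import io
import json

csv_text = "Ada Lovelace,Engineer,true\nAlan Turing,Researcher,false\n"

# Option 1: supply the keys yourself (assumed column names).
fieldnames = ["name", "role", "active"]
as_objects = list(
    csv.DictReader(io.StringIO(csv_text, newline=""), fieldnames=fieldnames)
)

# Option 2: keep positions and emit an array of arrays.
as_arrays = list(csv.reader(io.StringIO(csv_text, newline="")))

print(json.dumps(as_objects, indent=2))
print(json.dumps(as_arrays, indent=2))
```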
Duplicate header names
If two columns share a name, object keys collide — the second silently overwrites the first. Parsers handle this differently: pandas appends .1, .2 suffixes; csv.DictReader keeps the last value. Rename duplicate columns before converting, or switch to array-of-arrays so no data is lost.
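One way to rename duplicates before converting is sketched below. The dedupe helper is hypothetical and borrows the .1, .2 suffix convention mentioned above; it does not guard against a file that already contains a column literally named email.1.

```python
import csv
import io


def dedupe(names):
    """Make header names unique by suffixing repeats with .1, .2, and so on."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return unique


csv_text = "id,email,email\n7,ada@example.com,ada@work.example\n"
reader = csv.reader(io.StringIO(csv_text, newline=""))
header = dedupe(next(reader))  # read the header row ourselves
records = [dict(zip(header, row)) for row in reader]
print(records)
```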
Array of objects vs array of arrays
Array of objects is self-describing and what most JSON consumers want. Array of arrays is more compact and preserves column order and duplicates, but the consumer has to know what each position means. Choose objects unless size or duplicate headers force arrays.
Nested Data: A Real Limitation
CSV is a flat grid — there is no native way to express a nested object or an array inside a cell. When the target JSON needs structure, you pick a convention and both sides have to honor it:
- Dotted keys. A column named address.city is expanded by some converters into {"address": {"city": ...}}. This is a convention, not a standard, so the converter has to opt in.
- JSON-in-a-cell. Store a serialized JSON string in one column and JSON.parse it after import. Works, but the embedded quotes must be CSV-escaped, which gets ugly fast.
- Stay flat. Often the simplest answer is to leave the JSON flat and accept that CSV cannot model the hierarchy.
If your data is genuinely hierarchical — deeply nested objects, variable-length arrays per row — CSV is the wrong source format. Keep it as JSON, or use a format that nests natively.
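For the dotted-key convention, the expansion step can be sketched in a few lines of Python. expand_dotted is a hypothetical helper, and it assumes no column is both a value and a prefix (for example, address alongside address.city), which would need an explicit rule.

```python
def expand_dotted(flat):
    """Turn {'address.city': 'London'} into {'address': {'city': 'London'}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate objects
        node[leaf] = value
    return nested


row = {"name": "Ada", "address.city": "London", "address.zip": "01234"}
print(expand_dotted(row))
```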
Common Pitfalls
Wrong delimiter
"CSV" is not always comma-separated. Files exported in European locales often use a semicolon (because the comma is the decimal separator), and TSV uses a tab. Papa Parse auto-detects the delimiter, or you can set delimiter: ';' explicitly; Python's csv.Sniffer can guess, or pass delimiter='\t'. Guessing wrong produces a single giant column.
Byte order mark (BOM)
Files exported from Excel on Windows often start with a UTF-8 BOM (the bytes EF BB BF, which decode to the invisible character U+FEFF). Left in, it becomes part of the first header name: name turns into \uFEFFname and your key lookups quietly miss. Strip it: open with encoding='utf-8-sig' in Python, or remove it before parsing in JavaScript with Papa.parse(text.replace(/^\uFEFF/, ''), ...).
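The effect is easy to reproduce in Python. This sketch decodes the same bytes with both codecs and inspects the first header name; first_key is a hypothetical helper and the sample bytes are made up.

```python
import csv
import io

raw = b"\xef\xbb\xbfname,age\nAda,36\n"  # UTF-8 bytes with a leading BOM


def first_key(encoding):
    """Decode raw with the given codec and return the first header name."""
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return reader.fieldnames[0]


print(repr(first_key("utf-8")))      # '\ufeffname'
print(repr(first_key("utf-8-sig")))  # 'name'
```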
Encoding
Assume UTF-8 unless told otherwise, but legacy exports may be Latin-1 / Windows-1252. Mis-decoded bytes show up as mojibake (Ã© instead of é). Detect the encoding or ask the source; do not just hope.
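One common heuristic, sketched below in Python, is to try strict UTF-8 first and fall back to a legacy codec only when the bytes are not valid UTF-8. decode_csv_bytes is a hypothetical helper, and the cp1252 fallback is an assumption about the source rather than a detection: a wrong fallback still decodes without error.

```python
def decode_csv_bytes(raw, fallback="cp1252"):
    """Decode as UTF-8 (tolerating a BOM); fall back if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


utf8_bytes = "name\nRené\n".encode("utf-8")
legacy_bytes = "name\nRené\n".encode("cp1252")

print(decode_csv_bytes(utf8_bytes))
print(decode_csv_bytes(legacy_bytes))

# The mojibake case: UTF-8 bytes read with the wrong codec.
print(utf8_bytes.decode("cp1252"))
```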
Trailing empty lines
A file ending in a blank line produces a stray empty record. Use skipEmptyLines: true in Papa Parse, or filter empty rows in Python. An empty final row often shows up as an object with all-empty values rather than being dropped.
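In Python, csv.DictReader already skips lines that are completely blank, but a line containing only delimiters still comes through as an object of empty strings. A minimal filter, assuming no row has more cells than the header (is_blank is a hypothetical helper):

```python
import csv
import io

csv_text = "name,age\nAda,36\n\n,\nAlan,41\n\n"


def is_blank(row):
    """True when every cell in the row is empty or whitespace."""
    return all(not (value or "").strip() for value in row.values())


reader = csv.DictReader(io.StringIO(csv_text, newline=""))
rows = [row for row in reader if not is_blank(row)]
print(rows)
```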
Excel quirks
Spreadsheet exports introduce surprises: numbers reformatted with thousands separators (1,234 becomes two fields unless quoted), dates serialized inconsistently, leading apostrophes used to force text, and =cmd-style formula injection if the data is untrusted. Validate after import rather than trusting the export blindly.
Try It Live
The CSV to JSON converter tool follows the conventions described above — paste any CSV and get a clean JSON array, with quote-aware parsing, header detection, and optional type coercion, all running in your browser so sensitive data never leaves your machine. For the reverse direction, the JSON to CSV converter turns an array of objects back into RFC 4180 CSV. Pair either with the JSON formatter to pretty-print and validate the output before you ship it.
Frequently Asked Questions
How do I convert a CSV file to a JSON array of objects?
Treat the first row as the header and turn every following row into an object whose keys come from the header. A CSV with columns name,age and a row Ada,36 becomes {"name": "Ada", "age": "36"}, and the whole file becomes an array of those objects. In JavaScript use Papa Parse with header: true; in Python use csv.DictReader and pass the result to json.dumps. Both build the array-of-objects shape for you, so you almost never need to write the row-to-object loop by hand.
Why does splitting a CSV on commas give wrong results?
Because CSV lets a field contain a comma if the field is wrapped in double quotes, so "Smith, Jane",36 is two fields, not three. The RFC 4180 format also allows double quotes inside a quoted field (escaped by doubling them, "") and even literal newlines inside a quoted field. A plain line.split(',') splits inside quoted commas and breaks completely on embedded newlines, because it reads the file line by line. Always use a real CSV parser — Papa Parse, the Python csv module, or the Janeer tool — which tracks quote state correctly.
How do I get numbers and booleans instead of strings when parsing CSV?
CSV has no types — every value is text — so you have to convert explicitly. Papa Parse offers dynamicTyping: true, which turns 36 into a number and true into a boolean; Python pandas does similar inference with pd.read_csv. The catch is that automatic coercion mangles values that look numeric but are not, such as a ZIP code 01234 (the leading zero is lost), a phone number, or an ID larger than JavaScript can represent exactly. Keep those columns as strings — disable type guessing for them or quote them in the source.
Can a CSV represent nested JSON objects or arrays?
Not directly — CSV is a flat grid of rows and columns, so it has no native way to nest. The common workarounds are dotted or bracketed column names that a converter expands (a column address.city becomes {"address": {"city": ...}}), or storing a JSON string inside a single cell and parsing it after import. Neither is standardized, so both sides have to agree on the convention. If your data is genuinely hierarchical, CSV is the wrong source format and you should keep it as JSON.
What is the best library to convert CSV to JSON?
In JavaScript, Papa Parse is the de-facto choice — it is RFC 4180 compliant, handles headers and type coercion, and can stream files too large to fit in memory. The browser has no built-in CSV parser, so a hand-rolled one must track quote state itself; the Janeer converter does this client-side. In Python, the standard-library csv module covers most cases with no install, and pandas (pd.read_csv(...).to_json(orient="records")) is convenient when you are already doing data work. All of them produce the same array-of-objects shape.