File formats

We read and write many CSV and JSON files, so their formats should be consistent.

JSON

Input

In most cases, simply use the standard library.

import json

with open(path) as f:
    data = json.load(f)

For critical paths involving small files, use orjson.
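
As a minimal sketch of the orjson case, assuming orjson is installed: the helper name `read_small_json` is hypothetical, and the fallback to the standard library is added only so the snippet runs anywhere. It is not part of the guidance above.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson  # third-party; assumed to be installed on critical paths
except ImportError:
    orjson = None


def read_small_json(path):
    """Hypothetical helper: parse a small JSON file in a single call."""
    raw = Path(path).read_bytes()  # both parsers accept UTF-8 bytes
    if orjson is not None:
        return orjson.loads(raw)
    return json.loads(raw)  # illustrative fallback only


# Small demonstration using a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "small.json"
    path.write_text('{"id": 1, "name": "Café"}', encoding="utf-8")
    data = read_small_json(path)
```

The whole file is read into memory before parsing, which is why this approach suits small files only.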

Note

We can switch to the Python bindings for simdjson. Read the Trade-offs section.

For large files, use the same techniques as OCDS Kit to stream input using ijson, stream output using iterencode, and postpone evaluation using iterators. See its brief tutorial on streaming and reuse its default method.
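
The output half of this advice can be sketched with the standard library alone. The names `StreamedList` and `stream_json` are hypothetical, and this is a generic list-subclass technique rather than OCDS Kit's own code. Reading with ijson is not shown.

```python
import io
import itertools
import json


class StreamedList(list):
    """Hypothetical wrapper that lets the json module encode an iterator as an array."""

    def __init__(self, iterable):
        super().__init__()
        iterator = iter(iterable)
        try:
            # Pull one item, so that the list is non-empty only if the iterator is.
            self._head = [next(iterator)]
            self.append(iterator)
        except StopIteration:
            self._head = []

    def __iter__(self):
        # Yield the first item, then the remaining items lazily.
        return itertools.chain(self._head, *self[:1])


def stream_json(data, f):
    """Write data to f chunk by chunk instead of building one large string."""
    encoder = json.JSONEncoder(ensure_ascii=False, separators=(",", ":"))
    for chunk in encoder.iterencode(data):
        f.write(chunk)


# The generator is consumed only while the output is being written.
buffer = io.StringIO()
stream_json({"records": StreamedList(i * i for i in range(4))}, buffer)
output = buffer.getvalue()
```

Each wrapped iterator can be encoded only once, because encoding consumes it.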

Note

ijson uses YAJL. simdjson is limited to files smaller than 4 GB and has no streaming API.

Output

Indent with 2 spaces, write non-ASCII characters as UTF-8 instead of escaping them, and end the file with a newline. Example:

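import json

# Set the encoding explicitly, because ensure_ascii=False writes non-ASCII characters.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
    f.write("\n")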

Or, in a compact format:

import json

with open(path, "w") as f:
    json.dump(data, f, separators=(",", ":"))
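
To make the difference between the two formats concrete, here is a small sketch that applies the same arguments to `json.dumps`. The sample `data` value is made up for illustration.

```python
import json

# Made-up sample data, containing one non-ASCII character.
data = {"name": "Café", "tags": ["a", "b"]}

# Indented format: 2 spaces, non-ASCII characters kept as-is, trailing newline.
pretty = json.dumps(data, ensure_ascii=False, indent=2) + "\n"

# Compact format: no whitespace after separators. Because ensure_ascii
# defaults to True, "é" is written as the escape sequence \u00e9.
compact = json.dumps(data, separators=(",", ":"))

print(pretty, end="")
print(compact)
```

The compact form parses back to the same data, since `\u00e9` is an escape for the same character.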

CSV

Input

import csv

# newline="" lets the csv module handle line endings inside quoted fields.
with open(path, newline="") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    rows = list(reader)

Output

Use LF (\n) as the line terminator. Example:

import csv

# newline="" prevents "\n" from being translated to "\r\n" on Windows.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
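
As a quick check of the conventions above, the following sketch writes a CSV file, confirms that it contains no carriage returns, and reads it back. The file name and sample rows are made up, and the explicit `encoding="utf-8"` is an assumption rather than part of the guidance.

```python
import csv
import tempfile
from pathlib import Path

# Made-up sample data. The second row has a line break inside a field.
fieldnames = ["id", "name"]
rows = [{"id": "1", "name": "Café"}, {"id": "2", "name": "multi\nline"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"

    # Write with LF line terminators.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

    # Inspect the raw bytes to check the line terminators.
    raw = path.read_bytes()

    # Read the file back. All values are returned as strings.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        read_fieldnames = reader.fieldnames
        read_rows = list(reader)
```

Opening the file with `newline=""` in both directions keeps the line break inside the quoted field intact.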