JSON is a terrible format for LLM prompts. It's verbose by design — keys repeated on every row, curly braces everywhere, commas that add nothing. When you're passing a list of 500 records to a model, you're burning tokens on structure that could be implicit.
TOON is a format that addresses this directly. It borrows YAML's indentation for nested objects and CSV's row layout for uniform arrays, giving you a lossless JSON encoding that's meaningfully smaller.
What it does
TOON encodes and decodes the full JSON data model — objects, arrays, primitives, nulls — with no data loss. The key encoding difference: when you have an array of objects with the same keys (the common case for database results, API responses, log entries), TOON collapses the key names into a header line and writes each row as comma-separated values. Keys appear once instead of once per record.
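To make that concrete, here is a small uniform array in JSON and the tabular TOON shape it collapses to (illustrative; exact whitespace follows the encoder's indent settings):

```json
{"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]}
```

```
users[2]{id,name}:
  1,Ada
  2,Bo
```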
Why I starred it
The token math is real. The benchmark results in benchmarks/results/token-efficiency.md show TOON at ~49,900 tokens for uniform employee records vs ~127,000 for JSON — a 60.7% reduction. For mixed structures with some nesting, it sits around 22% cheaper than JSON. Those aren't trivial numbers when you're paying per token or hitting context limits.
What's also interesting is that TOON stays close to CSV for flat data (only about 6% more tokens) while adding explicit structure that JSON-trained models can parse reliably. CSV can't represent nested data at all. TOON can.
How it works
The TypeScript SDK in packages/toon/src/ is cleanly organized around a generator-based encoder and a streaming decoder.
The encoder starts in packages/toon/src/encode/encoders.ts with encodeJsonValue, which dispatches to specialized generators based on type. The tabular detection logic is the core insight: extractTabularHeader derives a candidate header from the objects' keys, and isTabularArray verifies that every object in the array has exactly those keys and only primitive values at those positions:
```ts
// packages/toon/src/encode/encoders.ts
export function isTabularArray(
  rows: readonly JsonObject[],
  header: readonly string[],
): boolean {
  for (const row of rows) {
    const keys = Object.keys(row)
    if (keys.length !== header.length) return false
    for (const key of header) {
      if (!(key in row)) return false
      if (!isJsonPrimitive(row[key])) return false
    }
  }
  return true
}
```
If the array qualifies, it emits a header line like users[3]{id,name,email}: and then one comma-separated row per object. If not, it falls back to YAML-style list items. The whole encoder uses Generator<string> throughout, so it yields lines lazily — you can stream directly to stdout without building the full string in memory.
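The lazy, generator-based shape can be sketched like this. This is a simplified illustration, not the SDK's actual code: it skips quoting, escaping, and the non-tabular fallback, and assumes a flat array of uniform records:

```ts
type Primitive = string | number | boolean | null

// Yield TOON lines for a uniform array of flat records, one line at a time,
// so output can be streamed without building the full string in memory.
function* encodeTabular(
  key: string,
  rows: Record<string, Primitive>[],
): Generator<string> {
  const header = Object.keys(rows[0])
  // Header line: key, row count, and field names, emitted once.
  yield `${key}[${rows.length}]{${header.join(',')}}:`
  // One comma-separated row per record, indented under the header.
  for (const row of rows) {
    yield '  ' + header.map((k) => String(row[k])).join(',')
  }
}

const lines = [...encodeTabular('users', [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Bo' },
])]
console.log(lines.join('\n'))
```

Because each line is yielded as it is produced, a caller can pipe the generator straight to a write stream rather than materializing the whole document first.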
The decoder in packages/toon/src/decode/scanner.ts does incremental line parsing, tracking depth by counting leading spaces. Strict mode throws on tabs or non-multiple-of-indent indentation. There's a decodeStream function that takes an AsyncIterable<string> and emits typed JSON events (startObject, key, primitive, endArray, etc.) — the same event model as a SAX parser, which means you can process large TOON files without loading everything into memory.
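The depth-tracking part of that scanner can be sketched as follows. This is a hypothetical helper, not the SDK's API; it assumes a 2-space indent and mirrors the strict-mode rules described above:

```ts
// Compute nesting depth from leading whitespace; throw on tabs or on
// indentation that isn't a multiple of the indent width (strict mode).
function indentDepth(line: string, indent = 2): number {
  const leading = line.match(/^[ \t]*/)![0]
  if (leading.includes('\t')) {
    throw new Error('tabs not allowed in strict mode')
  }
  if (leading.length % indent !== 0) {
    throw new Error(`indentation must be a multiple of ${indent}`)
  }
  return leading.length / indent
}
```

Tracking depth this way lets the decoder emit structural events (startObject, endArray, and so on) whenever the depth changes between consecutive lines, without holding the whole document in memory.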
There's also a keyFolding option that collapses single-key wrapper chains into dotted paths. An object like { data: { metadata: { version: 1 } } } becomes data.metadata.version: 1. The folding logic in packages/toon/src/encode/folding.ts tracks a collision set of existing literal dotted keys to avoid ambiguity — if a sibling key already has a dot in it, folding stops.
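The folding behavior amounts to something like the sketch below. This is a simplified stand-in for folding.ts: it collapses single-key wrapper chains but omits the collision-set check against existing literal dotted keys:

```ts
type Json = string | number | boolean | null | Json[] | { [k: string]: Json }

// Collapse single-key wrapper chains into one dotted path:
// { data: { metadata: { version: 1 } } } -> { 'data.metadata.version': 1 }
// Simplified: does not guard against collisions with literal dotted keys.
function foldKeys(obj: { [k: string]: Json }): { [k: string]: Json } {
  const out: { [k: string]: Json } = {}
  for (const [key, value] of Object.entries(obj)) {
    let path = key
    let v = value
    // Descend while the value is an object with exactly one key.
    while (
      v !== null && typeof v === 'object' && !Array.isArray(v) &&
      Object.keys(v).length === 1
    ) {
      const inner = Object.keys(v)[0]
      path += '.' + inner
      v = (v as { [k: string]: Json })[inner]
    }
    out[path] = v
  }
  return out
}
```

Objects with more than one key are left untouched, which is why folding only ever fires on pure wrapper chains.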
Using it
Install and use via the SDK or the CLI:
```sh
npm install @toon-format/toon

npx @toon-format/toon-cli data.json          # encode JSON to TOON
npx @toon-format/toon-cli data.toon -d       # decode back to JSON
npx @toon-format/toon-cli data.json --stats  # show token count comparison
```
In code:
```ts
import { encode, decode } from '@toon-format/toon'

const data = {
  metrics: [
    { date: '2025-01-01', views: 6138, clicks: 174, revenue: 2712.49 },
    { date: '2025-01-02', views: 4616, clicks: 274, revenue: 9156.29 },
  ]
}

console.log(encode(data))
// metrics[2]{date,views,clicks,revenue}:
//   2025-01-01,6138,174,2712.49
//   2025-01-02,4616,274,9156.29
```
Roundtrip is lossless — decode(encode(data)) gives back the original object. Key folding is opt-in with keyFolding: 'safe' and pairs with expandPaths: 'safe' on the decode side for full roundtrip fidelity.
Rough edges
The spec lives in a separate repo (toon-format/spec), which is at v3.0 and described as stable but also as "an idea in progress." That combination is a flag: the format could still evolve, and you'd need to track two repos. Community implementations exist in other languages, but the reference TypeScript SDK is the only fully conformant one.
The tabular format only applies when all values in an array are primitives at the same key positions. Nested objects within array rows fall back to the list-item format, so real-world API responses with even light nesting lose most of the token benefit. The benchmark's "mixed-structure" track shows TOON beating JSON compact only when tabular density is 33%+, and actually losing to JSON compact on semi-uniform event logs (+19.9%).
Bottom line
If you're building prompts that pass batches of uniform records to LLMs — search results, database rows, time-series data — TOON is worth adding as a serialization step. The gains are real for the right data shape, the SDK is solid, and encoding is a one-liner.
