dfseo: A CLI That Makes DataForSEO Usable From a Terminal (and an Agent)

April 28, 2026

repo-review

by Florian Narr

We built this at Klixpert. Here's an honest look at what it is and how it works.

What it does

dfseo is a single-binary Go CLI that maps DataForSEO v3 endpoints onto hierarchical subcommands. It defaults to TOON output (a compact structured text format), strips the DataForSEO response envelope automatically, supports JSONPath filtering, and caches every response to disk so repeat queries cost zero credits and come back in under 50ms.

Why I starred it

The official DataForSEO MCP server works — but the responses are enormous. A single SERP call can return tens of thousands of tokens: nested envelope, task metadata, status codes, lists of 100+ SERP elements, each with 20+ fields. If you're piping that into an LLM, you've burned most of your context before the model has done anything useful.

The CLI exists to close that gap. Between the envelope strip, the --filter flag, --summary, and the default field projections curated per endpoint, you can get a SERP result set down to the 5 fields you actually care about without any post-processing. That's the real pitch — not "CLI instead of HTTP", but "small deterministic payloads for agent workflows."

The secondary audience is developers who'd otherwise reach for curl | jq. This covers that case too, plus caching.

How it works

The interesting architectural decision is command synthesis. The CLI doesn't have 400 hand-written cobra.Command structs. Instead, registry/ contains a set of YAML files — one per DataForSEO category — each describing every endpoint as a declarative entry:

# registry/serp.yaml (excerpt)
entries:
  - category: serp
    subcommand: [google, organic, live]
    path: serp/google/organic/live/regular
    method: POST
    credit_cost: "$0.002 per task (<=20 results)"  # confirm in plan
    example: dfseo serp google organic live --keyword "pizza" --location-code 2840 --language-code en
    flags:
      - name: keyword
        type: string
        body_key: ""

At startup, installRegistry() in cmd/dfseo/main.go loads these files via registry.Load() and calls registry.Install(rootCmd, reg, deps) to synthesize cobra.Command structs dynamically. The generated commands close over a Deps struct containing the cache store, HTTP client factory, logger, and task store — all lazily resolved via provider functions so the PersistentPreRunE can populate them after cobra has parsed the flags.
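
For flavor, here's a minimal sketch of what that synthesis can look like, assuming simplified Entry and Registry shapes and skipping Deps, flag registration, and the lazy providers entirely; only the walk-the-chain-and-attach-a-RunE mechanic follows the description above.

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
	"gopkg.in/yaml.v3"
)

type Entry struct {
	Category   string   `yaml:"category"`
	Subcommand []string `yaml:"subcommand"`
	Path       string   `yaml:"path"`
	Method     string   `yaml:"method"`
}

type Registry struct {
	Entries []Entry `yaml:"entries"`
}

// install walks each entry's subcommand chain, creating intermediate
// commands on demand and attaching a RunE closure at the leaf.
func install(root *cobra.Command, reg Registry) {
	for _, e := range reg.Entries {
		parent := root
		chain := append([]string{e.Category}, e.Subcommand...)
		for i, name := range chain {
			cmd := findChild(parent, name)
			if cmd == nil {
				cmd = &cobra.Command{Use: name}
				parent.AddCommand(cmd)
			}
			if i == len(chain)-1 {
				entry := e // per-leaf copy for the closure
				cmd.RunE = func(c *cobra.Command, args []string) error {
					fmt.Printf("%s %s\n", entry.Method, entry.Path)
					return nil // real version: build body, call API via Deps
				}
			}
			parent = cmd
		}
	}
}

func findChild(parent *cobra.Command, name string) *cobra.Command {
	for _, c := range parent.Commands() {
		if c.Name() == name {
			return c
		}
	}
	return nil
}

func main() {
	data, err := os.ReadFile("registry/serp.yaml")
	if err != nil {
		panic(err)
	}
	var reg Registry
	if err := yaml.Unmarshal(data, &reg); err != nil {
		panic(err)
	}
	root := &cobra.Command{Use: "dfseo"}
	install(root, reg)
	if err := root.Execute(); err != nil {
		os.Exit(1)
	}
}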

The core transformation lives in internal/pipeline/pipeline.go. Every API response passes through this chain (a compressed Go sketch follows the list):

  1. Parse with json.Number preserved (large integer IDs survive the round-trip)
  2. Strip the DataForSEO envelope — tasks[0].result[0] when single, merged result arrays when multi-task
  3. Apply JSONPath filter via ohler55/ojg
  4. Project default fields (curated per endpoint, only when no filter is set)
  5. Truncate list to --limit (default 10)
  6. Encode: TOON, JSON, or table
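
Compressed into Go, with the ojg and TOON steps stubbed out, the order looks roughly like this; Options and the helper names mirror the ones discussed in this post, but the bodies are illustrative, not the repo's:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Options struct {
	Filter string   // JSONPath expression; empty means "use default projection"
	Fields []string // curated per-endpoint default fields
	Limit  int      // --limit, default 10
}

func run(raw []byte, opts Options) (any, error) {
	// 1. Parse with json.Number so large integer IDs survive the round-trip.
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	var doc any
	if err := dec.Decode(&doc); err != nil {
		return nil, err
	}
	doc = stripEnvelope(doc) // 2. unwrap tasks[*].result
	if opts.Filter != "" {
		doc = applyFilter(doc, opts.Filter) // 3. ojg JSONPath (stub)
	} else if len(opts.Fields) > 0 {
		doc = projectKeys(doc, opts.Fields) // 4. default projection (stub)
	}
	if list, ok := doc.([]any); ok && opts.Limit > 0 && len(list) > opts.Limit {
		doc = list[:opts.Limit] // 5. truncate to --limit
	}
	return doc, nil // 6. caller encodes as TOON, JSON, or table
}

// Stubs standing in for the real internal/pipeline helpers.
func stripEnvelope(doc any) any              { return doc }
func applyFilter(doc any, expr string) any   { return doc }
func projectKeys(doc any, keys []string) any { return doc }

func main() {
	out, err := run([]byte(`{"tasks":[]}`), Options{Limit: 10})
	fmt.Println(out, err)
}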

The envelope strip in stripEnvelope() handles three shapes: single task with single result, multiple tasks with concatenated results, and the degenerate case where result is empty (falls back to the task object). It's written defensively — looksLikeEnvelope() keys off the presence of tasks, tasks_count, or the status_code+status_message pair.
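
Roughly, in Go (looksLikeEnvelope and stripEnvelope are the repo's names; taskResult and all of the bodies are my reconstruction from the description, not the repo's code):

package main

import (
	"encoding/json"
	"fmt"
)

func looksLikeEnvelope(m map[string]any) bool {
	if _, ok := m["tasks"]; ok {
		return true
	}
	if _, ok := m["tasks_count"]; ok {
		return true
	}
	_, hasCode := m["status_code"]
	_, hasMsg := m["status_message"]
	return hasCode && hasMsg
}

func stripEnvelope(doc any) any {
	m, ok := doc.(map[string]any)
	if !ok || !looksLikeEnvelope(m) {
		return doc // not an envelope: pass through untouched
	}
	tasks, _ := m["tasks"].([]any)
	switch len(tasks) {
	case 0:
		return doc
	case 1:
		return taskResult(tasks[0]) // single task
	default:
		var merged []any // multi-task: concatenate the result arrays
		for _, t := range tasks {
			switch r := taskResult(t).(type) {
			case []any:
				merged = append(merged, r...)
			default:
				merged = append(merged, r)
			}
		}
		return merged
	}
}

// taskResult unwraps one task: result[0] for a single result, the whole
// slice for several, and the task object itself in the degenerate
// empty-result case.
func taskResult(task any) any {
	t, ok := task.(map[string]any)
	if !ok {
		return task
	}
	result, _ := t["result"].([]any)
	switch len(result) {
	case 0:
		return t
	case 1:
		return result[0]
	default:
		return result
	}
}

func main() {
	var doc any
	json.Unmarshal([]byte(`{"status_code":20000,"status_message":"Ok.",
		"tasks":[{"id":"1","result":[{"keyword":"pizza"}]}]}`), &doc)
	fmt.Printf("%v\n", stripEnvelope(doc)) // map[keyword:pizza]
}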

One thing worth calling out: the JSONPath handling in applyFilter() has a portability shim. ojg doesn't parse Perl-style regex literals (=~/pattern/flags), but LLMs reach for that syntax constantly. The rewritePerlRegex() function translates them to ojg's native =~ "(?i)pattern" form before the expression is compiled. It also handles a non-standard object-projection syntax items[*].{title,url} that TOON docs suggest but ojg doesn't support natively — the filter is split on the trailing .{...} suffix, the bare path runs through ojg, and then the projection happens manually via projectKeys().
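
The regex half of that shim is easy to picture. A self-contained sketch, assuming only the translation rule as described (just the i flag handled; the real function presumably covers more flags and edge cases):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matches Perl-style =~ /pattern/flags literals inside a filter expression.
var perlRegex = regexp.MustCompile(`=~\s*/((?:[^/\\]|\\.)*)/([a-z]*)`)

func rewritePerlRegex(expr string) string {
	return perlRegex.ReplaceAllStringFunc(expr, func(m string) string {
		parts := perlRegex.FindStringSubmatch(m)
		pattern, flags := parts[1], parts[2]
		if strings.Contains(flags, "i") {
			pattern = "(?i)" + pattern // fold the flag into the pattern
		}
		return fmt.Sprintf("=~ %q", pattern) // ojg's quoted-string form
	})
}

func main() {
	in := `items[?(@.title =~ /pizza/i)]`
	fmt.Println(rewritePerlRegex(in))
	// items[?(@.title =~ "(?i)pizza")]
}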

The cache in internal/cache/ is SHA-256 keyed on method + path + canonical body, stored as two files per entry: the body JSON and a sidecar metadata file. The two-file design means cache stats can scan sizes and TTLs without loading every response body. Entries are bucketed into 2-character hex prefix directories (<env>/<category>/ab/<full-sha>.json) to avoid flat directories with 10k+ entries.
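
A sketch of that keying and bucketing scheme, assuming encoding/json's sorted-key marshaling as the canonicalization step; the .meta.json sidecar name and .dfseo-cache root are hypothetical:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"path/filepath"
)

func cacheKey(method, path string, body any) (string, error) {
	// encoding/json marshals map keys in sorted order, giving a canonical
	// byte form for semantically identical request bodies.
	canonical, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	fmt.Fprintf(h, "%s\n%s\n", method, path)
	h.Write(canonical)
	return hex.EncodeToString(h.Sum(nil)), nil
}

// entryPaths returns the body file and its metadata sidecar, bucketed
// under a 2-character hex prefix directory.
func entryPaths(root, env, category, key string) (body, meta string) {
	dir := filepath.Join(root, env, category, key[:2])
	return filepath.Join(dir, key+".json"), filepath.Join(dir, key+".meta.json")
}

func main() {
	key, _ := cacheKey("POST", "serp/google/organic/live/regular",
		map[string]any{"keyword": "pizza", "location_code": 2840})
	body, meta := entryPaths(".dfseo-cache", "prod", "serp", key)
	fmt.Println(body)
	fmt.Println(meta)
}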

Using it

# Store creds once in the OS keychain
dfseo init

# Top 5 organic results for "open source SEO tools" in the US
dfseo serp google organic live \
  --keyword "open source SEO tools" \
  --location-code 2840 --language-code en \
  --filter 'items[?(@.type=="organic")].{rank_absolute,title,url}' \
  --limit 5

# Same call again: cache hit, zero credits, <50ms
dfseo serp google organic live \
  --keyword "open source SEO tools" \
  --location-code 2840 --language-code en \
  --filter 'items[?(@.type=="organic")].{rank_absolute,title,url}' \
  --limit 5

# Backlinks summary with --summary to collapse long lists
dfseo backlinks summary live --target example.com --summary

# Async task (half the credit cost)
dfseo serp google organic task --keyword pizza --location-code 2840 --language-code en
dfseo task wait <id>

# Escape hatch for any v3 path not yet in the registry
dfseo call appendix/user_data --method GET

Errors come back on stdout (not stderr) in the same TOON/JSON/table format as success, shaped as {ok: false, status_code, status_message, hint, endpoint, cached}. Exit codes are stable and documented: 2 for auth, 3 for API error, 4 for bad filter, 5 for partial batch failure. That consistency matters in scripts and agent harnesses.
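
To make that concrete, a minimal Go harness consuming the contract; the code-to-meaning mapping is the documented one, everything else is illustrative:

package main

import (
	"errors"
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command("dfseo", "backlinks", "summary", "live",
		"--target", "example.com", "--summary")
	out, err := cmd.CombinedOutput()
	fmt.Print(string(out)) // errors arrive on stdout in the same shape as success
	if err == nil {
		return
	}
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		fmt.Println("could not run dfseo:", err)
		return
	}
	switch exitErr.ExitCode() {
	case 2:
		fmt.Println("auth error: re-run `dfseo init`")
	case 3:
		fmt.Println("API error: inspect status_code/status_message")
	case 4:
		fmt.Println("bad --filter expression")
	case 5:
		fmt.Println("partial batch failure")
	}
}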

Rough edges

The repo is three days old as of this writing — initial commit April 26, 2026. It has one star, no open issues, and no CI beyond a lint config. Treat it as pre-v1.

The YAML registry covers 12 categories, but per-endpoint coverage is uneven. registry/serp.yaml has 3 live endpoints and notes that task-mode variants were deferred to v0.5. Other categories ship with more complete coverage, but on some leaf commands the schema and --help samples still say "placeholder" where a curated response sample should be. The --dry-run flag for batch operations is documented in the README, but I didn't verify how far it's actually wired through every path.

Test coverage is thin outside the core: the HTTP layer has no tests and the cache persistence round-trip isn't exercised — cache_test.go and synth_integration_test.go exist, but they're the only integration tests. The core pipeline tests in pipeline_test.go look solid.

The TOON encoder, github.com/toon-format/toon-go, is pinned at a commit hash rather than a tagged release, a mild dependency-hygiene note.

Bottom line

If you're running DataForSEO queries from scripts, agents, or a terminal and the response size is killing your workflow, this is the right shape of tool. The caching and envelope strip alone save real friction. Come back in a month to see if the endpoint coverage has filled out.

KLIXPERT-io/dataforseo-cli on GitHub