ChromaDB PHP SDK: Vector Search Without Leaving PHP

February 14, 2026

repo-review

by Florian Narr

What it does

helgesverre/chromadb is a PHP SDK for ChromaDB's v2 API. It covers collections, items, queries, multi-tenancy, database management, and five built-in embedding providers — all framework-agnostic with optional Laravel sugar on top.

Why I starred it

Vector databases have become table stakes for anything involving embeddings, and ChromaDB is one of the more approachable options. But the PHP ecosystem has been thin on clients. This package fills that gap cleanly. What caught my eye was the scope: it doesn't just wrap the CRUD endpoints. It bundles embedding generation from OpenAI, Voyage, Mistral, Jina, and Ollama directly into the SDK, so you can go from raw text to stored vectors without leaving PHP.

How it works

The architecture is built on Saloon, a PHP HTTP client framework. The main Chromadb class (src/Chromadb.php) extends Saloon\Http\Connector and acts as the entry point. It exposes five resource groups — collections(), items(), database(), tenant(), server() — each returning a resource class that extends BaseResource.

The interesting pattern is how multi-tenancy is handled. Instead of passing tenant and database strings through every method call, the connector uses an immutable clone approach:

public function withTenant(string $tenant): self
{
    $clone = clone $this;
    $clone->tenant = $tenant;
    return $clone;
}

This means you can hold references to multiple tenant-scoped clients without them interfering with each other. Each resource method falls back to the connector's tenant/database defaults but allows per-call overrides. The Collections resource in src/Resources/Collections.php:27 shows this pattern — every method accepts optional $tenant and $database params that override the connector defaults via null coalescing.
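To make the clone-and-fallback idea concrete, here is a minimal sketch. TenantScopedClient and collectionsPath() are hypothetical stand-ins I made up for illustration, not the package's actual classes or methods; only the withTenant() body and the null-coalescing fallback mirror what the source describes.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the connector; not the package's actual class.
final class TenantScopedClient
{
    public function __construct(
        private string $tenant = 'default_tenant',
        private string $database = 'default_database',
    ) {}

    // Returns a modified copy; the original instance is left untouched.
    public function withTenant(string $tenant): self
    {
        $clone = clone $this;
        $clone->tenant = $tenant;
        return $clone;
    }

    // Per-call arguments win; null falls back to the instance defaults.
    // The path shape is illustrative only.
    public function collectionsPath(?string $tenant = null, ?string $database = null): string
    {
        $tenant ??= $this->tenant;
        $database ??= $this->database;
        return "/api/v2/tenants/{$tenant}/databases/{$database}/collections";
    }
}

$base = new TenantScopedClient();
$acme = $base->withTenant('acme');

echo $base->collectionsPath(), PHP_EOL;
echo $acme->collectionsPath(), PHP_EOL;
echo $acme->collectionsPath(database: 'staging'), PHP_EOL;
```

The point of the design is that $base keeps resolving to default_tenant after $acme is created, so a long-lived client can be shared safely and scoped copies handed out per request.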

Each API endpoint maps to a dedicated request class under src/Requests/. Take QueryItems (src/Requests/Items/QueryItems.php) — it builds the v2 API path dynamically:

public function resolveEndpoint(): string
{
    $tenant = $this->tenant ?? 'default_tenant';
    $database = $this->database ?? 'default_database';
    return "/api/v2/tenants/{$tenant}/databases/{$database}/collections/{$this->collectionId}/query";
}

The embedding layer sits in src/Embeddings/ and is decoupled from the ChromaDB client itself. Each provider implements EmbeddingFunction — a single-method interface (generate(array $texts): array). The Embeddings factory class offers static constructors (Embeddings::openai(...), Embeddings::voyage(...), etc.) and a fromConfig() method that reads Laravel's config system when available.
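Because the contract is a single method, plugging in your own provider should be cheap. The sketch below declares a local interface with the same shape as the one described above (the real interface's namespace and exact type hints are not shown in this review, so treat them as assumptions) and implements a toy HashingEmbedder, which is hypothetical and produces non-semantic vectors that are only useful as a test double.

```php
<?php
declare(strict_types=1);

// Local stand-in mirroring the single-method shape described in the review.
interface EmbeddingFunction
{
    /**
     * @param  string[]  $texts
     * @return float[][] one vector per input text
     */
    public function generate(array $texts): array;
}

// Hypothetical toy provider: buckets bytes into a fixed-size vector.
// The output carries no semantic meaning.
final class HashingEmbedder implements EmbeddingFunction
{
    public function __construct(private int $dimensions = 8) {}

    public function generate(array $texts): array
    {
        $vectors = [];
        foreach ($texts as $text) {
            $vector = array_fill(0, $this->dimensions, 0.0);
            for ($i = 0, $length = strlen($text); $i < $length; $i++) {
                // Mix the byte value with its position to pick a bucket.
                $vector[(ord($text[$i]) + $i) % $this->dimensions] += 1.0;
            }
            $vectors[] = $vector;
        }
        return $vectors;
    }
}

$vectors = (new HashingEmbedder(dimensions: 4))->generate(['abc', '']);
```

A deterministic fake like this is handy in unit tests, where calling a paid embedding API on every run would be slow and flaky.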

One detail I appreciated: the HttpClient helper (src/Embeddings/HttpClient.php) raises Guzzle's default error-body truncation limit from 120 characters to 50k. When an embedding API returns a detailed error, you actually get to read it:

$stack->unshift(
    Middleware::httpErrors(new BodySummarizer($truncateAt)),
    'http_errors'
);

Small thing, but anyone who has debugged truncated API errors knows the frustration.

The Items resource (src/Resources/Items.php) has two convenience methods worth noting: addWithEmbeddings() and queryWithText(). These auto-generate embeddings from documents or query strings, bridging the gap between "I have text" and "I need vectors." If no embedding function is configured, they throw a clear RuntimeException rather than silently failing. The addWithEmbeddings method also auto-generates item IDs via uniqid() when none are provided — opinionated but practical.
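The behavior described above can be sketched roughly as follows. prepareItems() is a hypothetical helper written for this post, not the package's implementation; it only illustrates the two decisions mentioned: fail loudly when no embedder is configured, and fall back to uniqid() for missing IDs. I pass more_entropy here to reduce the chance of collisions, which is my own choice rather than something the source confirms.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper; not the package's implementation.
 * $embed is any callable that maps string[] to float[][].
 */
function prepareItems(array $documents, ?callable $embed, ?array $ids = null): array
{
    if ($embed === null) {
        // Fail loudly instead of storing items without vectors.
        throw new RuntimeException('No embedding function configured.');
    }

    // Fall back to generated IDs when the caller supplies none.
    $ids ??= array_map(fn (): string => uniqid('doc_', true), $documents);

    return [
        'ids' => $ids,
        'documents' => array_values($documents),
        'embeddings' => $embed($documents),
    ];
}

// Fake embedder for the demo: one dimension, the text length.
$fake = fn (array $texts): array => array_map(
    fn (string $text): array => [(float) strlen($text)],
    $texts
);

$payload = prepareItems(['alpha', 'be'], $fake);
```

One consequence of generated IDs worth keeping in mind: re-running an import creates new items instead of overwriting the old ones, so supply stable IDs if you need idempotent writes.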

Using it

composer require helgesverre/chromadb

Basic usage without Laravel:

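// Namespaces assumed from the package's PSR-4 layout (src/...); check the README.
use HelgeSverre\Chromadb\Chromadb;
use HelgeSverre\Chromadb\Embeddings\Embeddings;

$chromadb = new Chromadb(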
    token: 'test-token-chroma-local-dev',
    host: 'http://localhost',
    port: '8000'
);

$response = $chromadb->collections()->create('articles');
$collectionId = $response->json('id');

// With automatic embeddings
$chromadb = $chromadb->withEmbeddings(
    Embeddings::openai(apiKey: 'sk-...')
);

$chromadb->items()->addWithEmbeddings(
    collectionId: $collectionId,
    documents: ['PHP 8.4 adds property hooks', 'Laravel 12 ships with Volt']
);

$results = $chromadb->items()->queryWithText(
    collectionId: $collectionId,
    queryText: 'new PHP features',
    nResults: 5
);

The Docker setup is straightforward — the repo ships a docker-compose.yml that runs ChromaDB with auth pre-configured.

Rough edges

The package hard-depends on saloonphp/laravel-plugin and spatie/laravel-data in its composer.json require section, even though it claims to be framework-agnostic. Anyone using this in a Symfony or vanilla PHP project pulls in Laravel-specific packages they don't need. These should be in suggest or behind a separate Laravel bridge package.

The count() methods on both Collections and Items parse plain text responses by casting to int ((int) $response->body()). No validation that the response is actually numeric — a server error returning HTML would silently become 0.
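A stricter parse is only a few lines. The sketch below is a hypothetical parseCount() helper, not code from the package, that rejects anything other than a plain non-negative integer instead of letting PHP's cast turn it into 0.

```php
<?php
declare(strict_types=1);

// Hypothetical stricter alternative to `(int) $response->body()`.
function parseCount(string $body): int
{
    $trimmed = trim($body);

    // Accept only ASCII digits; HTML error pages and empty bodies are rejected.
    if (preg_match('/^[0-9]+$/D', $trimmed) !== 1) {
        throw new UnexpectedValueException(
            'Expected a numeric count, got: ' . substr($trimmed, 0, 80)
        );
    }

    return (int) $trimmed;
}

$count = parseCount("42\n");

// For comparison, the bare cast silently maps non-numeric text to 0.
$silent = (int) '<html>Bad Gateway</html>';
```

With this in place a proxy error page surfaces as an exception at the call site rather than as an apparently empty collection.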

The test suite is solid at 37 test files, but the integration tests require a running ChromaDB instance and are skipped in CI. The unit tests mock HTTP responses, which is the right call for a client library.

The search() method on Items is documented as Chroma Cloud-only but there's no guard or exception for local usage — it'll just fail with whatever error the server returns.

Documentation is thorough but lives entirely in the README, which has gotten long. No separate docs site.

Bottom line

If you're building RAG or semantic search in PHP and want a single package that handles both the ChromaDB API and embedding generation, this is the most complete option available. The Saloon foundation makes it testable and the multi-tenancy pattern is well-designed — just be aware of the unnecessary Laravel dependencies if you're running outside that ecosystem.

HelgeSverre/chromadb on GitHub