LiveKit is a WebRTC SFU (Selective Forwarding Unit) — a media server that receives tracks from publishers and routes them to subscribers without decoding or re-encoding the media. It ships as a single Go binary, supports horizontal scaling via Redis, and has grown into the foundation of a broader ecosystem targeting voice AI applications.
Why I starred it
Building real-time audio/video from scratch with WebRTC means dealing with signaling, ICE negotiation, DTLS, SRTP, simulcast, bandwidth estimation, and NACK retransmission: a mountain of protocol complexity before you write a single line of product logic. LiveKit absorbs all of that. On top of it, the Agents SDK can join rooms as a programmable participant, subscribing to and publishing audio tracks, which makes it a natural fit for AI voice pipelines, increasingly the main use case.
How it works
The core lives in two packages: pkg/sfu handles the low-level RTP plumbing, and pkg/rtc sits on top managing rooms, participants, and subscriptions.
pkg/sfu is dense. The DownTrack in pkg/sfu/downtrack.go is the main abstraction — one instance per subscriber-per-track. It implements TrackSender, rewrites RTP sequence numbers and timestamps for each downstream connection, handles keyframe injection on codec switches, and manages blank frame generation for mute states. The blank frame constants tell the story: they embed actual codec-compliant silent/empty frames directly in Go byte slices — VP8 8x8 keyframes, H.264 SPS/PPS/IDR NAL units, Opus silence, PCMU silence — all hardcoded in downtrack.go so the server can send valid media during mutes without asking publishers for anything.
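The rewriting itself is conceptually simple: the downtrack keeps per-subscriber offsets so the outgoing stream stays continuous across layer switches and mutes. A minimal sketch of the offset idea, with illustrative names rather than LiveKit's actual types:

```go
package main

import "fmt"

// rtpMunger sketches the per-subscriber rewriting a DownTrack performs:
// each subscriber sees one continuous sequence-number/timestamp space
// even when the source stream switches or pauses.
type rtpMunger struct {
	snOffset uint16 // subtracted from incoming sequence numbers
	tsOffset uint32 // subtracted from incoming timestamps
}

// onSourceSwitch re-anchors the offsets so the first packet from a new
// source continues directly after the last packet sent downstream.
// (Real code also advances the timestamp by a frame duration.)
func (m *rtpMunger) onSourceSwitch(lastSentSN uint16, lastSentTS uint32, nextInSN uint16, nextInTS uint32) {
	m.snOffset = nextInSN - (lastSentSN + 1) // uint16 wraparound is intentional
	m.tsOffset = nextInTS - lastSentTS
}

// translate maps an incoming packet's SN/TS into the downstream space.
func (m *rtpMunger) translate(sn uint16, ts uint32) (uint16, uint32) {
	return sn - m.snOffset, ts - m.tsOffset
}

func main() {
	m := &rtpMunger{}
	// Last packet sent downstream was SN=100; the new layer starts at SN=5000.
	m.onSourceSwitch(100, 90000, 5000, 500000)
	sn, ts := m.translate(5000, 500000)
	fmt.Println(sn, ts) // 101 90000: seamless continuation for the subscriber
}
```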
The Forwarder in pkg/sfu/forwarder.go handles layer selection for simulcast. It tracks the current spatial and temporal layer, manages pause/resume based on bandwidth, and defines tight switching thresholds — ResumeBehindThresholdSeconds = 0.2, LayerSwitchBehindThresholdSeconds = 0.05 — controlling when it's safe to cut between simulcast layers without causing artifacts.
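Those thresholds plausibly gate a cut like this (a sketch of the idea, not the actual Forwarder logic):

```go
package main

import "fmt"

const (
	ResumeBehindThresholdSeconds      = 0.2
	LayerSwitchBehindThresholdSeconds = 0.05
)

// okToCut sketches the gating idea: switching to a stream whose most
// recent packets lag the current playout point would jump media backwards
// in time, so a cut is only allowed when the candidate layer is close
// enough. Resuming from a pause tolerates a larger gap than a live switch.
func okToCut(behindSeconds float64, resuming bool) bool {
	threshold := LayerSwitchBehindThresholdSeconds
	if resuming {
		threshold = ResumeBehindThresholdSeconds
	}
	return behindSeconds <= threshold
}

func main() {
	fmt.Println(okToCut(0.1, true))  // true: within the 0.2s resume window
	fmt.Println(okToCut(0.1, false)) // false: too far behind for a live layer switch
}
```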
Bandwidth estimation lives in pkg/sfu/streamallocator/streamallocator.go. It runs as a state machine with two states: STABLE and DEFICIENT. When bandwidth is tight, it prioritizes tracks by type — screenshare gets cPriorityMax = 255, regular video gets cPriorityMin = 1 — then downgrades video layers starting from lowest priority. It probes upward periodically using RTCP padding, switching to shorter ping intervals (cPingShort = 100ms) when probing and backing off to cPingLong = 15s when stable.
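The DEFICIENT-state policy reduces to a priority sort. A simplified sketch, assuming pause-only behavior (the real allocator downgrades simulcast layers before pausing tracks outright):

```go
package main

import (
	"fmt"
	"sort"
)

const (
	cPriorityMin uint8 = 1   // regular video
	cPriorityMax uint8 = 255 // screenshare
)

type allocTrack struct {
	name     string
	priority uint8
	bps      int64
	paused   bool
}

// allocate sketches the DEFICIENT-state idea: when demand exceeds the
// estimated channel capacity, sacrifice tracks starting from the lowest
// priority until the remainder fits the budget.
func allocate(tracks []*allocTrack, budgetBps int64) {
	var demand int64
	for _, t := range tracks {
		t.paused = false
		demand += t.bps
	}
	// Lowest-priority tracks are paused first.
	sort.Slice(tracks, func(i, j int) bool { return tracks[i].priority < tracks[j].priority })
	for _, t := range tracks {
		if demand <= budgetBps {
			break
		}
		t.paused = true
		demand -= t.bps
	}
}

func main() {
	tracks := []*allocTrack{
		{name: "camera-a", priority: cPriorityMin, bps: 1_500_000},
		{name: "screenshare", priority: cPriorityMax, bps: 2_000_000},
		{name: "camera-b", priority: cPriorityMin, bps: 1_500_000},
	}
	allocate(tracks, 3_000_000) // 3 Mbps available, 5 Mbps demanded
	for _, t := range tracks {
		fmt.Println(t.name, t.paused) // both cameras pause, screenshare survives
	}
}
```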
The Sequencer in pkg/sfu/sequencer.go manages the packet metadata cache per downtrack. Each packetMeta entry stores both the source and target sequence numbers, codec rewrite bytes, dependency descriptor bytes, and NACK tracking — packed into a fixed-size struct with inline byte arrays ([8]byte) to avoid heap allocations on the hot path.
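The allocation-free shape is worth seeing. A sketch of the pattern, with illustrative field names and a simplified ring keyed by target sequence number:

```go
package main

import "fmt"

// packetMeta-style entry: fixed size, no pointers, inline byte arrays,
// so a preallocated ring of these costs zero heap allocations per packet
// and the GC never has to scan it. Names are illustrative.
type packetMeta struct {
	sourceSeqNo uint16  // SN as received from the publisher
	targetSeqNo uint16  // SN as rewritten for this subscriber
	timestamp   uint32
	layer       int8
	nacked      uint8   // retransmission attempt count
	codecBytes  [8]byte // rewritten codec header bytes, inline
	ddBytes     [8]byte // dependency descriptor bytes, inline
}

const ringSize = 1 << 14 // power of two, so modulo is a bit mask

type sequencer struct {
	ring [ringSize]packetMeta
}

func (s *sequencer) push(m packetMeta) {
	s.ring[m.targetSeqNo&(ringSize-1)] = m
}

// lookup maps a NACKed target SN back to its metadata, or nil if the
// entry was evicted (or never written).
func (s *sequencer) lookup(targetSN uint16) *packetMeta {
	m := &s.ring[targetSN&(ringSize-1)]
	if m.targetSeqNo != targetSN {
		return nil
	}
	return m
}

func main() {
	s := &sequencer{}
	s.push(packetMeta{sourceSeqNo: 7000, targetSeqNo: 42, timestamp: 90000})
	if m := s.lookup(42); m != nil {
		m.nacked++ // a NACK for target SN 42 resolves to source SN 7000
		fmt.Println(m.sourceSeqNo, m.nacked) // 7000 1
	}
}
```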
Routing across nodes uses Redis: pkg/routing/redisrouter.go registers each node in a nodes hash and maps room names to node IDs in room_node_map. Signaling messages are pub/subbed over Redis channels, and the LocalRouter handles in-process dispatch when participants are on the same node. The result is that any node can accept a new participant connection and forward their signaling to wherever the room is actually hosted.
```go
// pkg/routing/redisrouter.go
const (
	NodesKey    = "nodes"
	NodeRoomKey = "room_node_map"
)

func (r *RedisRouter) RegisterNode() error {
	data, err := proto.Marshal(r.currentNode.Clone())
	// ...
	return r.rc.HSet(r.ctx, NodesKey, string(r.currentNode.NodeID()), data).Err()
}
```
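The decision each node makes on a new connection can be modeled with plain maps standing in for the Redis hashes (a toy model, not the real API):

```go
package main

import "fmt"

// router is a toy model of the routing state above: Redis keeps
// "room_node_map" as a hash (HSET/HGET); a plain map here just
// illustrates the decision a node makes when a participant connects.
type router struct {
	selfID      string
	roomNodeMap map[string]string // room name -> hosting node ID
}

// route returns the node that should host the participant's room. If no
// node has claimed the room yet, this node claims it; otherwise signaling
// is forwarded to the existing host over Redis pub/sub (or dispatched
// in-process when the host is this node).
func (r *router) route(room string) (nodeID string, local bool) {
	nodeID, ok := r.roomNodeMap[room]
	if !ok {
		r.roomNodeMap[room] = r.selfID
		nodeID = r.selfID
	}
	return nodeID, nodeID == r.selfID
}

func main() {
	r := &router{selfID: "node-a", roomNodeMap: map[string]string{"standup": "node-b"}}
	fmt.Println(r.route("standup")) // node-b false: forward signaling to node-b
	fmt.Println(r.route("newroom")) // node-a true: room claimed locally
}
```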
The packet pool in pkg/sfu/sfu.go is a small but telling detail:
```go
var PacketFactory = &sync.Pool{
	New: func() any {
		b := make([]byte, 1460)
		return &b
	},
}
```
1460 bytes is the classic TCP MSS (a 1500-byte Ethernet MTU minus 20-byte IP and TCP headers), a conservative size that comfortably fits any RTP-over-UDP packet on a standard path. Every RTP packet is allocated from this pool and returned after forwarding, keeping GC pressure low on the hot forwarding path.
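The borrow/copy/return rhythm looks roughly like this (`forwardOne` is a hypothetical helper, not a function from the codebase):

```go
package main

import (
	"fmt"
	"sync"
)

var PacketFactory = &sync.Pool{
	New: func() any {
		b := make([]byte, 1460)
		return &b
	},
}

// forwardOne shows the hot-path pattern: borrow a buffer, copy the packet
// into it, hand it off, return it. The pool stores *[]byte rather than
// []byte so that Get/Put move a single pointer and don't themselves
// allocate a slice header on the heap.
func forwardOne(payload []byte, send func([]byte)) {
	bufp := PacketFactory.Get().(*[]byte)
	n := copy(*bufp, payload)
	send((*bufp)[:n])
	PacketFactory.Put(bufp) // safe only once send no longer references the buffer
}

func main() {
	forwardOne([]byte("rtp payload"), func(b []byte) { fmt.Println(len(b)) }) // 11
}
```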
Using it
Start a dev server:
```shell
brew install livekit
livekit-server --dev
# API Key: devkey / API Secret: secret
```
Generate a JWT and join a room:
```shell
lk token create \
  --api-key devkey --api-secret secret \
  --join --room my-room --identity bot1 \
  --valid-for 24h
lk room join \
  --url ws://localhost:7880 \
  --api-key devkey --api-secret secret \
  --identity bot1 \
  --publish-demo my-room
```
The --publish-demo flag streams a looped test video into the room. Useful for verifying the full forwarding path without needing a browser client.
For building an AI voice agent, you'd connect via the Python or Node Agents SDK, subscribe to an audio track, and pipe samples through a speech model:
```python
from livekit.agents import Agent, AgentSession, JobContext

async def entrypoint(ctx: JobContext):
    session = AgentSession()
    await session.start(ctx.room, agent=MyAgent())  # MyAgent: your Agent subclass
```
Rough edges
The codebase is large, and beyond the README there is no architecture documentation. Understanding the DownTrack/Forwarder/StreamAllocator interaction requires reading several thousand lines of Go.
The test coverage is uneven. pkg/sfu has tests for the forwarder, sequencer, and receiver, but the stream allocator — the most behaviorally complex part — has no dedicated test file. pkg/rtc has room and subscription manager tests but nothing for the distributed routing paths.
The go.mod declares go 1.25.0, so you need a recent toolchain. Some dependencies, such as livekit/protocol and livekit/mediatransportutil, track near-HEAD commits via dated pseudo-versions rather than tagged releases, which makes dependency pinning fragile if you fork or vendor.
Bottom line
If you're building anything that needs real-time audio/video — especially voice AI applications where you need programmable server-side participants — LiveKit is the most complete open-source option available. The SFU internals are production-grade and actively maintained; the rough parts are mostly around testing and documentation rather than the core media path.
