ComfyUI: A Node Graph Engine for Diffusion Models

June 22, 2024

repo-review

by Florian Narr


ComfyUI is a visual node editor for running diffusion model pipelines — Stable Diffusion, Flux, HunyuanVideo, and a growing list of others. You wire together nodes (load checkpoint, encode text, sample, decode VAE, save image) and hit run. No code required, but the entire graph is serializable JSON and fully scriptable via API.

Why I starred it

The surface area is a GUI, but I was curious about what's underneath. A lot of these visual workflow tools are thin wrappers that just call subprocess. ComfyUI is not that. It has a full async execution engine with topological graph resolution, multiple cache strategies, and a memory management layer that dynamically offloads model weights based on available VRAM. At 108k stars it's clearly the standard for local diffusion pipelines — I wanted to understand why the engineering holds up at that scale.

How it works

The entry point is main.py, which sets up a PromptServer (an aiohttp server in server.py) and a PromptExecutor in execution.py. When you queue a workflow in the UI, it POSTs a JSON prompt to /prompt. The executor picks it up from a queue and runs it.
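The wire format is worth seeing once. Node IDs are arbitrary strings, and edges are encoded inline: an input value of `["9", 1]` means "output slot 1 of node 9". A minimal two-node sketch (the node class names are real built-ins; the checkpoint filename is a placeholder):

```python
# A two-node prompt in the format POSTed to /prompt.
# CheckpointLoaderSimple's outputs are (MODEL, CLIP, VAE),
# so slot 1 of node "9" is the CLIP model.
prompt = {
    "9": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},
    },
    "3": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a watercolor fox", "clip": ["9", 1]},
    },
}
```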

The graph execution lives in comfy_execution/graph.py. The DynamicPrompt class holds the workflow as a dict — original nodes plus any ephemeral_prompt nodes spawned during execution (for dynamic subgraphs). ExecutionList extends TopologicalSort and walks the dependency graph, staging nodes when all their inputs are satisfied:

class ExecutionList(TopologicalSort):
    """
    ExecutionList implements a topological dissolve of the graph. After a node is staged
    for execution, it can still be returned to the graph after having further dependencies added.
    """
    async def stage_node_execution(self):
        assert self.staged_node_id is None
        if self.is_empty():
            return None, None, None
        available = self.get_ready_nodes()
        while len(available) == 0 and self.externalBlocks > 0:
            await self.unblockedEvent.wait()
            ...

The "only reruns what changed" optimization is more interesting than it sounds. Each node has an optional IS_CHANGED classmethod (or the newer fingerprint_inputs for v3 nodes). The IsChangedCache in execution.py evaluates this before execution and uses the result as part of the cache key. If inputs haven't changed and IS_CHANGED returns the same value as last run, the node's outputs are served from cache.
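To make that concrete, here's a hypothetical node (not from the repo) that uses IS_CHANGED to bust the cache only when a file's contents actually change:

```python
import hashlib

class LoadTextFile:
    """Illustrative custom node: reruns only when the file's content changes."""

    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"path": ("STRING", {})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "load"

    @classmethod
    def IS_CHANGED(s, path):
        # The return value becomes part of the cache key: if it matches
        # the previous run (and the inputs are unchanged), the executor
        # serves the node's outputs from cache instead of calling load().
        h = hashlib.sha256()
        with open(path, "rb") as f:
            h.update(f.read())
        return h.hexdigest()

    def load(self, path):
        with open(path, "r") as f:
            # Node functions return a tuple, one entry per RETURN_TYPES slot.
            return (f.read(),)
```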

The caching layer in comfy_execution/caching.py has four modes: classic (dump ASAP), LRU, RAM pressure-aware, and null. The default HierarchicalCache uses CacheKeySetInputSignature — which hashes the entire ancestry of a node's inputs, not just the node itself. This is what lets it skip an entire branch of the graph if none of the upstream inputs changed:

async def get_node_signature(self, dynprompt, node_id):
    signature = []
    ancestors, order_mapping = self.get_ordered_ancestry(dynprompt, node_id)
    signature.append(await self.get_immediate_node_signature(dynprompt, node_id, order_mapping))
    for ancestor_id in ancestors:
        signature.append(await self.get_immediate_node_signature(dynprompt, ancestor_id, order_mapping))
    return to_hashable(signature)

The to_hashable function recursively converts inputs to frozensets for hashing, with an Unhashable sentinel for tensors (which can't be content-addressed). Nodes containing tensors in their outputs correctly fall back to ID-based caching.
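The idea is simple enough to sketch in a few lines (hypothetical names, not the repo's implementation):

```python
class Unhashable:
    """Sentinel for values (e.g. tensors) that can't be content-addressed."""

def to_hashable_sketch(obj):
    # Recursively convert to hashable structures; dicts become
    # frozensets of (key, value) pairs so key order doesn't matter.
    if isinstance(obj, (int, float, str, bool, bytes, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return tuple(to_hashable_sketch(x) for x in obj)
    if isinstance(obj, dict):
        return frozenset((k, to_hashable_sketch(v)) for k, v in obj.items())
    # Anything else (tensors, model objects) poisons the signature,
    # which forces the fallback to identity-based caching.
    return Unhashable
```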

Memory management is in comfy/model_management.py. The VRAMState enum tracks six levels from DISABLED to HIGH_VRAM, and load_models_gpu() at line 718 decides how much of each model to keep on-device. The low VRAM path (lowvram_model_memory) computes the maximum fraction of a model that fits while leaving minimum_inference_memory() headroom for the actual compute. This is what makes ComfyUI usable on a card with 4GB VRAM running SDXL — it pages weights in and out per forward pass rather than requiring the full model to fit.
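The core arithmetic can be sketched like this — a hypothetical function, much simpler than the real heuristic in comfy/model_management.py:

```python
def plan_lowvram_load(model_size_bytes: int,
                      free_vram_bytes: int,
                      inference_headroom_bytes: int) -> float:
    """Return the fraction of a model's weights to keep on-device.

    Mirrors the idea behind the low-VRAM path: keep the largest slice of
    weights that still leaves headroom for activations; the remainder
    stays in system RAM and is paged in per forward pass.
    """
    budget = free_vram_bytes - inference_headroom_bytes
    if budget <= 0:
        return 0.0   # nothing fits on-device: fully offloaded
    if budget >= model_size_bytes:
        return 1.0   # the whole model fits
    return budget / model_size_bytes
```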

Node registration is simple to the point of being elegant. Nodes declare their interface with class attributes:

class CLIPTextEncode(ComfyNodeABC):
    @classmethod
    def INPUT_TYPES(s) -> InputTypeDict:
        return {
            "required": {
                "text": (IO.STRING, {"multiline": True, "dynamicPrompts": True}),
                "clip": (IO.CLIP, {})
            }
        }
    RETURN_TYPES = (IO.CONDITIONING,)
    FUNCTION = "encode"
    CATEGORY = "conditioning"

Custom nodes drop a Python file into custom_nodes/ and register themselves by populating NODE_CLASS_MAPPINGS and NODE_DISPLAY_NAME_MAPPINGS. No plugin manifest, no config file — the loader in nodes.py imports everything in that directory at startup. This is both its greatest strength (anyone can write a node in 20 lines) and a significant attack surface (loading arbitrary Python from third-party repos on startup).
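An entire custom node file can be this short — a hypothetical example, not one shipped with the repo:

```python
# custom_nodes/example_node.py
# The loader imports this file at startup and reads the two
# module-level dicts; nothing else is required to register a node.

class ReverseText:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"text": ("STRING", {"multiline": True})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"
    CATEGORY = "utils"

    def run(self, text):
        # Node functions return a tuple, one entry per RETURN_TYPES slot.
        return (text[::-1],)

NODE_CLASS_MAPPINGS = {"ReverseText": ReverseText}
NODE_DISPLAY_NAME_MAPPINGS = {"ReverseText": "Reverse Text"}
```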

Using it

pip install comfy-cli
comfy install
comfy launch

# Or via API
curl -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": {"3": {"class_type": "KSampler", "inputs": {...}}}}'

The API server returns a prompt ID and you poll /history/{prompt_id} for results. The script_examples/ directory has a working Python client that demonstrates this — it's actually clean, readable code.
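A minimal stdlib-only client following that queue-then-poll pattern might look like this — a sketch of the two documented endpoints, not the repo's example client:

```python
import json
import time
import urllib.request

def run_prompt(prompt, host="127.0.0.1:8188", timeout=300):
    """Queue a workflow via POST /prompt and poll /history until done (sketch)."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.loads(resp.read())["prompt_id"]

    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"http://{host}/history/{prompt_id}") as resp:
            history = json.loads(resp.read())
        if prompt_id in history:  # the entry appears once execution finishes
            return history[prompt_id]["outputs"]
        time.sleep(1.0)
    raise TimeoutError(f"prompt {prompt_id} did not finish in {timeout}s")
```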

The frontend is a separate repo (Comfy-Org/ComfyUI_frontend) bundled into core every two weeks. The UI is built on LiteGraph.js with substantial customizations. Workflows are saved as JSON and can be embedded directly into generated PNG files as metadata — loading an image re-imports the exact workflow that created it.
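Since the workflow lives in ordinary PNG tEXt chunks, you can pull it back out with nothing but the stdlib. A sketch that assumes the "workflow" key (no third-party image library needed):

```python
import struct

def read_png_text_chunks(data: bytes) -> dict:
    """Extract tEXt chunks from PNG bytes (stdlib-only sketch).

    ComfyUI-generated PNGs carry the workflow JSON in tEXt chunks;
    re-importing the image restores the exact graph that produced it.
    """
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    chunks, pos = {}, 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype == b"IEND":
            break
    return chunks
```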

Rough edges

The custom node ecosystem is the obvious one. There are thousands of community nodes covering ControlNet, face swapping, upscaling, video, audio, 3D — but they have zero standardization on error handling, dependency management, or API stability. A major core update regularly breaks a subset of them. The pytest.ini and tests-unit/ directory show real test coverage exists, but it's concentrated on the execution engine and server — the actual model integration paths have minimal testing.

The hook_breaker_ac10a0.py at the repo root is a monkey-patching module for breaking import hooks, which suggests the dependency loading story has accumulated some technical debt. Documentation is thin outside the README — the API is mostly discoverable by reading the source or the community wiki.

Because the frontend and core release cycles are decoupled (frontend every two weeks, core weekly), the bundled frontend in any given core release can lag by up to two weeks. Not a real problem in practice, but worth knowing if you're building on the API and need UI parity.

Bottom line

If you're running diffusion models locally and want control over every step of the pipeline without writing a thousand lines of Python, ComfyUI is the tool. The execution engine is genuinely well-engineered for the problem — the hierarchical cache and dynamic VRAM management aren't afterthoughts. The ecosystem of custom nodes is enormous but fragile; treat third-party nodes as untrusted code.

Comfy-Org/ComfyUI on GitHub