LangChain is a Python framework for building applications on top of language models. The description used to be "chaining LLM calls" — now it's "the agent engineering platform." Both are accurate depending on which layer you look at.
Why I starred it
Early 2023, before "agent" became a marketing term, LangChain was solving a real problem: LLM calls don't compose cleanly. You have a prompt template, a model call, an output parser, maybe a retrieval step — and gluing these together with ad-hoc Python was messy, untestable, and hard to swap out. LangChain introduced a uniform interface so any of these steps could be chained without writing glue code every time.
What made it interesting wasn't the pre-built chains. It was the underlying abstraction.
How it works
Everything in LangChain descends from Runnable, defined in libs/core/langchain_core/runnables/base.py. The interface is deliberately minimal:
class Runnable(ABC, Generic[Input, Output]):
    def invoke(self, input: Input, config: RunnableConfig | None = None) -> Output: ...
    def batch(self, inputs: list[Input], ...) -> list[Output]: ...
    def stream(self, input: Input, ...) -> Iterator[Output]: ...
    async def ainvoke(self, input: Input, ...) -> Output: ...
    async def abatch(self, inputs: list[Input], ...) -> list[Output]: ...
    async def astream(self, input: Input, ...) -> AsyncIterator[Output]: ...
Any object implementing this interface — a prompt template, a chat model, an output parser, a retrieval step — can be composed with the | operator. That | overload is implemented in Runnable.__or__ at line 618:
def __or__(self, other) -> RunnableSerializable[Input, Other]:
    return RunnableSequence(self, coerce_to_runnable(other))
coerce_to_runnable (line 6176) handles the implicit wrapping — pass a plain Python callable and you get a RunnableLambda; pass a dict and you get a RunnableParallel. That's why you can write:
chain = prompt | model | StrOutputParser()
...and it just works. Each | creates a new RunnableSequence that flattens itself — so chaining three sequences together doesn't produce a tree of sequences, it stays a flat list of steps (first, middle, last in the Pydantic model).
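The flattening is easy to see in a toy re-implementation. The sketch below is plain Python, not LangChain's actual code — `ToyRunnable`, `ToyLambda`, and `ToySequence` are illustrative stand-ins for `Runnable`, `RunnableLambda`, and `RunnableSequence` — but it shows why three pipes produce one flat list of steps rather than a nested tree:

```python
# Toy sketch (not LangChain itself): a minimal Runnable with the same
# __or__ flattening behavior.
class ToyRunnable:
    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        return ToySequence(self, other)

class ToyLambda(ToyRunnable):
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

class ToySequence(ToyRunnable):
    def __init__(self, *steps):
        # Flatten: splice nested sequences so steps stays a flat list.
        self.steps = []
        for step in steps:
            if isinstance(step, ToySequence):
                self.steps.extend(step.steps)
            else:
                self.steps.append(step)

    def invoke(self, value):
        # Walk the flat list, feeding each step's output to the next.
        for step in self.steps:
            value = step.invoke(value)
        return value

chain = ToyLambda(str.strip) | ToyLambda(str.upper) | ToyLambda(lambda s: s + "!")
print(len(chain.steps))           # 3 — flat, not a tree of sequences
print(chain.invoke("  hello  "))  # HELLO!
```

The real implementation does the same splice inside `RunnableSequence`'s constructor and `__or__`, which is what keeps tracing output readable: every step appears at the same depth.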
The RunnableSequence.invoke method at line 3131 walks that list, patching the callback context for each step so tracing tools (LangSmith, or just ConsoleCallbackHandler) can see exactly which step produced what:
for i, step in enumerate(self.steps):
    config = patch_config(config, callbacks=run_manager.get_child(f"seq:step:{i + 1}"))
    with set_config_context(config) as context:
        input_ = context.run(step.invoke, input_, config)
The async variant does the same thing with functools.partial and coro_with_context to preserve the context variable across async boundaries — a non-obvious problem that most home-rolled async pipelines get wrong.
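The mechanism underneath is standard-library `contextvars`. A simplified sketch — `current_step` and `run_step` are illustrative names, not LangChain's — shows the pattern of running each step inside a copied context so nested code sees the right tracing state while the caller's context stays untouched:

```python
import contextvars

# Illustrative sketch of context-scoped config: the step name lives in a
# ContextVar, and each step runs inside a copied context.
current_step = contextvars.ContextVar("current_step", default=None)

def run_step(step_name, fn, value):
    ctx = contextvars.copy_context()
    ctx.run(current_step.set, step_name)  # set only inside the copy
    return ctx.run(fn, value)             # fn sees current_step == step_name

def traced_upper(value):
    # Anything called from inside the step can read the ambient step name.
    print(f"[{current_step.get()}] processing")
    return value.upper()

result = run_step("seq:step:1", traced_upper, "hi")
print(result)              # HI
print(current_step.get())  # None — the outer context was never mutated
```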
RunnableParallel — constructed from a dict literal — runs its branches concurrently (a thread pool executor for sync invoke, asyncio.gather for ainvoke) and collects results into a dict keyed by branch name. Combined with RunnableSequence, this gives you fan-out/fan-in for free:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
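The fan-out/fan-in shape itself is simple enough to sketch in plain Python. `parallel_invoke` below is illustrative, not LangChain's API — it just shows the contract: each branch gets the same input, runs concurrently, and the results land in a dict keyed by branch name:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of RunnableParallel's sync behavior (not the real implementation):
# submit every branch with the same input, then gather results by name.
def parallel_invoke(branches, value):
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, value) for name, fn in branches.items()}
        return {name: fut.result() for name, fut in futures.items()}

out = parallel_invoke(
    {"context": lambda q: f"docs about {q}", "question": lambda q: q},
    "runnables",
)
print(out)  # {'context': 'docs about runnables', 'question': 'runnables'}
```

In the RAG chain above, that output dict is exactly the input the prompt template expects, which is why the dict literal slots in as the first step.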
Using it
from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = init_chat_model("openai:gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely."),
    ("human", "{question}"),
])
chain = prompt | model | StrOutputParser()

# Sync
result = chain.invoke({"question": "What is a Runnable?"})

# Stream tokens as they arrive
for chunk in chain.stream({"question": "Explain RAG"}):
    print(chunk, end="", flush=True)

# Batch multiple inputs in parallel
results = chain.batch([
    {"question": "What is LangChain?"},
    {"question": "What is LangGraph?"},
])
batch defaults to a thread pool, so you're not paying the overhead of asyncio machinery for simple sync code. For async workloads, abatch uses asyncio.gather.
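The default strategy amounts to mapping invoke over the inputs on a thread pool, preserving input order. A minimal sketch (`toy_batch` is an illustrative name, and the real method also handles per-call config and error policies):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of a thread-pool batch (assumed simplification of the real
# method): pool.map preserves input order regardless of completion order.
def toy_batch(invoke, inputs, max_concurrency=4):
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(invoke, inputs))

print(toy_batch(str.upper, ["a", "b", "c"]))  # ['A', 'B', 'C']
```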
For streaming through multi-step chains, the transform protocol matters. Any step with a true streaming transform(Iterator[Input]) -> Iterator[Output] gets streamed input — meaning tokens flow through the whole chain as they arrive. Steps without one block until their predecessor finishes: RunnableLambda, for instance, buffers its entire input before calling your function (RunnableGenerator is the streaming-friendly alternative).
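The difference is just generator behavior, so it can be shown without LangChain. Both functions below follow the transform shape (iterator in, iterator out); the names are illustrative. The first yields per chunk, the second must drain its input before producing anything — which is exactly where streaming stalls:

```python
# Toy sketch of the transform protocol (iterator in, iterator out).
def streaming_upper(chunks):
    for chunk in chunks:       # yields as each chunk arrives
        yield chunk.upper()

def buffering_reverse(chunks):
    text = "".join(chunks)     # blocks until the predecessor finishes
    yield text[::-1]

def tokens():
    yield from ["lang", "chain"]

print(list(streaming_upper(tokens())))    # ['LANG', 'CHAIN'] — two chunks
print(list(buffering_reverse(tokens())))  # ['niahcgnal'] — one chunk, at the end
```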
Rough edges
The package split is confusing. langchain-core holds the Runnable primitives and base classes. langchain (now langchain_classic internally) holds the legacy Chain classes, agents, and higher-level constructs. langchain-community holds third-party integrations. langchain-openai, langchain-anthropic, etc. are separate packages. Starting a new project means figuring out which combination to install — and the import paths have shifted enough that old tutorials break without warning.
The legacy Chain classes predate Runnable and have a different invocation interface (chain.run(...) vs chain.invoke(...)). They still exist in langchain_classic/chains/ and work, but the abstraction mismatch shows when you try to compose them with newer LCEL code.
Streaming is great in theory but only as consistent as the slowest non-transform step in your chain. If you add a RunnableLambda in the middle for some processing logic, streaming stalls at that point.
Dependency footprint is substantial. The core package pulls in pydantic, tenacity, requests, httpx, and more. Integration packages layer on top. For a simple LLM wrapper, that's a lot.
Bottom line
If you're building anything non-trivial on top of an LLM — RAG, multi-step agents, structured extraction pipelines — the Runnable interface solves real composition problems that you'd otherwise solve worse yourself. The framework has grown large, but you won't need most of it; start with langchain-core for the primitives and add integration packages as needed.
