awesome-llm-apps: 100k stars of working LLM patterns

July 14, 2025

repo-review

by Florian Narr


A 100k-star collection of runnable LLM apps across RAG, agents, MCP, and voice. Each subdirectory is a standalone project — its own requirements.txt, its own Streamlit app, its own README. Think of it less as a library and more as a living pattern catalog.

Why I starred it

Most "awesome" lists are link dumps. This one ships working code. If you want to understand how corrective RAG actually behaves at runtime, you clone rag_tutorials/corrective_rag/ and run it. If you want to see how mixture-of-agents aggregation looks in practice, there's a Streamlit app for that too.

The repo has been actively maintained through 2025 — recent commits added MCP agent routing, self-evolving agents using EvoAgentX, and integrations for Gemini's interaction API. It's not abandoned. It's being updated faster than most tutorials can be written.

How it works

The repo is organized by pattern type, not by model vendor. You get starter_ai_agents/, advanced_ai_agents/, rag_tutorials/, mcp_ai_agents/, and voice_ai_agents/. Each subdirectory is flat: one main Python file, one requirements.txt.

The mixture-of-agents pattern in starter_ai_agents/mixture_of_agents/mixture-of-agents.py is a clean 80-line demonstration of the MoA architecture. It fans out to four open-source models in parallel via Together's async API, then feeds all responses into an aggregator:

async def run_llm(model):
    # Query one reference model asynchronously via Together's API.
    response = await async_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0.7,
        max_tokens=512,
    )
    return model, response.choices[0].message.content

async def main():
    # Fan out to every reference model in parallel.
    results = await asyncio.gather(*[run_llm(model) for model in reference_models])
    # ...
    # Hand all reference answers to the aggregator and stream its synthesis.
    finalStream = client.chat.completions.create(
        model=aggregator_model,
        messages=[
            {"role": "system", "content": aggregator_system_prompt},
            {"role": "user", "content": ",".join(response for _, response in results)},
        ],
        stream=True,
    )

No framework overhead. Just asyncio.gather on four Together calls, then a synchronous aggregation pass. It's a good reference when you want to understand MoA without LangChain wrapping everything.

The corrective RAG in rag_tutorials/corrective_rag/corrective_rag.py is more involved. It builds a LangGraph state machine with five nodes (retrieve, grade_documents, generate, transform_query, web_search) and a decide_to_generate conditional edge that routes either straight to generation or through transform_query → web_search first. The grading step sends each retrieved chunk to Claude with a JSON-returning prompt and filters out irrelevant documents before generation:

workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("transform_query", transform_query)
workflow.add_node("web_search", web_search)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {"transform_query": "transform_query", "generate": "generate"},
)
workflow.add_edge("transform_query", "web_search")
workflow.add_edge("web_search", "generate")

The decide_to_generate function checks a run_web_search flag that grade_documents sets when it filters out at least one document. If any chunk gets marked irrelevant, the query gets rewritten and sent to Tavily before attempting generation. Qdrant handles the vector store; embeddings are text-embedding-3-small.
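A minimal sketch of that branching logic, assuming the graph state carries a run_web_search flag (the actual function body and state keys in corrective_rag.py may differ):

```python
# Hypothetical sketch of the conditional-edge function described above.
# LangGraph calls it with the current state and uses the returned string
# to pick the next node, per the mapping in add_conditional_edges.
def decide_to_generate(state: dict) -> str:
    if state.get("run_web_search"):
        # grade_documents filtered out at least one chunk:
        # rewrite the query and supplement with web search.
        return "transform_query"
    # Every retrieved chunk passed the relevance grade: answer directly.
    return "generate"
```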

The multi-MCP agent router in mcp_ai_agents/multi_mcp_agent_router/agent_forge.py takes a different angle. Instead of one agent with all tools, it defines specialized agents where each gets a curated subset of MCP servers:

AGENTS = {
    "code_reviewer": Agent(
        mcp_servers=[
            {"name": "github", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
            {"name": "filesystem", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]},
        ],
    ),
    "security_auditor": Agent(
        mcp_servers=[
            {"name": "github", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
            {"name": "fetch", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-fetch"]},
        ],
    ),
    # ...
}

The pattern is intentional: give the security auditor the fetch server so it can pull dependency advisories; give the code reviewer the filesystem server so it can read local files. Each domain gets only the tools that make sense for it, which cuts down on tool-selection noise.
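The other half of the router is choosing which specialized agent handles an incoming task. A keyword-based dispatch sketch might look like this (route_task and the keyword table are illustrative, not the repo's actual API):

```python
# Hypothetical dispatch step for the multi-MCP router: map a task to the
# agent whose domain keywords it mentions, falling back to a default.
ROUTES = {
    "code_reviewer": ("review", "refactor", "style"),
    "security_auditor": ("vulnerability", "cve", "audit"),
}

def route_task(task: str, default: str = "code_reviewer") -> str:
    lowered = task.lower()
    for agent_name, keywords in ROUTES.items():
        if any(kw in lowered for kw in keywords):
            return agent_name
    return default
```

Because each agent only carries its curated MCP servers, the routing decision doubles as tool scoping: the model picked for the task never even sees the irrelevant tools.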

Using it

git clone https://github.com/Shubhamsaboo/awesome-llm-apps
cd awesome-llm-apps/rag_tutorials/corrective_rag
pip install -r requirements.txt
streamlit run corrective_rag.py

Every project follows this same pattern. API keys go in the Streamlit sidebar — no .env file setup required, which makes onboarding fast. The sidebar collects Anthropic, OpenAI, Tavily, and Qdrant credentials at runtime.

The self-evolving agent in advanced_ai_agents/multi_agent_apps/ai_self_evolving_agent/ uses EvoAgentX to generate a workflow graph from a plain-English goal ("Generate html code for the Tetris game"), spin up agents from that graph, execute the workflow, then pass the output through a separate Claude-powered verification step. That two-phase generate-then-verify pattern is worth stealing even outside the EvoAgentX context.
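Stripped of the EvoAgentX and Claude specifics, generate-then-verify reduces to a small control loop. In this sketch, generate_fn and verify_fn are hypothetical stand-ins for the workflow execution and the verification call:

```python
# Sketch of the two-phase pattern: generate, gate the output behind a
# separate verifier, and feed critique back into a bounded retry loop.
def generate_then_verify(goal, generate_fn, verify_fn, max_retries=2):
    output = generate_fn(goal)
    for _ in range(max_retries):
        verdict = verify_fn(goal, output)  # e.g. {"ok": bool, "feedback": str}
        if verdict.get("ok"):
            return output
        # Regenerate with the verifier's critique appended to the goal.
        output = generate_fn(goal + "\nFix: " + verdict.get("feedback", ""))
    return output  # best effort after exhausting retries
```

The key design choice is that the verifier is a separate model call with its own prompt, so the generator never grades its own work.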

Rough edges

Quality is uneven. Some projects have detailed READMEs with architecture diagrams; others have a three-line description. Test coverage is zero — this is a demos repo, not a library, so that's expected, but worth knowing if you're evaluating it as a production reference.

API key management is entirely manual and varies by project. Some use .env files with python-dotenv, others collect keys through Streamlit's st.text_input. There's no shared configuration pattern across the collection.

The corrective RAG's grading prompt parses JSON from Claude's output with a re.search(r'\{.*\}', response) regex fallback — if the model returns a garbled response, it keeps the document to avoid dropping context. That's a reasonable failure mode, but the implementation is brittle in a way the code acknowledges with a comment but doesn't fully address.
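A hardened version of that parse step keeps the repo's keep-on-failure default but tries strict JSON before falling back to the regex. This is a sketch, not the repo's code; parse_grade and the relevant key are illustrative:

```python
import json
import re

def parse_grade(response: str) -> bool:
    """Return True (keep the document) unless the model clearly said no."""
    # First attempt: the whole response is valid JSON.
    try:
        return json.loads(response).get("relevant", True)
    except (json.JSONDecodeError, AttributeError):
        pass
    # Fallback: extract the first {...} span from a chatty response.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        try:
            return json.loads(match.group()).get("relevant", True)
        except json.JSONDecodeError:
            pass
    # Garbled output: keep the chunk rather than silently drop context.
    return True
```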

Dependency versions between projects can conflict if you try to share a virtual environment. Each project really does need its own isolated environment.
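In practice that means one environment per subdirectory, e.g. with the stdlib venv module:

```shell
# Create and activate an isolated environment inside the project directory.
cd awesome-llm-apps/rag_tutorials/corrective_rag
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```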

Bottom line

If you're trying to understand a specific LLM pattern — corrective RAG, mixture-of-agents, MCP routing, memory-augmented agents — cloning the relevant subdirectory and running it is faster than reading a blog post about it. That's the actual value here: working code you can instrument and break.

Shubhamsaboo/awesome-llm-apps on GitHub