X Bookmarks — 2025 KW05: DeepSeek R1, Web Scrapers, and a Whole Lot of Tom Dörr

January 30, 2025

|bookmarks

by Florian Narr

X Bookmarks — 2025 KW05: DeepSeek R1, Web Scrapers, and a Whole Lot of Tom Dörr

@Werner — DeepSeek R1 technical paper

The DeepSeek technical paper is a fascinating read https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

Werner Vogels pointing directly at the PDF rather than a blog post or a summary thread is the right call. The paper covers the RL training loop, the Group Relative Policy Optimization approach, and how they got the model to develop chain-of-thought reasoning without supervised fine-tuning. Worth reading the sections on "aha moments" — the moment the model starts to self-correct mid-reasoning is described pretty candidly.


@lgrammel — showing reasoning with useChat

Show reasoning w/ useChat: https://t.co/MDMN6nN0TO

Lars Grammel from the Vercel AI SDK team demoing how to surface model reasoning steps in a chat UI. That's interesting because reasoning tokens are usually hidden — you get the final answer but not the scratchpad. Threading them through useChat means you can expose them progressively as the model thinks, which changes how users interpret slow responses (it's not hanging, it's working).


@tom_doerr — AI framework for problem-solving

AI framework for problem-solving https://t.co/G71LLttCma

Another structured agentic framework — these keep shipping. Tom's reposts tend to track what's quietly gaining stars on GitHub before it hits the HN front page. At 2k+ bookmarks this one clearly resonated. Hard to evaluate without seeing the actual repo, but the framing of "problem-solving framework" rather than just "agent framework" suggests it might be opinionated about decomposition and planning rather than just tool-calling scaffolding.


@tom_doerr — AI-powered bookmark and knowledge manager

AI-powered bookmark and knowledge manager https://t.co/gVXL6Ih6Ah

Ironically bookmarked a post about a bookmark manager. The "AI-powered" angle here presumably means semantic search or auto-tagging rather than just storing URLs — otherwise it's just a bookmark manager. 1.8k saves suggests people are frustrated enough with their current setup to at least look at alternatives. Honestly, same.


@tom_doerr — Web scraper tool

Web scraper tool https://t.co/rNXyLwTpqE

7,153 bookmarks on a web scraper post is a lot. That's not "I might use this someday" numbers — that's "I need this now." Web scraping is one of those perennial pain points: every solution works until the target site updates its layout or starts serving JS-rendered content. Whatever this tool does differently, it clearly hit a nerve.


@tom_doerr — Open-source screen recording tool

Open-source screen recording tool https://t.co/VzAzHr5PAE

Screen recording is a solved problem — until you actually need it to be scriptable, headless, or self-hostable. 5k bookmarks suggests this fills a gap the commercial options don't. The "open-source" qualifier matters here: if you're building demos, walkthroughs, or CI test artifacts, you want something you can run in a pipeline, not a subscription app.


@tom_doerr — OCR and document analysis toolkit

OCR and document analysis toolkit https://t.co/7yChPCjagz

Nearly 2.8k bookmarks on an OCR toolkit. Document parsing is one of those things that sounds boring until you're actually doing it — PDFs with multi-column layouts, scanned images, mixed fonts. Most OCR tools handle clean text fine and fall apart on anything messier. The "document analysis" part is the interesting addition: whether that means layout detection, table extraction, or something else changes the picture considerably.


@tom_doerr — LLM course by Maxime Labonne

LLM course by Maxime Labonne https://t.co/HiAjjDKw1M

Maxime Labonne has been one of the more useful practitioners writing about LLMs — less "here's what GPT-4 can do" and more "here's how fine-tuning actually works." 2.5k bookmarks on a course link is a strong signal. If you've been meaning to go deeper on the training side rather than just the API side, this one's worth opening.