SerpBear: Self-Hosted SERP Rank Tracker

October 15, 2025

repo-review

by Florian Narr


SerpBear is a self-hosted SERP rank tracking app. You point it at your domains, add keywords, wire up a scraping provider, and it tracks where you rank on Google — positions over time, per country, per device, with email notifications when things shift.

Why I starred it

Every managed rank tracking tool charges you by the keyword or the seat. Ahrefs, SEMrush, Sistrix — they're useful but they're also $50–$300/month before you hit any meaningful keyword volume. SerpBear is the escape hatch: run it on a $5 VPS or Fly.io free tier, bring your own proxy or a cheap third-party scraping API, and own your data.

The architecture is also worth looking at. It's a Next.js app running both the UI and the API routes, with a separate cron.js process handling scheduled scrapes outside the Next.js runtime. SQLite for persistence. Zero managed infrastructure required.

How it works

The scraping layer is cleanly abstracted. Every provider is a module implementing a ScraperSettings interface with three fields: headers(), scrapeURL(), and serpExtractor(). The entry point at scrapers/index.ts just exports an array of these modules — no dynamic loading, no config file, just explicit imports.
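Reduced to its shape, a provider module looks roughly like this. A minimal sketch: only the three fields named above are shown (the real ScraperSettings interface has more, such as an id and display name), and exampleProvider with its URL template is hypothetical:

```typescript
interface SearchResult {
  title: string;
  url: string;
  position: number;
}

// Sketch of the provider contract described above; the real interface
// in scrapers/index.ts carries additional metadata fields.
interface ScraperSettings {
  headers: (keyword: string, settings: Record<string, string>) => Record<string, string>;
  scrapeURL: (keyword: string, settings: Record<string, string>) => string;
  serpExtractor: (content: string) => SearchResult[];
}

// Hypothetical provider, just to show the shape of one module.
const exampleProvider: ScraperSettings = {
  headers: () => ({ 'User-Agent': 'Mozilla/5.0' }),
  scrapeURL: (keyword) =>
    `https://www.google.com/search?q=${encodeURIComponent(keyword)}&num=100`,
  // A real extractor parses the HTML response; this stub returns nothing.
  serpExtractor: () => [],
};
```

Because the array export is just explicit imports, adding a provider is one new module plus one line in scrapers/index.ts.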

The proxy scraper (scrapers/services/proxy.ts) is the most transparent one: it constructs a direct Google search URL and parses the HTML response with Cheerio, searching for h3 elements under #main:

serpExtractor: (content) => {
  const extractedResult = [];
  let lastPosition = 0;

  const $ = cheerio.load(content);
  const mainContent = $('body').find('#main');
  const children = $(mainContent).find('h3');

  for (let index = 0; index < children.length; index += 1) {
    const title = $(children[index]).text();
    const url = $(children[index]).closest('a').attr('href');
    // Strip the Google redirect wrapper (/url?q=...) and trailing query params.
    const cleanedURL = url
      ? url.replaceAll(/^.+?(?=https:|$)/g, '').replaceAll(/(&).*/g, '')
      : '';
    if (title && url) {
      lastPosition += 1;
      extractedResult.push({ title, url: cleanedURL, position: lastPosition });
    }
  }
  return extractedResult;
}

That regex on cleanedURL strips the Google redirect wrapper (/url?q=) and query parameters. Fragile by definition — Google's HTML is unversioned and changes without notice — but it works until it doesn't, which is why the managed API providers exist as fallbacks.

The scheduling is handled entirely outside Next.js. cron.js is a plain Node.js process that reads settings from data/settings.json (decrypted via cryptr), builds cron schedules using the croner library, and fires internal HTTP requests to the Next.js API routes:

// croner's Cron runs on construction by default, so no extra options are needed.
new Cron(scrapeCronTime, () => {
  const fetchOpts = { method: 'POST', headers: { Authorization: `Bearer ${process.env.APIKEY}` } };
  fetch(`${INTERNAL_BASE_URL}/api/cron`, fetchOpts)
    .then((res) => res.json())
    .catch((err) => { console.log('ERROR Making SERP Scraper Cron Request..', err); });
});

There's also an hourly retry cron that reads from data/failed_queue.json and resubmits failed keyword IDs. The failed queue file is JSON — if it gets corrupted, there's a guard in getAppSettings() that backs it up with a timestamp suffix instead of silently resetting.
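The parse-or-back-up decision at the heart of that guard can be sketched as a pure function. Assumptions flagged: the queue-file format as a flat array of keyword IDs, and the helper name, are mine, not the repo's:

```typescript
// Hedged sketch of the corruption guard: return the parsed queue, or null
// on bad JSON so the caller can rename the file with a timestamp suffix
// (e.g. failed_queue.json.1729000000) instead of silently resetting it.
const parseFailedQueue = (raw: string): number[] | null => {
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null;
  }
};
```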

Refreshing keywords from the UI goes through pages/api/refresh.ts, which sets updating: true on the keyword rows, kicks off refreshAndUpdateKeywords() in the background so bulk refreshes don't block the request, and returns immediately. The frontend polls every 5 seconds via react-query until all updating flags clear — straightforward, but it means the UI lags slightly behind real state during scrapes.
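The polling decision reduces to one predicate over the keyword list. A sketch under assumptions: in the app this would feed react-query's refetchInterval option, and the row type here is trimmed to the one flag that matters:

```typescript
interface KeywordRow {
  id: number;
  updating: boolean;
}

// Keep polling every 5 seconds while any keyword is still being scraped;
// returning false tells react-query to stop refetching.
const refetchInterval = (keywords: KeywordRow[]): number | false =>
  keywords.some((k) => k.updating) ? 5000 : false;
```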

The data model is two Sequelize models: Domain and Keyword, backed by SQLite. Migrations are handled by umzug. Google Search Console integration runs as a separate daily cron and writes actual impressions/click data alongside the scraped positions.
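To get a feel for the data, here's a sketch of reading a trend out of a keyword's history. The history-as-JSON-map-of-date-to-position shape is an assumption about the Keyword model, not confirmed from the schema:

```typescript
// Assumed shape: a keyword's history stores { "YYYY-MM-DD": position }.
type PositionHistory = Record<string, number>;

// Positive delta = the keyword moved up (lower position numbers are better).
const positionDelta = (history: PositionHistory): number => {
  const dates = Object.keys(history).sort();
  if (dates.length < 2) return 0;
  return history[dates[0]] - history[dates[dates.length - 1]];
};
```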

Using it

Docker is the recommended path:

docker run --name serpbear \
  -p 3000:3000 \
  -e SECRET=your_secret_here \
  -e APIKEY=your_api_key_here \
  -e NEXTAUTH_URL=http://localhost:3000 \
  -e USER=admin \
  -e PASSWORD=yourpassword \
  -v $(pwd)/data:/app/data \
  towfiqi/serpbear

After that, visit localhost:3000, add a domain, go to Settings to configure your scraping provider, and add keywords. The built-in REST API (/api/keywords, /api/domains) is documented enough to pull data into external reporting pipelines.
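Pulling keyword data into a pipeline is a short script. A sketch, with assumptions flagged: only the endpoint paths and the APIKEY bearer auth come from above — the ?domain= query parameter and the response shape are my guesses:

```typescript
// Bearer auth with the same APIKEY the cron process uses.
const authHeaders = (apikey: string): Record<string, string> => ({
  Authorization: `Bearer ${apikey}`,
});

// Hypothetical export helper; adjust the query parameters to the real API.
async function fetchKeywords(base: string, apikey: string, domain: string): Promise<unknown> {
  const res = await fetch(`${base}/api/keywords?domain=${encodeURIComponent(domain)}`, {
    headers: authHeaders(apikey),
  });
  if (!res.ok) throw new Error(`SerpBear API returned ${res.status}`);
  return res.json();
}
```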

For the free path, ScrapingRobot gives 5,000 lookups per month on their free tier — enough for a few hundred keywords tracked weekly.
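That budget is easy to sanity-check, assuming one lookup per keyword per scrape (the 4.35 weeks-per-month figure is just 30.44 / 7):

```typescript
// keywords × scrapes per week × average weeks per month, rounded up.
const monthlyLookups = (keywords: number, scrapesPerWeek: number): number =>
  Math.ceil(keywords * scrapesPerWeek * 4.35);

monthlyLookups(500, 1); // 2175 — comfortably inside a 5,000/month free tier
```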

Rough edges

The proxy scraper breaks whenever Google changes its HTML structure, which happens every few months. There's no alerting when scrapes silently return zero results — you'd only notice when you see positions disappear from the UI.

Test coverage is thin. The __tests__ directory has three test files covering basic page rendering. The scraper modules themselves have no tests, which matters because the Cheerio selectors are the most failure-prone part of the whole system.

The app is pinned to Next.js 12 ("next": "^12.3.4"). That's three major versions behind, and the Pages Router architecture means migrating is non-trivial. The project is still active — the latest release (3.1.0) landed recently — but the core stack hasn't moved in a while.

Settings are stored in a flat JSON file at data/settings.json rather than in SQLite alongside the rest of the data. The encrypted API keys live there too. Not a security hole, but it's an inconsistency that can bite you if you're managing the data directory separately from the database.

Bottom line

If you're tracking keyword positions for your own projects and don't want to pay per-keyword fees, SerpBear is the most complete self-hosted option available. The scraping provider abstraction is well-designed and makes switching between services trivial — that's the architectural decision that makes it maintainable long-term.

towfiqi/serpbear on GitHub