Hourly Improvements: Two Ways to Keep a Site Moving

10 min read
#cloudflare#ai#agents#claude-code#workers

Websites rot. Not in dramatic ways. In small ones. A missing rel="noopener", a stale lastmod, a cookie banner that covers the footer on first load, a screen reader that gets spammed by a rotating title every 400ms. Each of these is a five-minute fix. None of them are ever urgent enough to schedule.

I wanted a system that picked one of these off every hour, opened a PR, and let me review it with my coffee. Not a big weekly audit. Not a quarterly cleanup. Small, specific, mergeable changes on a loop.

I ended up building two versions of the same idea. One runs locally inside Claude Code. The other runs on Cloudflare Workers whether my laptop is open or not. This post is about both, and about the constraint that shapes them: one improvement per run, never more.

The One-Per-Run Rule

The temptation with any automated improvement system is to batch. Find ten issues, fix them all, open one big PR. It feels more efficient. It isn't.

Big PRs don't get merged. They sit. Small PRs do get merged, because there's nothing to review beyond the single change they make. A PR titled "Stop sidebar typewriter from spamming screen readers every rotation" takes 30 seconds to review. A PR titled "Accessibility, SEO, and performance fixes" takes 30 minutes, which means it waits until I have 30 minutes, which means never.

So both systems have the same hard rule at the top of their prompt: pick the one highest-impact improvement, implement it, open a PR. If there's nothing worth fixing, exit. Don't manufacture work.

That constraint shapes everything downstream. Triage has to rank issues, not collect them. The fixer has to touch as few files as possible. The reviewer has to reject scope creep as aggressively as it rejects bad fixes. The rule is the thing that keeps the PR list reviewable instead of a backlog of half-finished audits.

The Local Loop: Claude Code Scheduled Tasks

Claude Code has two ways to run things on a schedule. /loop runs a prompt or slash command on a recurring interval in the current session. /schedule runs a remote agent on a cron. Both hit the same scheduled_tasks.lock file in the project root so you can see what's running.

For the local version of this I use /loop 1h follow .claude/prompts/hourly-improvement.md exactly. The prompt is checked into the repo so it evolves with the site. Top of the prompt:

Your job: Assess the live site and codebase, pick the ONE highest-impact improvement, implement it, and open a PR.

The prompt walks Claude through four steps. Snapshot the live site via Cloudflare's Browser Rendering API (desktop and mobile, every key page). Review the codebase for what's behind whatever the screenshots surfaced. Pick one improvement using a ranked priority list (security, broken functionality, SEO, performance, design polish, accessibility, content, code quality). Implement it and open a PR with a structured body: what's wrong, why it matters, what changed, evidence.

There's a hard list of rules at the bottom:

  • ONE improvement per run. Never batch.
  • Verify the issue actually exists before fixing it.
  • Only edit existing files, never create new pages.
  • Check recent PRs to avoid duplicating work someone (or the last run) already shipped.
  • Read the full file before and after editing.

The "check recent PRs" rule earns its keep. Without it, the loop will happily open the same fix twice in a row if the first one hasn't merged yet. With it, every run starts with gh pr list --state open --limit 10 and moves on to the next-highest-priority issue if the top one is already in flight.
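
The comparison the loop does against those open PR titles is simple enough to sketch. This is an illustrative model, not the actual prompt logic — the helper names are hypothetical, and in practice the titles come from that `gh pr list` call:

```typescript
// Normalise a title so near-identical fixes collide.
// (Hypothetical helpers — the real check lives in the prompt, driven
// by `gh pr list --state open --limit 10`.)
function normalise(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
}

// True if a proposed fix is already in flight as an open PR,
// in which case the run moves on to the next-highest-priority issue.
function alreadyInFlight(proposed: string, openPrTitles: string[]): boolean {
  const target = normalise(proposed);
  return openPrTitles.some((t) => normalise(t) === target);
}
```

So `alreadyInFlight("Fix stale lastmod", ["fix stale lastmod!"])` is true even though the titles don't match byte-for-byte, which is what stops the loop reopening the same fix with slightly different wording.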

The prompt also tells Claude to update CLAUDE.md in the same PR if the change touches anything the doc describes. A new npm script, a new Cloudflare binding, a changed pattern. The doc is the onboarding surface for every future run of this loop, so drift compounds badly if you let it.

What makes this work on top of Claude Code specifically, rather than a shell script calling the Anthropic API: Claude Code already has the MCP servers, the Playwright/Browser Rendering access, the GitHub CLI, the repo context, and the instructions in CLAUDE.md. The loop inherits all of it. I didn't have to plumb auth for GitHub or build a screenshot pipeline. It was already there.

The Remote Loop: Vigil on Cloudflare Workers

The local loop has one weakness. It runs when my laptop is on and Claude Code is open. If I'm on a train or at an event, nothing happens.

So I built a second version that lives entirely on Cloudflare. I called it Vigil. It monitors multiple sites (currently iammattl.com and techpartprices.com), runs the same kind of checks, and opens PRs through the same GitHub API.

The architecture is a five-agent pipeline running on Cloudflare Workers, with Queues between each stage:

detected → triaged → diagnosed → fixed → review_passed → pr_created → deployed
                         ↑                     |
                         └── review_failed ────┘ (retry, max 3)
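
Those transitions can be modelled as a small state machine. A minimal sketch, with hypothetical names — the real thing lives across queue consumers, not one function:

```typescript
// Hypothetical model of the pipeline's stage transitions.
type Stage =
  | "detected" | "triaged" | "diagnosed" | "fixed"
  | "review_passed" | "review_failed" | "pr_created" | "deployed";

const MAX_REVIEW_ATTEMPTS = 3;

// Decide where a message goes next. The reviewer's verdict and the
// retry count are inputs; review_failed loops back to the
// diagnostician until the retry budget is spent.
function nextStage(
  current: Stage,
  opts: { reviewApproved?: boolean; attempts?: number } = {},
): Stage | "dead_letter" {
  const attempts = opts.attempts ?? 0;
  switch (current) {
    case "detected":      return "triaged";
    case "triaged":       return "diagnosed";
    case "diagnosed":     return "fixed";
    case "fixed":         return opts.reviewApproved ? "review_passed" : "review_failed";
    case "review_failed": return attempts < MAX_REVIEW_ATTEMPTS ? "diagnosed" : "dead_letter";
    case "review_passed": return "pr_created";
    case "pr_created":    return "deployed";
    default:              return "dead_letter"; // deployed is terminal
  }
}
```

Keeping the transition logic this explicit is what makes the retry cap enforceable: a rejected fix goes back to diagnosis at most three times before the issue is parked instead of looping forever.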

Each stage is a separate function that picks up messages from its inbound queue, calls Workers AI, updates state in D1, and enqueues the next stage. Cron triggers drive the detection side of the system on different schedules for different check types:

"triggers": {
  "crons": [
    "*/15 * * * *",  // Uptime — every 15 mins
    "0 2 * * *",     // SEO — daily
    "0 3 * * SUN",   // Broken links — weekly
    "0 4 * * 1",     // Lighthouse — weekly
    "0 5 * * 1",     // Security headers — weekly
    "0 6 * * 1,4",   // SERP tracking — twice weekly
    "0 3 * * 1",     // Dependency CVEs — weekly
    "0 9 * * 1"      // Weekly digest — Monday
  ]
}
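
One detail worth knowing if you build something similar: Cloudflare passes the matching cron expression to the scheduled handler as `event.cron`, so a single Worker can dispatch all of those triggers. A sketch of that dispatch — the check names are illustrative, not Vigil's internals:

```typescript
// Map each cron expression from the config above to a check type.
// (Check names are illustrative.)
const CHECKS: Record<string, string> = {
  "*/15 * * * *": "uptime",
  "0 2 * * *":    "seo",
  "0 3 * * SUN":  "broken_links",
  "0 4 * * 1":    "lighthouse",
  "0 5 * * 1":    "security_headers",
  "0 6 * * 1,4":  "serp",
  "0 3 * * 1":    "dependency_cves",
  "0 9 * * 1":    "weekly_digest",
};

function checkForCron(cron: string): string | undefined {
  return CHECKS[cron];
}

// Worker entry point: one scheduled() handler fans out by event.cron.
const worker = {
  async scheduled(event: { cron: string }): Promise<void> {
    const check = checkForCron(event.cron);
    if (!check) return;
    // ...enqueue a "detected"-stage message for this check type...
  },
};
```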

Detection is dumb. It finds issues. The interesting work happens in the pipeline stages that follow.

Triage classifies each issue by severity, decides whether it's auto-fixable, and identifies the likely files. Uses a fast, cheap model (Qwen3) via AI Gateway because the job is classification, not writing.

Diagnostician fetches the real source from GitHub and analyses root cause. Uses a higher-quality model (Gemma 4) because the output needs to be specific enough for the fixer to act on.

Fixer generates the minimal code change. Same quality-tier model. Outputs a confidence score. If it's below 0.8 the change doesn't progress.

Reviewer is the important one. It looks at the fix without having written it, judges whether the change actually solves the issue, and can reject with feedback that loops back to the Diagnostician for a retry. Three attempts max. Using a different model from the fixer is deliberate. An independent reviewer catches things the fixer rationalised away.

Deployer creates a branch, commits the change, opens a PR, and notifies Telegram.

Model routing runs through a Cloudflare AI Gateway binding called vigil. The gateway caches identical prompts, logs every request, and enforces per-account rate limits. When a fixer invocation costs more than the budget allowed, I can see it in the gateway dashboard before it shows up on a bill.
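
The routing itself is a lookup from agent role to model tier, with the gateway named on the call. A hedged sketch — the model IDs below are placeholders, not real Workers AI catalogue names, and the commented call assumes the Workers AI binding's `gateway` option:

```typescript
// Per-agent model routing. IDs are placeholders — substitute real
// Workers AI catalogue names.
type Agent = "triage" | "diagnostician" | "fixer" | "reviewer";

const MODELS: Record<Agent, string> = {
  triage:        "cheap-fast-model",          // classification, not writing
  diagnostician: "quality-model",             // output must be specific
  fixer:         "quality-model",
  reviewer:      "independent-quality-model", // deliberately != fixer
};

function modelFor(agent: Agent): string {
  return MODELS[agent];
}

// Inside a stage handler (env.AI is the Workers AI binding), the call
// is routed through the named gateway for caching, logging, and
// rate limits — shown as a comment since it needs the live binding:
//
// const out = await env.AI.run(modelFor("fixer"), { messages }, {
//   gateway: { id: "vigil" },
// });
```

The table makes the fixer/reviewer independence a visible invariant rather than a convention buried in five different call sites.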

Auto-merge exists but is off by default. It's gated behind explicit env vars:

AUTO_MERGE_ENABLED=true
AUTO_MERGE_MIN_CONFIDENCE=0.95
AUTO_MERGE_MAX_FILES=2
AUTO_MERGE_MAX_LINES=50

A fix needs all three constraints to pass before it merges itself. Anything larger still waits for me. That's the same one-per-run discipline, applied to merge time instead of PR creation.
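
The gate reduces to one boolean. A minimal sketch using the thresholds from the env vars above (the `CandidateFix` shape is hypothetical):

```typescript
// Hypothetical shape for a fix that survived review.
interface CandidateFix {
  confidence: number;   // reviewer-era confidence score, 0..1
  filesChanged: number;
  linesChanged: number;
}

// Thresholds mirror the env vars above.
const GATE = {
  enabled: true,        // AUTO_MERGE_ENABLED
  minConfidence: 0.95,  // AUTO_MERGE_MIN_CONFIDENCE
  maxFiles: 2,          // AUTO_MERGE_MAX_FILES
  maxLines: 50,         // AUTO_MERGE_MAX_LINES
};

// All three constraints must pass; anything larger waits for a human.
function canAutoMerge(fix: CandidateFix): boolean {
  return (
    GATE.enabled &&
    fix.confidence >= GATE.minConfidence &&
    fix.filesChanged <= GATE.maxFiles &&
    fix.linesChanged <= GATE.maxLines
  );
}
```

A one-line, single-file fix at 0.96 confidence merges itself; the same fix touching three files waits, no matter how confident the model is.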

Everything else Vigil needs is just Cloudflare primitives wired together: D1 for issue state and agent logs, KV for cache, Queues for the pipeline, Workers AI for the models, Secrets Store for tokens, GitHub API for the PRs, Telegram for notifications. No dedicated server, no always-on process, no external services beyond GitHub and Telegram.

Two Loops, Same Rule

The two systems share the one-per-run constraint but otherwise make opposite choices.

Local loop. Runs inside Claude Code on my machine. Uses Claude as the model. Scoped to one repo. Scheduled via /loop 1h while the session is open. State lives in git plus throwaway files in /tmp. Every PR gets my eyes before it merges. Full context on hand: CLAUDE.md, feedback memory, the MCP servers Claude Code is already wired into. Cost: covered by my Claude Code subscription.

Vigil. Runs on Cloudflare Workers. Uses Workers AI (Gemma 4 for the heavy agents, Qwen3 for the fast ones) routed through AI Gateway. Monitors multiple sites. Scheduled via Cloudflare Cron, always on whether my laptop is or not. State lives in D1, KV, and Queues. A reviewer agent checks the fix before it gets to me. Context is whatever the cron-triggered check output surfaces, nothing more. Cost: Workers AI per-token.

The local loop wins when the decision requires judgement I'd struggle to write into a prompt. Design polish, copy that captures my tone, anything where "looks right" matters. Claude inside Claude Code has access to the same context I do, including the repo's CLAUDE.md and feedback memory, so its taste lines up with mine.

Vigil wins when the decision is mechanical and when uptime matters. A 500 at 3am needs someone looking at it. Security headers drifting after a library upgrade need catching. A broken external link on a blog post from 2026-03-16 needs finding on a crawl, not by me remembering to look. Vigil does these on cron while I sleep.

They overlap more than they compete. The local loop shipped the cross-origin opener policy header. Vigil shipped the broken-link alerts that got me to fix three outbound references by hand. Same site, same kind of improvement, different tradeoffs on who was driving.

What They've Actually Shipped

A sample of real PRs from the last month on iammattl.com. Most of these are in the git log if you want to check them:

  • Stop sidebar typewriter from spamming screen readers every rotation
  • Use CreativeWork (not WebSite) for URL-less archived projects in JSON-LD
  • Dispatch cookie-consent-update on Decline too, not just Accept
  • Extract requireAuth() helper; stop redirecting to the dead /admin/login
  • Set Cross-Origin-Opener-Policy: same-origin for tab isolation
  • Advertise the resume PDF alternate in /resume's Link header
  • Emit X-Cache: HIT|MISS from withEdgeCache for observability
  • Surface recent blog posts on the 404 page
  • Emit wordCount + timeRequired on BlogPosting JSON-LD
  • Route /feed.xml through withEdgeCache for per-POP caching

Every one of these is a fix I would have meant to do and never got around to. Each one is small enough that reviewing it takes less time than writing the PR description. That ratio is the point. If a system like this ever produces a PR I don't want to merge, I've broken the rule that makes it work.

What I Got Wrong

The prompt drifted faster than I expected. Cloudflare adds a new API, I add a new binding, the prompt's Step 1 stops reflecting what the repo actually is. The first few runs after drift produce worse results than usual. The fix was making the prompt's Step 1f an explicit "CLAUDE.md drift check". If the doc is meaningfully stale and no higher-priority issue exists, update the doc as the improvement for that run. The loop maintains its own onboarding surface.

Verifying the issue actually exists matters more than I realised. Early runs would confidently fix things that were already correct, because the model inferred the problem from an old pattern in its training data rather than reading the current file. I added "Verify the issue ACTUALLY exists. Do NOT fix something already correct" to the rules as a hard line. It cut false-positive PRs significantly.

The reviewer in Vigil is what makes autonomous fixes tolerable. When I ran Vigil without the reviewer stage, the fixer would produce plausible-looking diffs that missed the actual issue. Adding a separate agent, with a different model, whose only job is to say whether the fix solves the problem, moved the failure mode from "wrong change merged" to "correct change delayed by a retry". The extra inference cost is worth it.

Auto-merge felt more dangerous than it turned out to be. I kept it off for months. When I finally turned it on with the 0.95-confidence and 2-file thresholds, nothing bad happened because the constraints are so tight that the only things it merges are truly boring. The lesson was that the scary part isn't auto-merge, it's auto-merge without a tight gate.

Where to Start

If you want to try this on your own site, start with the local loop. It's the lower-commitment version and the one that teaches you what to ask for.

  1. Write a prompt in your repo at .claude/prompts/hourly-improvement.md. Lead with the one-per-run rule. Include a priority list so ranking is explicit. Add a "check recent PRs before proposing" step.
  2. Run /loop 1h follow .claude/prompts/hourly-improvement.md exactly in Claude Code. Let it open a PR. Review it. If the PR is bad, the prompt needs work. Iterate on the prompt, not on the tool.
  3. Once the PRs are consistently mergeable, graduate to scheduled tasks so the loop runs even when you're not in the session.
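
A minimal skeleton for that prompt file, assembled from the rules in this post — illustrative, not my actual prompt:

```markdown
Your job: Assess the live site and codebase, pick the ONE highest-impact
improvement, implement it, and open a PR.

Priority order: security > broken functionality > SEO > performance >
design polish > accessibility > content > code quality.

Before proposing anything:
- Run `gh pr list --state open --limit 10`; skip anything already in flight.
- Verify the issue ACTUALLY exists in the current code. Do NOT fix
  something that is already correct.

Rules:
- ONE improvement per run. Never batch.
- Only edit existing files; never create new pages.
- Read the full file before and after editing.
- If the change touches anything CLAUDE.md describes, update CLAUDE.md
  in the same PR.
- If nothing is worth fixing, exit. Don't manufacture work.
```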

Build the remote version only when you're running out of laptop time. Vigil makes sense because I have two sites, I travel, and I want coverage at times when I'm not working. For a single site you own and actively develop, the local loop is enough.

Both systems live on one idea. Pick one thing. Fix it. Ship it. Do it again in an hour.