<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Matt Lambert — Software &amp; Cloud Engineering</title>
    <link>https://iammattl.com/blog</link>
    <description>Thoughts on software engineering, cloud infrastructure, AI-assisted development, and building things for the web.</description>
    <language>en-gb</language>
    <lastBuildDate>Mon, 20 Apr 2026 15:50:00 GMT</lastBuildDate>
    <pubDate>Mon, 20 Apr 2026 15:50:00 GMT</pubDate>
    <atom:link href="https://iammattl.com/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Hourly Improvements: Two Ways to Keep a Site Moving</title>
      <link>https://iammattl.com/blog/hourly-improvements-two-ways-to-keep-a-site-moving</link>
      <guid isPermaLink="true">https://iammattl.com/blog/hourly-improvements-two-ways-to-keep-a-site-moving</guid>
      <description>Two parallel implementations of the same idea: one small improvement to my site, every hour, forever. A Claude Code scheduled task running locally and a five-agent pipeline on Cloudflare Workers called Vigil. Same rule at the top of both: ONE improvement per run, never batch.</description>
      <pubDate>Mon, 20 Apr 2026 15:50:00 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/hourly-improvements-two-ways-to-keep-a-site-moving/1776701672020.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>Websites rot. Not in dramatic ways. In small ones. A missing <code>rel=&quot;noopener&quot;</code>, a stale <code>lastmod</code>, a cookie banner that covers the footer on first load, a screen reader that gets spammed by a rotating title every 400ms. Each of these is a five-minute fix. None of them are ever urgent enough to schedule.</p>
<p>I wanted a system that picked one of these off every hour, opened a PR, and let me review it with my coffee. Not a big weekly audit. Not a quarterly cleanup. Small, specific, mergeable changes on a loop.</p>
<p>I ended up building two versions of the same idea. One runs locally inside Claude Code. The other runs on Cloudflare Workers whether my laptop is open or not. This post is about both, and about the constraint that shapes them: one improvement per run, never more.</p>
<h2 id="the-one-per-run-rule">The One-Per-Run Rule</h2>
<p>The temptation with any automated improvement system is to batch. Find ten issues, fix them all, open one big PR. It feels more efficient. It isn&#39;t.</p>
<p>Big PRs don&#39;t get merged. They sit. Small PRs do get merged, because there&#39;s nothing to review beyond the single change they make. A PR titled &quot;Stop sidebar typewriter from spamming screen readers every rotation&quot; takes 30 seconds to review. A PR titled &quot;Accessibility, SEO, and performance fixes&quot; takes 30 minutes, which means it waits until I have 30 minutes, which means never.</p>
<p>So both systems have the same hard rule at the top of their prompt: pick the <strong>one</strong> highest-impact improvement, implement it, open a PR. If there&#39;s nothing worth fixing, exit. Don&#39;t manufacture work.</p>
<p>That constraint shapes everything downstream. Triage has to rank issues, not collect them. The fixer has to touch as few files as possible. The reviewer has to reject scope creep as aggressively as it rejects bad fixes. The rule is the thing that keeps the PR list reviewable instead of a backlog of half-finished audits.</p>
<h2 id="the-local-loop-claude-code-scheduled-tasks">The Local Loop: Claude Code Scheduled Tasks</h2>
<p>Claude Code has two ways to run things on a schedule. <code>/loop</code> runs a prompt or slash command on a recurring interval in the current session. <code>/schedule</code> runs a remote agent on a cron. Both hit the same <code>scheduled_tasks.lock</code> file in the project root so you can see what&#39;s running.</p>
<p>For the local version of this I use <code>/loop 1h follow .claude/prompts/hourly-improvement.md exactly</code>. The prompt is checked into the repo so it evolves with the site. Top of the prompt:</p>
<blockquote>
<p>Your job: Assess the live site and codebase, pick the ONE highest-impact improvement, implement it, and open a PR.</p>
</blockquote>
<p>The prompt walks Claude through four steps. Snapshot the live site via Cloudflare&#39;s Browser Rendering API (desktop and mobile, every key page). Review the codebase for what&#39;s behind whatever the screenshots surfaced. Pick one improvement using a ranked priority list (security, broken functionality, SEO, performance, design polish, accessibility, content, code quality). Implement it and open a PR with a structured body: what&#39;s wrong, why it matters, what changed, evidence.</p>
<p>There&#39;s a hard list of rules at the bottom:</p>
<ul>
<li>ONE improvement per run. Never batch.</li>
<li>Verify the issue actually exists before fixing it.</li>
<li>Only edit existing files, never create new pages.</li>
<li>Check recent PRs to avoid duplicating work someone (or the last run) already shipped.</li>
<li>Read the full file before and after editing.</li>
</ul>
<p>The &quot;check recent PRs&quot; rule earns its keep. Without it, the loop will happily open the same fix twice in a row if the first one hasn&#39;t merged yet. With it, every run starts with <code>gh pr list --state open --limit 10</code> and moves on to the next-highest-priority issue if the top one is already in flight.</p>
<p>The prompt also tells Claude to update <code>CLAUDE.md</code> in the same PR if the change touches anything the doc describes. A new npm script, a new Cloudflare binding, a changed pattern. The doc is the onboarding surface for every future run of this loop, so drift compounds badly if you let it.</p>
<p>What makes this work on top of Claude Code specifically, rather than a shell script calling the Anthropic API: Claude Code already has the MCP servers, the Playwright/Browser Rendering access, the GitHub CLI, the repo context, and the instructions in <code>CLAUDE.md</code>. The loop inherits all of it. I didn&#39;t have to plumb auth for GitHub or build a screenshot pipeline. It was already there.</p>
<h2 id="the-remote-loop-vigil-on-cloudflare-workers">The Remote Loop: Vigil on Cloudflare Workers</h2>
<p>The local loop has one weakness. It runs when my laptop is on and Claude Code is open. If I&#39;m on a train or at an event, nothing happens.</p>
<p>So I built a second version that lives entirely on Cloudflare. I called it Vigil. It monitors multiple sites (currently iammattl.com and techpartprices.com), runs the same kind of checks, and opens PRs through the same GitHub API.</p>
<p>The architecture is a five-agent pipeline running on Cloudflare Workers, with Queues between each stage:</p>
<pre><code>detected → triaged → diagnosed → fixed → review_passed → pr_created → deployed
                         ↑                     |
                         └── review_failed ────┘  (retry, max 3)
</code></pre>
<p>Each stage is a separate function that picks up messages from its inbound queue, calls Workers AI, updates state in D1, and enqueues the next stage. Cron triggers drive the detection side of the system on different schedules for different check types:</p>
<pre><code class="language-jsonc">&quot;triggers&quot;: {
  &quot;crons&quot;: [
    &quot;*/15 * * * *&quot;,  // Uptime — every 15 mins
    &quot;0 2 * * *&quot;,     // SEO — daily
    &quot;0 3 * * SUN&quot;,   // Broken links — weekly
    &quot;0 4 * * 1&quot;,     // Lighthouse — weekly
    &quot;0 5 * * 1&quot;,     // Security headers — weekly
    &quot;0 6 * * 1,4&quot;,   // SERP tracking — twice weekly
    &quot;0 3 * * 1&quot;,     // Dependency CVEs — weekly
    &quot;0 9 * * 1&quot;      // Weekly digest — Monday
  ]
}
</code></pre>
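<p>Inside the Worker, the expression that fired arrives on the scheduled event, which is how one Worker fans out to a different detector per schedule. A minimal sketch of that dispatch; the check names and the queue binding in the comment are illustrative, not Vigil&#39;s real code:</p>
<pre><code class="language-javascript">// Map each cron expression to a check type. The expression that
// fired arrives on the scheduled event, so one Worker serves all
// of the schedules above. Check names here are illustrative.
const CHECKS = {
  "*/15 * * * *": "uptime",
  "0 2 * * *": "seo",
  "0 3 * * SUN": "broken-links",
  "0 4 * * 1": "lighthouse",
};

function detectorFor(cron) {
  return CHECKS[cron] ?? null;
}

// In the Worker itself, roughly:
//   async scheduled(event, env, ctx) {
//     const check = detectorFor(event.cron);
//     if (check) ctx.waitUntil(env.DETECTED_QUEUE.send({ check }));
//   }
</code></pre>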
<p>Detection is dumb. It finds issues. The interesting work happens in the pipeline after.</p>
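<p>Each stage follows the same consumer shape: pull a message off the inbound queue, call a model, write state to D1, enqueue the next stage. A sketch of Triage in that shape; <code>env.classify</code>, <code>env.db</code>, and <code>env.next</code> are illustrative stand-ins for the Workers AI, D1, and Queues bindings, not Vigil&#39;s actual code:</p>
<pre><code class="language-javascript">// One stage in queue-consumer shape. env.classify, env.db, and
// env.next stand in for the Workers AI, D1, and Queues bindings
// a real stage would use.
async function triageStage(batch, env) {
  for (const msg of batch.messages) {
    const issue = msg.body;
    const verdict = await env.classify(issue);
    await env.db.update(issue.id, { stage: "triaged", ...verdict });
    if (verdict.autoFixable) {
      await env.next.send({ ...issue, ...verdict }); // on to the Diagnostician
    }
    msg.ack(); // mark the message handled either way
  }
}
</code></pre>
<p>In Cloudflare Queues an unacked message is redelivered, so the sketch acks whether or not the issue progresses.</p>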
<p><strong>Triage</strong> classifies each issue by severity, decides whether it&#39;s auto-fixable, and identifies the likely files. Uses a fast, cheap model (Qwen3) via AI Gateway because the job is classification, not writing.</p>
<p><strong>Diagnostician</strong> fetches the real source from GitHub and analyses root cause. Uses a higher-quality model (Gemma 4) because the output needs to be specific enough for the fixer to act on.</p>
<p><strong>Fixer</strong> generates the minimal code change. Same quality-tier model. Outputs a confidence score. If it&#39;s below 0.8 the change doesn&#39;t progress.</p>
<p><strong>Reviewer</strong> is the important one. It looks at the fix without having written it, judges whether the change actually solves the issue, and can reject with feedback that loops back to the Diagnostician for a retry. Three attempts max. Using a different model from the fixer is deliberate. An independent reviewer catches things the fixer rationalised away.</p>
<p><strong>Deployer</strong> creates a branch, commits the change, opens a PR, and notifies Telegram.</p>
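<p>Stitched together, the middle of the pipeline is a diagnose, fix, review loop with the retry cap. A sketch; every <code>env</code> function here is an illustrative stand-in for a model call:</p>
<pre><code class="language-javascript">// Diagnose, fix, review; a rejected fix loops back to the
// Diagnostician carrying the reviewer's feedback. Three attempts
// max, and sub-0.8-confidence fixes never reach the reviewer.
async function runFixLoop(issue, env, maxAttempts = 3) {
  let feedback = null;
  let attemptsLeft = maxAttempts;
  while (attemptsLeft-- > 0) {
    const diagnosis = await env.diagnose(issue, feedback);
    const fix = await env.fix(diagnosis);
    if (0.8 > fix.confidence) continue;          // below threshold: do not progress
    const review = await env.review(issue, fix); // a different model from the fixer
    if (review.passed) return fix;               // on to the Deployer
    feedback = review.feedback;                  // retry with the reviewer's notes
  }
  return null; // no PR this run
}
</code></pre>
<p>One design note made concrete: the reviewer sees the fix without having written it, which is why <code>env.review</code> is a separate call rather than a self-check inside <code>env.fix</code>.</p>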
<p>Model routing runs through a Cloudflare AI Gateway binding called <code>vigil</code>. The gateway caches identical prompts, logs every request, and enforces per-account rate limits. When a fixer invocation costs more than the budget allowed, I can see it in the gateway dashboard before it shows up on a bill.</p>
<p>Auto-merge exists but is off by default. It&#39;s gated behind explicit env vars:</p>
<pre><code>AUTO_MERGE_ENABLED=true
AUTO_MERGE_MIN_CONFIDENCE=0.95
AUTO_MERGE_MAX_FILES=2
AUTO_MERGE_MAX_LINES=50
</code></pre>
<p>A fix needs all three constraints to pass before it merges itself. Anything larger still waits for me. That&#39;s the same one-per-run discipline, applied to merge time instead of PR creation.</p>
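<p>As code, the gate reduces to one predicate over the fix and those env vars. A sketch; the field names on <code>fix</code> are illustrative, not Vigil&#39;s schema:</p>
<pre><code class="language-javascript">// Every constraint must hold, otherwise the PR waits for a human.
// Field names on fix are illustrative.
function canAutoMerge(fix, env) {
  if (env.AUTO_MERGE_ENABLED !== "true") return false;
  if (Number(env.AUTO_MERGE_MIN_CONFIDENCE) > fix.confidence) return false;
  if (fix.filesChanged > Number(env.AUTO_MERGE_MAX_FILES)) return false;
  if (fix.linesChanged > Number(env.AUTO_MERGE_MAX_LINES)) return false;
  return true;
}
</code></pre>
<p>Borderline reads: a fix exactly at the thresholds (0.95 confidence, 2 files, 50 lines) still merges; one file or one line over and it waits.</p>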
<p>Everything else Vigil needs is just Cloudflare primitives wired together: D1 for issue state and agent logs, KV for cache, Queues for the pipeline, Workers AI for the models, Secrets Store for tokens, GitHub API for the PRs, Telegram for notifications. No dedicated server, no always-on process, no external services beyond GitHub and Telegram.</p>
<h2 id="two-loops-same-rule">Two Loops, Same Rule</h2>
<p>The two systems share the one-per-run constraint but otherwise make opposite choices.</p>
<p><strong>Local loop.</strong> Runs inside Claude Code on my machine. Uses Claude as the model. Scoped to one repo. Scheduled via <code>/loop 1h</code> while the session is open. State lives in git plus throwaway files in <code>/tmp</code>. Every PR gets my eyes before it merges. Full context on hand: <code>CLAUDE.md</code>, feedback memory, the MCP servers Claude Code is already wired into. Cost: covered by my Claude Code subscription.</p>
<p><strong>Vigil.</strong> Runs on Cloudflare Workers. Uses Workers AI (Gemma 4 for the heavy agents, Qwen3 for the fast ones) routed through AI Gateway. Monitors multiple sites. Scheduled via Cloudflare Cron, always on whether my laptop is or not. State lives in D1, KV, and Queues. A reviewer agent checks the fix before it gets to me. Context is whatever the cron-triggered check output surfaces, nothing more. Cost: Workers AI per-token.</p>
<p>The local loop wins when the decision requires judgement I&#39;d struggle to write into a prompt. Design polish, copy that captures my tone, anything where &quot;looks right&quot; matters. Claude inside Claude Code has access to the same context I do, including the repo&#39;s <code>CLAUDE.md</code> and feedback memory, so its taste lines up with mine.</p>
<p>Vigil wins when the decision is mechanical and when uptime matters. A 500 at 3am needs someone looking at it. Security headers drifting after a library upgrade need catching. A broken external link on a blog post from 2026-03-16 needs finding on a crawl, not by me remembering to look. Vigil does these on cron while I sleep.</p>
<p>They overlap more than they compete. The local loop shipped the cross-origin opener policy header. Vigil shipped the broken-link alerts that got me to fix three outbound references by hand. Same site, same kind of improvement, different tradeoffs on who was driving.</p>
<h2 id="what-they-ve-actually-shipped">What They&#39;ve Actually Shipped</h2>
<p>A sample of real PRs from the last month on iammattl.com. Most of these are in the git log if you want to check them:</p>
<ul>
<li>Stop sidebar typewriter from spamming screen readers every rotation</li>
<li>Use CreativeWork (not WebSite) for URL-less archived projects in JSON-LD</li>
<li>Dispatch cookie-consent-update on Decline too, not just Accept</li>
<li>Extract requireAuth() helper; stop redirecting to the dead /admin/login</li>
<li>Set Cross-Origin-Opener-Policy: same-origin for tab isolation</li>
<li>Advertise the resume PDF alternate in /resume&#39;s Link header</li>
<li>Emit X-Cache: HIT|MISS from withEdgeCache for observability</li>
<li>Surface recent blog posts on the 404 page</li>
<li>Emit wordCount + timeRequired on BlogPosting JSON-LD</li>
<li>Route /feed.xml through withEdgeCache for per-POP caching</li>
</ul>
<p>Every one of these is a fix I would have meant to do and never got around to. Each one is small enough that reviewing it takes less time than writing the PR description. That ratio is the point. If a system like this ever produces a PR I don&#39;t want to merge, I&#39;ve broken the rule that makes it work.</p>
<h2 id="what-i-got-wrong">What I Got Wrong</h2>
<p><strong>The prompt drifted faster than I expected.</strong> Cloudflare adds a new API, I add a new binding, the prompt&#39;s Step 1 stops reflecting what the repo actually is. The first few runs after drift produce worse results than usual. The fix was making the prompt&#39;s Step 1f an explicit &quot;CLAUDE.md drift check&quot;. If the doc is meaningfully stale and no higher-priority issue exists, update the doc as the improvement for that run. The loop maintains its own onboarding surface.</p>
<p><strong>Verifying the issue actually exists matters more than I realised.</strong> Early runs would confidently fix things that were already correct, because the model inferred the problem from an old pattern in its training data rather than reading the current file. I added &quot;Verify the issue ACTUALLY exists. Do NOT fix something already correct&quot; to the rules as a hard line. It cut false-positive PRs significantly.</p>
<p><strong>The reviewer in Vigil is what makes autonomous fixes tolerable.</strong> When I ran Vigil without the reviewer stage, the fixer would produce plausible-looking diffs that missed the actual issue. Adding a separate agent, with a different model, whose only job is to say whether the fix solves the problem, moved the failure mode from &quot;wrong change merged&quot; to &quot;correct change delayed by a retry&quot;. The extra inference cost is worth it.</p>
<p><strong>Auto-merge felt more dangerous than it turned out to be.</strong> I kept it off for months. When I finally turned it on with the 0.95-confidence and 2-file thresholds, nothing bad happened because the constraints are so tight that the only things it merges are truly boring. The lesson was that the scary part isn&#39;t auto-merge, it&#39;s auto-merge without a tight gate.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you want to try this on your own site, start with the local loop. It&#39;s the lower-commitment version and the one that teaches you what to ask for.</p>
<ol>
<li>Write a prompt in your repo at <code>.claude/prompts/hourly-improvement.md</code>. Lead with the one-per-run rule. Include a priority list so ranking is explicit. Add a &quot;check recent PRs before proposing&quot; step.</li>
<li>Run <code>/loop 1h follow .claude/prompts/hourly-improvement.md exactly</code> in Claude Code. Let it open a PR. Review it. If the PR is bad, the prompt needs work. Iterate on the prompt, not on the tool.</li>
<li>Once the PRs are consistently mergeable, graduate to scheduled tasks so the loop runs even when you&#39;re not in the session.</li>
</ol>
<p>Build the remote version only when you&#39;re running out of laptop time. Vigil makes sense because I have two sites, I travel, and I want coverage at times when I&#39;m not working. For a single site you own and actively develop, the local loop is enough.</p>
<p>Both systems live on one idea. Pick one thing. Fix it. Ship it. Do it again in an hour.</p>
]]></content:encoded>
      <category>cloudflare</category>
      <category>ai</category>
      <category>agents</category>
      <category>claude-code</category>
      <category>workers</category>
    </item>
    <item>
      <title>Cloudflare Connect London 2026: The Agentic Internet is Here</title>
      <link>https://iammattl.com/blog/cloudflare-connect-london-2026</link>
      <guid isPermaLink="true">https://iammattl.com/blog/cloudflare-connect-london-2026</guid>
      <description>A day at Cloudflare Connect London during Agents Week. Sandboxes, MCP Server Portals, Cloudflare Mesh, Moltworker, Kilo Code, and AI Crawl Control. I deployed an AI agent and hardened an MCP server before the day was over.</description>
      <pubDate>Wed, 15 Apr 2026 22:00:00 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/cloudflare-connect-london-2026/1776328678215.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>I spent today at Cloudflare Connect in London. The Brewery, 60+ speakers, sold out. The event landed mid-way through Cloudflare&#39;s <a href="https://www.cloudflare.com/agents-week/" target="_blank" rel="noopener noreferrer">Agents Week</a>, so the product announcements from the past few days were fresh and the demos were live. The theme across every session was the same. The internet is being rebuilt around agents.</p>
<p>Agents. Autonomous software that acts on your behalf, runs its own compute, pays for services, talks to other agents. Cloudflare is betting their platform on this shift. After a full day of sessions, I think the bet is right.</p>
<h2 id="the-keynote-welcome-to-the-agentic-internet">The Keynote: Welcome to the Agentic Internet</h2>
<p>The opening keynote framed everything that followed. The internet we built was designed for humans browsing pages. The next internet is designed for agents acting on behalf of humans. One-to-one instead of one-to-many. Each agent is a unique instance, serving one user, running one task.</p>
<p>The infrastructure implications are massive. If every knowledge worker has agents running tasks for them, you&#39;re looking at hundreds of millions of concurrent compute instances. Traditional containers don&#39;t scale to that. Cloudflare&#39;s answer is isolates and their new Sandbox environments. Lightweight, persistent, secure. They sleep when idle and wake on demand. Agents only pay for active CPU time, not for sitting around waiting on an LLM response.</p>
<p>Sandboxes hit general availability alongside Cloudflare Containers. Agents get their own persistent compute with file systems, terminal access, git, dev servers. Credentials are injected at the network layer so agents never see raw secrets. It&#39;s a full development environment that an AI can operate independently.</p>
<p>The other announcement worth noting was the x402 Foundation, the steward of a standard for agents to pay for the services they consume. Right now, agents can browse the web and call APIs, but there&#39;s no native way for them to transact. x402 is building that payment layer. Not exciting on its own, but it matters once agents start operating at scale.</p>
<h2 id="fast-path-to-ai-securely-adopting-models-and-agents">Fast Path to AI: Securely Adopting Models and Agents</h2>
<p>This session covered the security side of the agent transition. The pitch was practical. Companies want to use AI models and deploy agents, but the security surface is genuinely new. Prompt injection, data exfiltration through tool calls, agents accessing systems they shouldn&#39;t.</p>
<p>Cloudflare&#39;s approach layers AI Gateway in front of everything. Unified logging, rate limiting, caching, content filtering across multiple model providers. The argument is that you can&#39;t retrofit security onto agents the way you did with web apps. It needs to be embedded from the start. Access controls, identity, and authorization baked into the execution model.</p>
<p>As someone running Workers AI on my own site for blog drafts and image generation, this resonated. I have a Cloudflare Access JWT check protecting my admin routes. That pattern of auth-at-the-edge is exactly what they&#39;re proposing for agents, just at a much larger scale.</p>
<p>I was convinced enough to act on it during the session. I have a <a href="https://github.com/IamMattL/truenas-mcp-server" target="_blank" rel="noopener noreferrer">TrueNAS MCP server</a> that gives Claude direct access to my NAS. 22 tools, container deployments, storage management. It runs locally over stdio. I&#39;ve wanted to make it remote for a while, accessible from anywhere, but the security story wasn&#39;t there. An MCP server that can deploy containers to your NAS shouldn&#39;t be exposed to the internet without serious access controls.</p>
<p><a href="https://developers.cloudflare.com/cloudflare-one/access-controls/ai-controls/mcp-portals/" target="_blank" rel="noopener noreferrer">MCP Server Portals</a> solve exactly this. They aggregate MCP servers behind a single Zero Trust endpoint. Identity provider authentication, device posture checks, per-tool access policies, full audit logging. You register your MCP server, attach it to a portal, and users connect through one URL protected by Cloudflare Access. The portal even collapses tools into a single code execution mode so the AI client sees a cleaner interface.</p>
<p>The other announcement that caught my attention was <a href="https://blog.cloudflare.com/mesh/" target="_blank" rel="noopener noreferrer">Cloudflare Mesh</a>. It&#39;s private networking for users, devices, and agents. Think Tailscale, but built into Cloudflare&#39;s network. You run a lightweight connector on your server, it gets a private Mesh IP, and any device or Worker on your Mesh can reach it. No port forwarding. No public exposure. Traffic routes through Cloudflare&#39;s edge across 330+ cities, so NAT traversal just works.</p>
<p>This matters for me because my TrueNAS sits on my internal network. I don&#39;t want to expose it to the internet. Cloudflare Tunnels could do it, but Mesh is bidirectional and many-to-many instead of one-directional. Run a Mesh node on the NAS, connect my devices and Workers, and the MCP server becomes reachable over a private IP with full Zero Trust policy enforcement. The free tier covers 50 nodes and 50 users.</p>
<p>Mesh plus MCP Server Portals gets me from &quot;local MCP server on my home network&quot; to &quot;remote MCP server accessible from anywhere, secured by Zero Trust, without exposing a single port.&quot;</p>
<p>I had my laptop open through most of the sessions, exploring the features as they were being discussed and working out how they&#39;d fit into my own infrastructure. The AI Controls integration for my TrueNAS server was pushed before the session ended.</p>
<h2 id="ditching-the-mac-mini-moltworker-and-openclaw">Ditching the Mac Mini: Moltworker and OpenClaw</h2>
<p>This was the most entertaining talk of the day. OpenClaw (formerly Moltbot, formerly Clawdbot) is a self-hosted personal AI agent. It connects Claude to your files, APIs, and messaging platforms. The original deployment model was a Mac mini sitting under someone&#39;s desk. Always on, always running, always your problem when it crashed at 3am.</p>
<p>Moltworker is the Cloudflare Workers port. Sid Chatterjee gave a retrospective on packaging OpenClaw to run in a Sandbox container on Cloudflare&#39;s network. No hardware. No maintenance. It sleeps when idle, wakes on request, and runs across 300+ data centers. The talk covered the full migration story. Every pain point of self-hosting a persistent AI agent (power outages, OS updates that break things, the Mac mini&#39;s fan noise) is gone.</p>
<p>The <a href="https://github.com/cloudflare/moltworker" target="_blank" rel="noopener noreferrer">Moltworker repo</a> is open source. Cloudflare published it with the Sandbox SDK built in. R2 storage for persistence across container restarts. It&#39;s a reference implementation for how to deploy a personal agent on their platform.</p>
<p>The talk was honest about the engineering challenges. Adapting an agent designed for a persistent local machine to a sleep/wake serverless model wasn&#39;t straightforward. But the result is a self-hosted AI agent that you deploy with a single command and stop thinking about.</p>
<p>I had a sandbox running by the end of the talk. Forked the repo, configured it, deployed. I&#39;d been running OpenClaw on an old laptop to tinker with. The Sandbox deployment replaced that entirely. No hardware to keep running, no laptop lid that needs to stay open.</p>
<h2 id="kilo-code">Kilo Code</h2>
<p>John Fawcett from <a href="https://kilo.ai/" target="_blank" rel="noopener noreferrer">Kilo Code</a> presented in one of Ade Oshineye&#39;s lightning sessions. It&#39;s an open-source AI coding agent. VS Code, JetBrains, CLI. Over 2 million users, 500+ model options, and an orchestrator mode that coordinates planner, coder, and debugger agents on complex tasks. It forked from Cline and Roo Code, raised $8 million, and recently launched KiloClaw, a hosted version that runs coding tasks in the cloud without tying up your local machine. The talk focused on orchestrating tens of coding agents per developer using Cloudflare Containers and Sandboxes.</p>
<p>The orchestrator concept is familiar. I built something similar with <a href="/blog/agents-when-claude-works-autonomously">autonomous-coder</a>, a multi-agent system that coordinates 7 specialised agents (frontend, backend, design, QA, DevOps, docs, research) with dependency graphs, heartbeats, and checkpoint recovery. 2,757 lines of coordination code. The hard lesson from building it: the coordination layer is more work than the agents themselves. Kilo Code is productising that same pattern. Break a complex task into subtasks, assign each to a specialised agent, coordinate the results. The difference is they&#39;ve packaged it into something 2 million people can use without writing the orchestration from scratch.</p>
<h2 id="media-industry-meetup-ai-crawl-control-security-and-monetization">Media Industry Meetup: AI Crawl Control, Security and Monetization</h2>
<p>This session was aimed at the publishing industry. Cloudflare now <a href="https://blog.cloudflare.com/introducing-ai-crawl-control/" target="_blank" rel="noopener noreferrer">blocks all AI crawlers by default</a> on new websites. That&#39;s the baseline. From there, publishers get three options for each crawler: allow free access, charge per request, or block entirely.</p>
<p><a href="https://blog.cloudflare.com/introducing-pay-per-crawl/" target="_blank" rel="noopener noreferrer">Pay-per-crawl</a> is the interesting middle ground. It uses the HTTP 402 status code (the same standard the x402 Foundation is building on) to let publishers set a flat per-request price. The crawler authenticates, pays, gets the content. Cloudflare handles billing, aggregation, and distribution. Publishers like DMGT, Associated Press, and Conde Nast are already on board.</p>
<p>The room was mostly media and publishing people, not developers. The questions were practical. How do you price access? How do you differentiate between a crawler training a model and one fetching a snippet for a citation? What happens when agents replace the traffic that ad revenue depends on?</p>
<p>Nobody had clean answers. But the questions are the right ones. The agentic internet needs economic infrastructure as much as it needs compute and security. Pricing models for agent access to content don&#39;t exist yet. They&#39;re being figured out now, and the publishers in that room were trying to work out if the numbers would add up.</p>
<h2 id="what-i-took-away">What I Took Away</h2>
<p>I run my site on Cloudflare Workers. Blog data in D1, images in R2, AI features powered by Workers AI. I&#39;ve been on this platform for over a year. What struck me at Connect wasn&#39;t any single announcement. It was the coherence of the vision.</p>
<p>Every product slots into the agent story. Workers for compute, Durable Objects for state, Sandboxes for persistent environments, AI Gateway for security, R2 and D1 for storage. It&#39;s not a pivot. It&#39;s a logical extension of what they already built. The serverless edge platform designed for web applications turns out to be what AI agents need too.</p>
<p>Right now, agents still browse websites and fill in forms because that&#39;s the interface that exists. The agentic internet means building the native ones. MCP servers instead of screen scraping. Agent-to-agent authentication instead of OAuth flows designed for humans. Programmatic payments instead of checkout pages.</p>
<p>I left The Brewery thinking about what this means for the tools I use daily. My MCP servers already give Claude access to Cloudflare, GitHub, and my NAS. The Sandbox model could let those agents run persistently instead of dying when I close the terminal. Mesh and MCP Server Portals give me a path to making my TrueNAS MCP server remotely accessible without exposing my home network.</p>
<p>The infrastructure is shipping faster than I can explore it. Every session introduced something I wanted to try, and I ran out of day before I ran out of ideas. The agentic internet isn&#39;t a concept deck anymore. It&#39;s live, and the challenge now is finding the time to build on it.</p>
]]></content:encoded>
      <category>cloudflare</category>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>events</category>
    </item>
    <item>
      <title>Agents: When Claude Works Autonomously</title>
      <link>https://iammattl.com/blog/agents-when-claude-works-autonomously</link>
      <guid isPermaLink="true">https://iammattl.com/blog/agents-when-claude-works-autonomously</guid>
      <description>Skills tell Claude how. MCP servers give Claude access. Agents let Claude work autonomously. The final post in the Extending Claude Code series covers subagents, agent libraries, multi-agent orchestration, and the trust question.</description>
      <pubDate>Mon, 13 Apr 2026 08:18:00 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/agents-when-claude-works-autonomously/1776068536216.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>The <a href="/blog/mcp-servers-connecting-claude-to-real-infrastructure">last post</a> covered MCP servers. Giving Claude direct access to your infrastructure. Your NAS, your databases, your running services. Skills tell Claude how to do things. MCP servers give Claude access to the systems where things happen.</p>
<p>But in both cases, you&#39;re still driving. You ask a question, Claude answers. You give an instruction, Claude executes. Every step goes through you.</p>
<p>Agents change that. You define a task and a set of boundaries. Claude figures out the steps, delegates work, runs things in parallel, and comes back with results. The shift isn&#39;t from manual to automatic. It&#39;s from directing every action to defining the scope and letting execution happen within it.</p>
<p>This is the final post in the series. It&#39;s also where the other pieces come together. Skills define methodology. MCP servers provide access. Agents use both to work independently.</p>
<h2 id="what-an-agent-actually-is">What an Agent Actually Is</h2>
<p>In Claude Code, an agent is a scoped instance of Claude that handles one part of a larger task. You&#39;ll see them called subagents. The idea is straightforward: instead of one Claude doing everything in sequence, you spin up focused instances that each handle a specific job.</p>
<p>Each subagent gets its own context window, its own tool access, its own area of focus. This matters more than it sounds. Context windows are finite. A subagent that only thinks about schema validation doesn&#39;t waste tokens on performance metrics or content analysis. It does one thing, and it does it with full attention.</p>
<p>An agent definition is a markdown file with frontmatter. A name, a description, the tools it&#39;s allowed to use. Below that, its instructions. Same format as a skill, but the intent is different. A skill tells Claude how to do something. An agent tells Claude to go do it.</p>
<pre><code class="language-yaml">---
name: seo-technical
description: Technical SEO specialist. Analyzes crawlability,
  indexability, security, URL structure, mobile optimization,
  Core Web Vitals, and JavaScript rendering.
tools: [Read, Bash, Write, Glob, Grep]
---
</code></pre>
<p>The description serves the same purpose as a skill trigger. It tells the orchestrating agent what this subagent is good at, so it knows when to delegate.</p>
<h2 id="subagents-in-practice">Subagents in Practice</h2>
<p>The clearest example I have is the SEO audit from the <a href="/blog/plugins-and-skills-making-claude-work-your-way">plugins post</a>. I mentioned it there but didn&#39;t go deep. Here&#39;s what actually happens.</p>
<p>When I run <code>/seo audit</code> on a URL, the orchestrator skill spawns 6 subagents in parallel:</p>
<ul>
<li><strong>Technical</strong> analyses crawlability, indexability, security headers, URL structure, mobile optimisation, Core Web Vitals</li>
<li><strong>Content</strong> evaluates E-E-A-T signals, readability, content depth, thin content detection</li>
<li><strong>Schema</strong> detects and validates structured data, generates missing markup</li>
<li><strong>Sitemap</strong> validates XML sitemaps, checks URL coverage, identifies gaps</li>
<li><strong>Performance</strong> measures Core Web Vitals, analyses page load waterfall</li>
<li><strong>Visual</strong> takes screenshots at desktop and mobile breakpoints, checks above-the-fold content</li>
</ul>
<p>Each runs independently. They don&#39;t share context. They don&#39;t wait for each other. The orchestrator waits for all 6 to complete, then synthesises the results into a scored report with a prioritised action plan.</p>
<p>What would take an hour of sequential analysis finishes in minutes. Not because any individual check is faster, but because they all run at the same time. The parallelism is the point.</p>
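<p>The fan-out shape is easy to sketch in plain Python. This is conceptual only, not how Claude Code spawns subagents internally; the specialist names come from the list above and the analysis work is stubbed out.</p>
<pre><code class="language-python">import asyncio

# Hypothetical stand-ins for the six specialists. Each analyses one
# aspect of the page and returns its findings independently.
async def run_specialist(name: str, url: str) -> dict:
    await asyncio.sleep(0.01)  # simulates independent analysis work
    return {"agent": name, "url": url, "findings": []}

async def seo_audit(url: str) -> dict:
    specialists = ["technical", "content", "schema",
                   "sitemap", "performance", "visual"]
    # Fan out: all six run concurrently, sharing no context.
    results = await asyncio.gather(
        *(run_specialist(name, url) for name in specialists)
    )
    # Fan in: the orchestrator synthesises a single report.
    return {"url": url,
            "sections": {r["agent"]: r["findings"] for r in results}}

report = asyncio.run(seo_audit("https://example.com"))
print(sorted(report["sections"]))
</code></pre>
<p>The shape is the whole point. The gather waits for all six, then the orchestrator owns the synthesis. No specialist ever sees another&#39;s output.</p>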
<p>The other benefit is less obvious. Each subagent is a specialist. The schema agent knows schema types, validation rules, and Google&#39;s current requirements. It doesn&#39;t need to know about robots.txt parsing or content readability scores. Narrower focus means better results on each individual check.</p>
<h2 id="agent-libraries-context-without-repetition">Agent Libraries: Context Without Repetition</h2>
<p>Subagents solve the parallelism problem. Agent libraries solve the knowledge problem.</p>
<p>I have a homelab. TrueNAS server, n8n for workflow automation, Docker containers, Nginx Proxy Manager for routing. Multiple projects deploy to it. This site, a price comparison tool, a YouTube automation pipeline, a file converter app. Every project needs the same infrastructure knowledge. IP addresses, deployment procedures, Docker conventions, n8n API patterns.</p>
<p>The obvious approach is to put all of that in each project&#39;s CLAUDE.md. It works, but it duplicates everything. Update the n8n API endpoint? Change it in 5 files. Add a new deployment convention? Same story. And every project loads context it doesn&#39;t need for the current task.</p>
<p>So I built a homelab agent library. It&#39;s a layered context system with four levels:</p>
<p><strong>Global</strong> loads every time. The infrastructure map. IP addresses, service endpoints, network layout, conventions that apply everywhere.</p>
<p><strong>Technology</strong> loads when working with specific tools. There&#39;s an n8n layer with API patterns, workflow design rules, and known gotchas. A Docker layer with container management patterns. Each one only loads when relevant.</p>
<p><strong>Purpose</strong> loads for specific activities. The deployment layer knows how to get a service from a local Docker Compose file to a running container on TrueNAS. It doesn&#39;t load when you&#39;re just editing code.</p>
<p><strong>Project</strong> loads for specific codebases. The iammattl layer knows this site runs on Cloudflare Workers. The techpartprices layer knows its deployment target is different. Project-specific context without polluting the global scope.</p>
<pre><code class="language-yaml">layers:
  - path: layers/global
    scope: always
  - path: layers/n8n
    scope: technology
  - path: layers/docker
    scope: technology
  - path: layers/deployment
    scope: purpose
  - path: layers/projects/iammattl
    scope: project
</code></pre>
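<p>To make the scoping concrete, here&#39;s a minimal sketch of how that layer list could resolve for a given task. The loader logic is illustrative, assuming the YAML above; it&#39;s not the library&#39;s actual implementation.</p>
<pre><code class="language-python"># Mirrors the layer list from the YAML config above.
LAYERS = [
    {"path": "layers/global", "scope": "always"},
    {"path": "layers/n8n", "scope": "technology"},
    {"path": "layers/docker", "scope": "technology"},
    {"path": "layers/deployment", "scope": "purpose"},
    {"path": "layers/projects/iammattl", "scope": "project"},
]

def resolve_layers(technologies, purposes, project):
    """Return only the layer paths relevant to the current task."""
    active = []
    for layer in LAYERS:
        name = layer["path"].rsplit("/", 1)[-1]
        if layer["scope"] == "always":
            active.append(layer["path"])
        elif layer["scope"] == "technology" and name in technologies:
            active.append(layer["path"])
        elif layer["scope"] == "purpose" and name in purposes:
            active.append(layer["path"])
        elif layer["scope"] == "project" and name == project:
            active.append(layer["path"])
    return active

# Deploying this site with Docker: four layers load.
print(resolve_layers({"docker"}, {"deployment"}, "iammattl"))
</code></pre>
<p>Deploy this site with Docker and four layers load. Edit code with no deployment in sight and the n8n and deployment context never enters the window.</p>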
<p>Alongside the layers, there are skills and rules. Skills are reusable procedures. <code>deploy-container</code> knows the exact steps: validate the compose file, transfer to TrueNAS, build and start, verify the health check, optionally set up the reverse proxy. Rules are hard constraints. <code>deployment-safety.md</code> defines what&#39;s not allowed regardless of which agent runs. <code>docker-wsl.md</code> captures a specific gotcha about Docker credential helpers in WSL2.</p>
<p>The compound effect is that new projects get deployment knowledge without duplicating anything. I add a project layer with the specifics, and the existing infrastructure knowledge is already there.</p>
<h2 id="multi-agent-orchestration">Multi-Agent Orchestration</h2>
<p>The first two examples are practical and approachable. This one is the far end of the spectrum. Most people won&#39;t need it. But it shows where the model goes when you push it.</p>
<p>I forked and extended a multi-agent orchestration system called autonomous-coder. It coordinates 7 specialised agents: frontend, backend, design, QA, DevOps, documentation, and research. Given a set of tasks with dependencies, it figures out what can run in parallel and executes them simultaneously.</p>
<p>The process works like this:</p>
<ol>
<li><strong>Plan.</strong> Analyse task dependencies. Build a dependency graph. Group tasks into levels where everything in a level can run concurrently.</li>
<li><strong>Spawn.</strong> Each task gets assigned to a specialised agent based on its type. Agents launch as separate OS processes. True parallelism, not async.</li>
<li><strong>Coordinate.</strong> A state manager handles inter-agent communication through file-based IPC with proper locking. Each agent sends heartbeats every 10 seconds. If a heartbeat stops for 60 seconds, the coordinator flags it as crashed.</li>
<li><strong>Verify.</strong> Design tasks hit a quality gate. The system checks for screenshots at desktop and mobile breakpoints. No screenshots, no pass. It auto-creates blocker tasks with Playwright instructions if the verification is missing.</li>
<li><strong>Recover.</strong> Checkpoints save progress. If an agent crashes mid-task, work resumes from the last checkpoint instead of restarting from scratch.</li>
</ol>
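<p>The planning step is the easiest piece to make concrete. Here&#39;s a rough sketch of levelling a dependency graph, with made-up task names; the real planner is more involved than this.</p>
<pre><code class="language-python">def plan_levels(deps):
    """deps maps task -> set of tasks it depends on.
    Returns levels where every task in a level can run concurrently."""
    remaining = {t: set(d) for t, d in deps.items()}
    levels = []
    while remaining:
        # Tasks whose dependencies are all satisfied form the next level.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return levels

tasks = {
    "api": set(),
    "schema": set(),
    "frontend": {"api"},
    "docs": {"api", "schema"},
    "qa": {"frontend", "docs"},
}
print(plan_levels(tasks))
</code></pre>
<p>Everything inside a level is independent, so each level maps to one round of parallel agent spawns.</p>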
<p>The result is 2-3x speedup on multi-component tasks. 13 Python modules, 2,757 lines of coordination code. The agents themselves are the simple part. The coordination is where the complexity lives.</p>
<p>I&#39;m being specific about the numbers because they tell the real story. The orchestrator, the state manager, the heartbeat monitor, the recovery system. That&#39;s more code than the agents that do the actual work. If you&#39;re thinking about building something like this, know that the hard problem isn&#39;t giving Claude tasks. It&#39;s managing what happens when multiple Claude instances work on the same codebase at the same time.</p>
<h2 id="the-trust-question">The Trust Question</h2>
<p>This is the part people actually want to talk about. How much can you trust an agent to work unsupervised?</p>
<p>The honest answer: it depends entirely on the task.</p>
<p><strong>Where agents work well:</strong></p>
<ul>
<li>Well-scoped tasks with clear success criteria. &quot;Analyse this page for SEO issues and score it&quot; has a defined output.</li>
<li>Repetitive work across a known pattern. Deploying containers, running audits, generating boilerplate.</li>
<li>Parallel analysis where each piece is independent. The SEO audit works because the 6 subagents don&#39;t need to coordinate.</li>
<li>Anything where the methodology is fully defined and the cost of being wrong is low.</li>
</ul>
<p><strong>Where they don&#39;t:</strong></p>
<ul>
<li>Ambiguous requirements. If you can&#39;t define the success criteria, an agent can&#39;t either.</li>
<li>Novel architecture decisions. Agents are good at following established patterns, not inventing new ones.</li>
<li>High-stakes operations with slow feedback loops. An agent that deploys to production needs more guardrails than one that reads logs.</li>
<li>Tasks that require cross-agent coordination on shared state. This is technically solvable (autonomous-coder does it) but the overhead is significant.</li>
</ul>
<p>The practical rule I use: if I&#39;d hand the task to a competent developer with clear written instructions, an agent can probably handle it. If the task needs judgement that comes from experience and context I can&#39;t easily write down, I stay in the loop.</p>
<p>Guardrails matter more than capability. The TrueNAS MCP server from the last post blocks privileged containers and dangerous mounts by default. That&#39;s a guardrail baked into the infrastructure, not into a prompt. When an agent has deployment access, the constraints need to live in the system, not in the instructions. Instructions get ignored under edge cases. System-level constraints don&#39;t.</p>
<p>Trust builds incrementally. Start with read-only agents. Things that analyse, report, and suggest but don&#39;t modify anything. Once you&#39;re confident in the analysis, graduate to write access. Then to autonomous execution. Same way you&#39;d onboard a new team member. You don&#39;t hand someone production access on day one.</p>
<h2 id="what-i-got-wrong">What I Got Wrong</h2>
<p><strong>Too many concurrent agents hit limits faster than expected.</strong> Six subagents running simultaneously means six context windows, six sets of tool calls, six streams of output. The resource consumption scales linearly but the coordination overhead scales worse than that. I&#39;ve learned to be deliberate about how many agents run at once rather than parallelising everything because I can.</p>
<p><strong>Overly broad agent definitions produce mediocre work.</strong> Same lesson as skills. An agent defined as &quot;handle all frontend tasks&quot; makes worse decisions than one defined as &quot;analyse CSS specificity issues and propose fixes.&quot; Narrower scope, better results.</p>
<p><strong>Autonomous doesn&#39;t mean unchecked.</strong> The visual verification gate in autonomous-coder exists because I shipped broken UI without it. The agent finished the task, reported success, and the layout was wrong. Now design tasks don&#39;t pass without screenshots proving the output looks right. Every quality gate I&#39;ve added was in response to something going wrong.</p>
<p><strong>Coordination is harder than execution.</strong> The state management, heartbeats, and recovery system in autonomous-coder account for more code than the agents themselves. If your agents need to share state or depend on each other&#39;s output, expect the coordination layer to be the bulk of the work.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you&#39;ve followed this series and have skills and MCP servers set up, adding agents is the natural next step.</p>
<p><strong>Start with a subagent in an existing skill.</strong> Take something that runs sequentially and parallelise one piece. If your deployment skill checks three things in sequence and they&#39;re independent, make them three subagents.</p>
<p><strong>Start read-only.</strong> An agent that analyses but doesn&#39;t modify is low risk and immediately useful. Let it prove itself before you give it write access.</p>
<p><strong>Define boundaries before capabilities.</strong> What the agent can&#39;t do matters more than what it can. Blocked operations, restricted file paths, required verification steps. Set these first.</p>
<p><strong>Build a context library when you see duplication.</strong> If you&#39;re copying the same infrastructure context into multiple CLAUDE.md files, extract it into a shared library. The layered loading means agents only get the context they need.</p>
<h2 id="the-series-arc">The Series Arc</h2>
<p>Five posts. One progression.</p>
<p><a href="/blog/getting-started-with-claude">Getting started</a>. Claude as a conversation partner. Give it good inputs, get better outputs.</p>
<p><a href="/blog/what-ive-learned-building-projects-with-claude-code">Building projects</a>. Claude as a daily tool. CLAUDE.md as institutional memory. The workflow that makes it reliable.</p>
<p><a href="/blog/plugins-and-skills-making-claude-work-your-way">Skills and plugins</a>. Claude remembers how to do things. Package expertise so it runs the same way every time.</p>
<p><a href="/blog/mcp-servers-connecting-claude-to-real-infrastructure">MCP servers</a>. Claude connects to real infrastructure. Your databases, your servers, your running services. Access without tab switching.</p>
<p>Agents. Claude works autonomously within your boundaries. Delegation, parallelism, and knowing when to step back.</p>
<p>Each layer compounds on the last. Skills are more useful with MCP access. Agents are more useful with skills and MCP servers combined. The whole stack works because each piece does one thing and they compose naturally.</p>
<p>The goal was never full autonomy. It&#39;s the right amount of autonomy for the task at hand. Sometimes that&#39;s a chat message. Sometimes it&#39;s 6 agents running in parallel across your infrastructure. The skill isn&#39;t in building the most autonomous system possible. It&#39;s in knowing which level of autonomy the current task actually needs.</p>
]]></content:encoded>
      <category>claude-code</category>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>developer-tools</category>
    </item>
    <item>
      <title>MCP Servers: Connecting Claude to Real Infrastructure</title>
      <link>https://iammattl.com/blog/mcp-servers-connecting-claude-to-real-infrastructure</link>
      <guid isPermaLink="true">https://iammattl.com/blog/mcp-servers-connecting-claude-to-real-infrastructure</guid>
      <description>Skills tell Claude how to do things. MCP servers tell it where to look. How I use Cloudflare, GitHub, Context7, Chrome DevTools, and a custom TrueNAS server to give Claude direct access to real infrastructure.</description>
      <pubDate>Tue, 07 Apr 2026 07:59:00 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/mcp-servers-connecting-claude-to-real-infrastructure/1775497740178.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>The <a href="/blog/plugins-and-skills-making-claude-work-your-way">last post</a> covered skills and plugins. Reusable expertise that tells Claude how to do things. A skill knows the methodology. It knows the steps. But it can only work with what Claude can already see.</p>
<p>That&#39;s the codebase in front of it. Files, terminal output, whatever you paste into the conversation. Everything else, your infrastructure, your databases, your running services, lives behind a wall Claude can&#39;t reach.</p>
<p>MCP servers remove that wall.</p>
<h2 id="what-mcp-actually-is">What MCP Actually Is</h2>
<p>MCP stands for Model Context Protocol. The name is more intimidating than the concept.</p>
<p>An MCP server is a small program that gives Claude access to an external system. It exposes tools. Claude calls those tools the same way it runs a shell command or reads a file. The difference is that the tool might query a database, take a browser screenshot, or deploy a container to your NAS.</p>
<p>Add a server, Claude gets new capabilities. Remove it, those capabilities go away. The interface is standardised. Any MCP server works with any MCP-compatible client. Claude Code, the desktop app, the web app, other AI tools that support the protocol.</p>
<p>You don&#39;t need to understand the protocol to use them. You install a server, point it at your infrastructure, and Claude starts using the tools it provides. The protocol handles the plumbing.</p>
<h2 id="the-servers-i-use">The Servers I Use</h2>
<p>I have a handful of MCP servers that stay on permanently. Cloudflare, GitHub, Context7, and Chrome DevTools. The rest I enable when I need them and disable when I don&#39;t. Here&#39;s how I use the core ones, and a few of the situational servers that are worth mentioning.</p>
<h3 id="cloudflare-where-my-sites-run">Cloudflare — Where My Sites Run</h3>
<p>This site runs on Cloudflare Workers. The blog data lives in D1. Images live in R2. The Cloudflare MCP server gives Claude direct access to all of it.</p>
<p>I can query the blog database mid-conversation. Check how many posts are published, pull content for review, verify a migration ran correctly. I can list Workers, check KV namespaces, manage R2 buckets. It&#39;s infrastructure management without leaving the terminal.</p>
<p>The moment this clicked for me was when I was writing the previous blog post. I needed to check the exact slug and status of a draft in D1. Instead of opening the Cloudflare dashboard, finding the database, writing a SQL query, I just asked. Claude queried D1, showed me the results, and we kept working. Trivial on its own. But those small context switches add up across a day of work.</p>
<h3 id="github-pr-and-issue-management">GitHub — PR and Issue Management</h3>
<p>GitHub is the other server I can&#39;t turn off. PR creation, issue management, code search across repos. It handles the full workflow. Create a branch, push changes, open a PR with a description, all from conversation. No tab switching to the GitHub UI for routine operations.</p>
<h3 id="context7-current-documentation-on-demand">Context7 — Current Documentation on Demand</h3>
<p>This one is deceptively important. Context7 fetches current library documentation in real time. When Claude is writing code that uses a specific library, it can pull the latest docs instead of relying on training data.</p>
<p>Training data has a cutoff. Libraries change. APIs get deprecated, new methods get added, configuration formats evolve. Without Context7, Claude sometimes generates code using outdated patterns. With it, Claude checks the current documentation first.</p>
<p>I use it constantly when working with Next.js, Cloudflare Workers, and Drizzle ORM. All three move fast. The difference between &quot;this worked six months ago&quot; and &quot;this works now&quot; matters when you&#39;re deploying to production.</p>
<h3 id="chrome-devtools-live-browser-inspection">Chrome DevTools — Live Browser Inspection</h3>
<p>The Chrome DevTools MCP server connects Claude to a running browser. It can navigate pages, take screenshots, inspect the DOM, read console output, monitor network requests, and run Lighthouse audits.</p>
<p>For frontend work, this is the one that changes the workflow most. Instead of describing what you see on screen, Claude sees it directly. &quot;The layout breaks on mobile&quot; becomes Claude taking a screenshot, identifying the issue, fixing the CSS, and taking another screenshot to confirm. The feedback loop tightens from minutes to seconds.</p>
<p>I use it alongside the SEO skills from the last post. The visual analysis agent takes screenshots at desktop and mobile breakpoints. The performance agent runs Lighthouse. Having real browser data instead of guessing makes the analysis credible.</p>
<h3 id="enabled-when-needed">Enabled When Needed</h3>
<p>The rest come and go depending on the task.</p>
<p><strong>TrueNAS</strong> is the one I&#39;m most proud of. I <a href="https://github.com/svnstfns/truenas-mcp-server" target="_blank" rel="noopener noreferrer">forked an existing server</a> and heavily customised it. Rewrote authentication, added security validation that blocks privileged containers and dangerous mounts, built Docker Compose to TrueNAS Custom App conversion, added auto-reconnect for dead WebSocket connections. The <a href="https://github.com/IamMattL/truenas-mcp-server" target="_blank" rel="noopener noreferrer">fork</a> has 22 tools, 165 tests, and 80% coverage. When I need to deploy a service to my NAS, I enable it. Claude reads a Docker Compose file, converts it to TrueNAS format, deploys it, and verifies it&#39;s running. One conversation instead of thirty minutes of tab switching.</p>
<p><strong>n8n</strong> handles workflow automation. It runs on my homelab. The MCP server lets Claude create, test, and manage workflows conversationally. Faster than clicking through the node editor for anything beyond a simple two-step automation. The documentation server alongside it means Claude pulls current n8n docs instead of guessing at node configurations.</p>
<p><strong>Playwright</strong> provides browser automation for testing. <strong>Serena</strong> does semantic code navigation, understanding symbols and their relationships rather than just text search.</p>
<h2 id="building-your-own">Building Your Own</h2>
<p>If the system you need isn&#39;t covered by an existing server, you have two options. Fork one that&#39;s close and adapt it, or build from scratch. I&#39;d check for an existing one first. The TrueNAS server started as a fork. Most of my work was enhancing it to fit my setup, not writing MCP plumbing from zero.</p>
<p>The MCP SDK is available in Python and TypeScript. You define tools with names, descriptions, and parameters. Each tool is a function that does something and returns a result. The SDK handles the protocol, transport, and communication with the client.</p>
<pre><code class="language-python">@server.tool()
async def get_system_info() -&gt; dict:
    &quot;&quot;&quot;Get TrueNAS system information.&quot;&quot;&quot;
    client = await get_client()
    info = await client.get_system_info()
    return {&quot;hostname&quot;: info.hostname, &quot;version&quot;: info.version}  # ...and any other fields
</code></pre>
<p>That&#39;s the shape of it. Define what the tool does, handle the API call, return structured data. The description matters because it&#39;s how Claude decides when to use the tool. Same principle as skill trigger descriptions from the last post.</p>
<p>The harder parts are authentication, error handling, and security validation. Exposing your NAS to an AI tool means thinking about what operations should be allowed. My server blocks privileged containers and dangerous filesystem mounts by default. That kind of guardrail belongs in the server, not in a prompt.</p>
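<p>The guardrail itself can be small. Here&#39;s a sketch of the idea, with illustrative checks and paths rather than the fork&#39;s actual rules:</p>
<pre><code class="language-python"># Host paths that should never be mounted into a container.
# Illustrative list; a real server's rules would be broader.
DANGEROUS_MOUNTS = ("/", "/etc", "/root", "/var/run/docker.sock")

def validate_service(service: dict) -> list:
    """Return a list of violations; empty means the deploy may proceed."""
    violations = []
    if service.get("privileged"):
        violations.append("privileged containers are blocked")
    for volume in service.get("volumes", []):
        host_path = volume.split(":", 1)[0]
        if host_path in DANGEROUS_MOUNTS:
            violations.append(f"dangerous host mount: {host_path}")
    return violations

service = {"image": "nginx:latest", "privileged": True,
           "volumes": ["/var/run/docker.sock:/var/run/docker.sock"]}
print(validate_service(service))
</code></pre>
<p>Because the check runs inside the server before any API call goes out, a persuasive prompt can&#39;t route around it.</p>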
<p>Test thoroughly. The mock mode in the TrueNAS server lets me run the full test suite without a live NAS connection. 165 tests might seem like overkill for an MCP server, but when the tool is managing your storage infrastructure, you want confidence it does what you expect.</p>
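<p>Mock mode boils down to one design decision: tools talk to a client interface, not directly to the NAS. A stripped-down sketch of what that makes possible, with invented names rather than the fork&#39;s actual fixtures:</p>
<pre><code class="language-python"># A stub that answers like the NAS would, without a live connection.
class StubClient:
    def get_system_info(self):
        return {"hostname": "truenas-test", "version": "24.10"}

def get_system_info(client) -> dict:
    """Tool logic takes any client with the same interface."""
    info = client.get_system_info()
    return {"hostname": info["hostname"], "version": info["version"]}

def test_get_system_info():
    result = get_system_info(StubClient())
    assert result == {"hostname": "truenas-test", "version": "24.10"}

test_get_system_info()
</code></pre>
<p>Swap the stub for the real WebSocket client in production and the tool code doesn&#39;t change.</p>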
<p>If you build something useful, open source it. I published my TrueNAS fork because other homelabbers have the same needs. The original author gets contributions back, the ecosystem grows, and someone else doesn&#39;t have to solve the same problems from scratch.</p>
<h2 id="what-i-got-wrong">What I Got Wrong</h2>
<p>A few lessons from accumulating MCP servers.</p>
<p><strong>Too many servers at once creates noise.</strong> Each server adds tools to Claude&#39;s context. A dozen servers can mean over 100 tools available in every conversation. Most of the time, you need three or four. The rest are consuming context for nothing. I&#39;m more selective now about which servers are active globally versus enabled per project.</p>
<p><strong>Security needs thought, not afterthoughts.</strong> An MCP server with write access to your NAS or your production database is powerful. It&#39;s also a risk if the tool descriptions are ambiguous or the guardrails are missing. Think about what you&#39;re exposing before you connect it. Least privilege applies here the same way it applies everywhere else.</p>
<p><strong>Not all servers are equal quality.</strong> Some community servers are well-tested and maintained. Others are weekend projects with no error handling. Before connecting a server to anything important, read the code. Check the test coverage. Understand what it&#39;s doing with your credentials.</p>
<p><strong>Server descriptions matter more than you&#39;d expect.</strong> If the tool descriptions are vague, Claude either won&#39;t use the server when it should, or will use it when it shouldn&#39;t. Good descriptions include when to use the tool and what it returns. This is the same lesson as skill trigger descriptions. The description is the interface between Claude and the capability.</p>
<h2 id="what-s-next">What&#39;s Next</h2>
<p>Skills tell Claude how to do things. MCP servers give Claude access to the systems where things happen. But in both cases, you&#39;re still driving. You ask, Claude acts, you review, you ask again.</p>
<p>The next step is agents. Claude working autonomously. Spawning subagents that run in parallel, delegating tasks, making decisions within defined boundaries. That&#39;s the final post in this series.</p>
]]></content:encoded>
      <category>claude-code</category>
      <category>ai</category>
      <category>mcp</category>
      <category>infrastructure</category>
      <category>developer-tools</category>
    </item>
    <item>
      <title>Plugins and Skills: Making Claude Work Your Way</title>
      <link>https://iammattl.com/blog/plugins-and-skills-making-claude-work-your-way</link>
      <guid isPermaLink="true">https://iammattl.com/blog/plugins-and-skills-making-claude-work-your-way</guid>
      <description>Skills and plugins let you package expertise so Claude works consistently across projects. Here&apos;s how I built an SEO suite with 13 skills and 6 subagents, and the patterns that apply to any workflow.</description>
      <pubDate>Mon, 30 Mar 2026 07:00:00 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/plugins-and-skills-making-claude-work-your-way/1774823563642.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>In the <a href="/blog/getting-started-with-claude">last post</a>, I walked through how to get more out of Claude. Give it real source material, iterate until it sounds like you, extract your tone of voice, build up context so each conversation isn&#39;t starting from scratch.</p>
<p>That post ended with &quot;build up context over time.&quot; Claude now does this automatically. It has a persistent memory system that learns your preferences, your role, how you like to work, and carries that across conversations. Combined with CLAUDE.md files that give it <a href="/blog/what-ive-learned-building-projects-with-claude-code">project-specific context</a>, Claude remembers who you are and what you&#39;re working on.</p>
<p>But there&#39;s still a gap. Memory knows your preferences. CLAUDE.md knows your codebase. Neither one captures <em>how you want things done</em>. The methodology. The steps you follow when reviewing a PR, auditing a page for SEO issues, or writing a commit message. You&#39;ve figured out what works through trial and error. You&#39;re still explaining the process manually every time.</p>
<p>Skills and plugins fill that gap. They&#39;re reusable expertise that travels with you across projects. The difference between &quot;let me explain how I want this done&quot; and &quot;just do it the way we agreed.&quot;</p>
<p>If you&#39;ve ever created a shell alias, a code snippet, or an editor macro, you already understand the principle. This is the same thing, applied to AI.</p>
<h2 id="the-landscape">The Landscape</h2>
<p>Before getting into the details, here&#39;s the mental model. Three layers of context, then the automation on top.</p>
<p><strong>Memory</strong> is personal context. Claude learns your preferences, your role, your working style and carries it across conversations automatically. This is the &quot;build up context over time&quot; from the <a href="/blog/getting-started-with-claude">getting started post</a>, but built in.</p>
<p><strong>CLAUDE.md</strong> is project context. It lives in your repo and tells Claude how this specific codebase works. The architecture, the gotchas, the rules. Covered in a <a href="/blog/what-ive-learned-building-projects-with-claude-code">previous post</a>.</p>
<p><strong>Skills</strong> are reusable playbooks. A skill defines a methodology or workflow that works across any project. Think &quot;how to run an SEO audit&quot; rather than &quot;how this repo is structured.&quot;</p>
<p><strong>Slash commands</strong> are the user-facing shortcuts. Type <code>/commit</code> or <code>/seo audit</code> and a full workflow runs. The interface layer on top of skills.</p>
<p><strong>Hooks</strong> are guardrails. They run automatically before or after Claude takes an action. You don&#39;t invoke them. They fire when something happens and prevent mistakes before they land.</p>
<p><strong>Plugins</strong> bundle everything together. Skills, commands, hooks, and agents in a single installable package. Like a VS Code extension, but for Claude.</p>
<h3 id="where-they-live">Where They Live</h3>
<p>All three surfaces support these concepts now. The web app (claude.ai), the desktop app, and Claude Code in the terminal.</p>
<p>On the web and desktop, skills are installed through Settings as ZIP packages. Connectors give you one-click access to external services like GitHub, Slack, and Google Drive. The Cowork plugin marketplace has pre-built plugins for specific workflows. Slash commands come bundled with installed skills and plugins.</p>
<p>In Claude Code, everything is file-driven. Skills are markdown files in directories. MCP servers are configured in <code>settings.json</code>. Slash commands are <code>.md</code> files you write yourself. Hooks give you full lifecycle automation. It&#39;s more hands-on, but the tradeoff is complete control over how everything works.</p>
<p>The underlying format is the same. A skill is a <code>SKILL.md</code> file whether you&#39;re uploading it through the GUI or placing it in a directory. The <em>why</em> is identical across surfaces. Package your expertise so Claude does it consistently.</p>
<p>One thing worth knowing. Skills created in Claude Code don&#39;t automatically appear in the desktop or web app, and vice versa. But Claude Code&#39;s remote control feature lets you connect the CLI to the desktop app, giving you access to your CLI skills and tools from the desktop interface.</p>
<h2 id="skills-in-practice">Skills in Practice</h2>
<p>The best way to explain skills is to show one. I use <a href="https://github.com/searchfit/searchfit-seo" target="_blank" rel="noopener noreferrer">an SEO analysis suite</a> that has 13 specialised skills and 6 subagents. I&#39;ll walk through three levels of complexity so you can see where to start and where it can go.</p>
<h3 id="level-1-a-focused-skill">Level 1: A Focused Skill</h3>
<p>Here&#39;s <code>seo-schema</code>, the simplest skill in the suite. It detects, validates, and generates Schema.org structured data for any page.</p>
<pre><code class="language-yaml">---
name: seo-schema
description: &gt;
  Detect, validate, and generate Schema.org structured data.
  JSON-LD format preferred. Use when user says &quot;schema&quot;,
  &quot;structured data&quot;, &quot;rich results&quot;, &quot;JSON-LD&quot;, or &quot;markup&quot;.
---
</code></pre>
<p>That&#39;s the frontmatter. A name and a description. The description is doing more work than it appears to. This is how Claude decides whether to activate the skill. Those trigger phrases at the end (&quot;schema&quot;, &quot;structured data&quot;, &quot;rich results&quot;) mean that when I type &quot;check the structured data on this page&quot;, Claude automatically loads the skill and follows its methodology.</p>
<p>Get the description wrong and the skill never fires. Too broad and it fires when you don&#39;t want it. This is the single most important line in any skill file.</p>
<p>Below the frontmatter is plain markdown. Detection steps, validation rules, a list of current and deprecated schema types, output format. The skill tells Claude exactly how to approach the task, what to check, and how to present the results. It&#39;s the methodology I&#39;d follow myself, written down once so I never have to explain it again.</p>
<p>One detail worth calling out. The skill references external files (<code>references/schema-types.md</code>) but explicitly says &quot;do NOT load all at startup.&quot; Claude loads reference material on demand, only when it&#39;s actually needed. This is progressive disclosure. It keeps the context window clean when you have dozens of skills installed, and means Claude isn&#39;t burning tokens reading about deprecated schema types unless it&#39;s actually validating markup.</p>
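<p>In the skill body, that instruction looks roughly like this. A paraphrased sketch, not the file&#39;s exact wording; only <code>references/schema-types.md</code> is named above:</p>
<pre><code class="language-markdown">## References

Load these on demand, only when the task needs them. Do NOT load all at startup.

- references/schema-types.md: current and deprecated Schema.org types
</code></pre>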
<h3 id="level-2-the-orchestrator">Level 2: The Orchestrator</h3>
<p>Individual skills are useful on their own. But the real power shows up when they compose.</p>
<p>The main <code>/seo</code> skill acts as a router. It doesn&#39;t do analysis itself. It reads what you&#39;re asking for, detects the business type from homepage signals, and delegates to the right specialist.</p>
<pre><code>/seo audit &lt;url&gt;        → Full website audit with parallel delegation
/seo schema &lt;url&gt;       → Schema detection and generation
/seo technical &lt;url&gt;    → Technical SEO (8 categories)
/seo content &lt;url&gt;      → E-E-A-T and content quality
/seo geo &lt;url&gt;          → AI search optimisation
/seo plan &lt;type&gt;        → Strategic SEO planning
</code></pre>
<p>That&#39;s six of the twelve sub-commands, each mapping to a specialist skill. One entry point, many capabilities.</p>
<p>The industry detection is a good example of why this works better as a skill than as a prompt you type each time. The orchestrator analyses homepage signals to classify the site. SaaS sites have pricing pages and free trial CTAs. Local businesses have phone numbers and service areas. E-commerce sites have product schemas and cart functionality. The analysis adjusts based on what it finds. That&#39;s logic that was refined over multiple iterations. Once it&#39;s in the skill, it runs the same way every time, whether you&#39;re auditing a SaaS product or a local plumber&#39;s website.</p>
<h3 id="level-3-parallel-delegation">Level 3: Parallel Delegation</h3>
<p><code>/seo audit</code> takes it further. Instead of running checks sequentially, it spawns 6 subagents in parallel. Technical SEO, content quality, schema validation, sitemap analysis, performance measurement, and visual testing. Each subagent runs independently with its own tools and area of focus.</p>
<p>The output is a scored report (0-100), a prioritised action plan grouped by severity (Critical, High, Medium, Low), and desktop plus mobile screenshots. What would take an hour of sequential analysis finishes in minutes.</p>
<p>I won&#39;t go deep on agents here. That&#39;s a topic for its own post. The point is that skills can delegate to agents, and the skill defines when and how that delegation happens.</p>
<h3 id="the-file-structure">The File Structure</h3>
<p>Here&#39;s what the full suite looks like on disk:</p>
<pre><code>~/.claude/skills/
├── seo/                    # Main orchestrator
│   ├── SKILL.md            # Router + scoring methodology
│   ├── references/         # On-demand knowledge files
│   ├── schema/             # JSON-LD templates
│   └── scripts/            # Python analysis tools
├── seo-audit/SKILL.md      # Full audit workflow
├── seo-schema/SKILL.md     # Schema detection/generation
├── seo-technical/SKILL.md  # Technical SEO
├── seo-content/SKILL.md    # E-E-A-T analysis
├── seo-sitemap/SKILL.md    # Sitemap analysis
├── seo-images/SKILL.md     # Image optimisation
├── seo-geo/SKILL.md        # AI search optimisation
├── seo-plan/SKILL.md       # Strategic planning
├── seo-page/SKILL.md       # Single page analysis
├── seo-programmatic/SKILL.md
├── seo-competitor-pages/SKILL.md
└── seo-hreflang/SKILL.md

~/.claude/agents/
├── seo-technical.md        # 6 specialist subagents
├── seo-content.md          # for parallel audit
├── seo-schema.md           # delegation
├── seo-sitemap.md
├── seo-performance.md
└── seo-visual.md
</code></pre>
<p>13 skills, 6 agents, supporting references and scripts. It started as one file. Each new capability was just another <code>SKILL.md</code> following the same pattern.</p>
<p>The full suite is <a href="https://github.com/searchfit/searchfit-seo" target="_blank" rel="noopener noreferrer">open source on GitHub</a> if you want to see the complete implementation.</p>
<h2 id="slash-commands">Slash Commands</h2>
<p>Skills fire automatically when Claude detects relevance. Slash commands are the explicit version. You type <code>/something</code> and a workflow runs on demand.</p>
<p>In Claude Code, a command is a markdown file in <code>.claude/commands/</code>. It can accept arguments, reference files, and run shell commands. On the web and desktop, commands come bundled with installed skills and plugins.</p>
<p>The value is consistency. Take <code>/commit</code> as an example. Without it, committing is a multi-step conversation. &quot;Stage these files, look at the recent commit style, write a message that matches, create the commit.&quot; With the command, it&#39;s one word. Same result every time.</p>
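<p>As a sketch, a minimal <code>.claude/commands/commit.md</code> might contain nothing more than the steps written out. The frontmatter field follows the documented command format, but check it against the current docs:</p>
<pre><code class="language-markdown">---
description: Create a git commit following repository conventions
---

1. Run `git status` and `git diff` to see what changed.
2. Run `git log --oneline -5` and match the style of recent messages.
3. Stage the relevant files and create a single commit. Do not push.
</code></pre>
<p>Typing <code>/commit</code> then runs those steps the same way every time.</p>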
<p>The rule of thumb: if you do something more than three times and the steps are always the same, make it a command.</p>
<h2 id="hooks">Hooks</h2>
<p>Hooks are the part most people skip. They shouldn&#39;t.</p>
<p>A hook runs automatically in response to an event. You don&#39;t type anything. Claude is about to take an action, and your hook intercepts it. Five event types cover most cases:</p>
<ul>
<li><strong>PreToolUse</strong> fires before Claude runs a tool. Block dangerous commands before they execute.</li>
<li><strong>PostToolUse</strong> fires after. Run linting after every file edit, for example.</li>
<li><strong>Stop</strong> fires when Claude finishes a response. Good for notifications or cleanup.</li>
<li><strong>SessionStart</strong> fires when a new conversation begins. Set up context automatically.</li>
<li><strong>UserPromptSubmit</strong> fires when you send a message. Validate or transform input.</li>
</ul>
<p>The practical pattern: every time Claude does something you didn&#39;t want, add a hook. Force-pushed to main? Add a PreToolUse hook that blocks <code>git push --force</code> on protected branches. Forgot to lint? Add a PostToolUse hook that runs the linter after file edits.</p>
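<p>Here&#39;s a sketch of what that first guardrail might look like in <code>~/.claude/settings.json</code>. The shape follows the documented hooks format, where a PreToolUse command that exits with code 2 blocks the tool call, but verify the field names against the current docs before relying on it:</p>
<pre><code class="language-json">{
  &quot;hooks&quot;: {
    &quot;PreToolUse&quot;: [
      {
        &quot;matcher&quot;: &quot;Bash&quot;,
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;jq -r '.tool_input.command' | grep -qE 'push.*(--force|-f)' &amp;&amp; exit 2 || exit 0&quot;
          }
        ]
      }
    ]
  }
}
</code></pre>
<p>Every Bash command Claude tries to run passes through that check first. Force-pushes get blocked; everything else goes through.</p>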
<p>Over time, the guardrails build up. The mistakes stop repeating. It&#39;s the same principle as CLAUDE.md, but for actions instead of knowledge.</p>
<p>Hooks are currently a Claude Code feature. The web and desktop apps don&#39;t expose hook authoring directly, though plugins can include hook definitions.</p>
<h2 id="plugins">Plugins</h2>
<p>Once you have a few related skills, some commands, and a hook or two, you&#39;re looking at a plugin. A plugin bundles everything into a single installable package with a <code>plugin.json</code> manifest.</p>
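<p>The manifest itself is small. A minimal sketch, with a hypothetical name and version, and a field set worth checking against the plugin docs:</p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;seo-suite&quot;,
  &quot;description&quot;: &quot;SEO analysis skills, commands, hooks, and agents&quot;,
  &quot;version&quot;: &quot;1.0.0&quot;
}
</code></pre>
<p>The skills, commands, hooks, and agents sit in their usual directories alongside it; the manifest is what makes the bundle installable as one unit.</p>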
<p>I use several. <a href="https://github.com/nicholasgriffintn/chrome-devtools-mcp" target="_blank" rel="noopener noreferrer">Chrome DevTools MCP</a> for live browser debugging. The <a href="https://github.com/anthropics/claude-code-plugins" target="_blank" rel="noopener noreferrer">GitHub plugin</a> for PR management. <a href="https://github.com/oramasearch/serena" target="_blank" rel="noopener noreferrer">Serena</a> for semantic code navigation. And my own <a href="https://github.com/searchfit/searchfit-seo" target="_blank" rel="noopener noreferrer">SEO suite</a> with 13 skills and 6 agents.</p>
<p>You don&#39;t need to start with a plugin. If it&#39;s just for you and it&#39;s a few files, loose skills work fine. Graduate to a plugin when you&#39;re sharing it, or when you have skills, commands, hooks, and agents that need to work together as a unit.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you&#39;ve read this far and want to try it, here&#39;s the progression I&#39;d follow.</p>
<p><strong>Start with one skill</strong> for a task you do weekly. Something focused, like a code review checklist or a deployment verification. Write the <code>SKILL.md</code>, get the trigger description right, and test it by chatting naturally to see if Claude picks it up.</p>
<p><strong>Add a slash command</strong> for anything you do more than three times with the same steps. Commits, PR creation, test runs. One command, consistent output.</p>
<p><strong>Add hooks reactively.</strong> Don&#39;t try to anticipate every failure. Wait until Claude does something you didn&#39;t want, then add the guardrail. They build up naturally.</p>
<p><strong>Graduate to a plugin</strong> when you&#39;ve got 3-4 related skills that belong together. Not before.</p>
<p>The temptation is to over-engineer early. Resist it. The first version of any skill should be embarrassingly simple. The SEO suite I use started as a single skill. It grew to 13 because each addition solved a real problem, not because someone planned it that way.</p>
<h2 id="what-i-got-wrong">What I Got Wrong</h2>
<p>A few lessons from working with these.</p>
<p>Skills that try to do too much don&#39;t work well. A skill that handles &quot;all SEO tasks&quot; is too vague for Claude to use effectively. Splitting it into focused specialists (schema, technical, content) made each one more reliable.</p>
<p>Trigger descriptions need tuning. Too broad and the skill fires on unrelated conversations. Too narrow and you have to invoke it manually every time. The sweet spot takes a few iterations.</p>
<p>Writing skills that duplicate CLAUDE.md is a waste. If the information is project-specific, it belongs in CLAUDE.md. Skills are for methodology that applies everywhere.</p>
<p>And the biggest one: treating the first version as the final version. Skills improve through use. Run them, notice what&#39;s missing, add it. The feedback loop is the point.</p>
<h2 id="what-s-next">What&#39;s Next</h2>
<p>Skills tell Claude how to do things. They encode methodology and expertise. But they can only work with what Claude can already access.</p>
<p>MCP servers change that equation. They give Claude direct access to external systems. Your NAS, your workflow automation, your live documentation. That&#39;s the next post.</p>
]]></content:encoded>
      <category>Claude Code</category>
      <category>AI</category>
      <category>Skills</category>
      <category>Plugins</category>
      <category>Automation</category>
    </item>
    <item>
      <title>Getting Started with Claude - Beyond the Chat Box</title>
      <link>https://iammattl.com/blog/getting-started-with-claude</link>
      <guid isPermaLink="true">https://iammattl.com/blog/getting-started-with-claude</guid>
      <description>Most people use Claude like a search engine with better grammar. Here&apos;s how to actually make it useful.</description>
      <pubDate>Mon, 23 Mar 2026 17:10:18 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/getting-started-with-claude/1774285690678.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>Most people I talk to about Claude use it the same way. Paste a question, get an answer, move on. It works, but it&#39;s barely scratching the surface. I keep getting asked how I get so much out of it, so I figured I&#39;d write down the process I actually follow.</p>
<p>This isn&#39;t about building a full website or anything complex. It&#39;s about getting Claude to work <em>with</em> you instead of just <em>for</em> you.</p>
<h2 id="start-with-something-real">Start with something real</h2>
<p>Don&#39;t open Claude and ask it to &quot;write me a blog post about leadership.&quot; That&#39;s how you get content that sounds like every other AI-generated post on LinkedIn. Instead, give it something to work with.</p>
<p>I start with the subject matter. An article I&#39;ve read, a thread from a forum, notes from a conversation, even a voice memo transcript. The more context Claude has, the less it falls back on generic filler.</p>
<p>A prompt like &quot;write a LinkedIn post about this article&quot; with the article pasted in will get you a first draft that&#39;s already more specific than starting from nothing.</p>
<h2 id="make-it-sound-like-you">Make it sound like you</h2>
<p>The first draft will sound like Claude. That&#39;s fine, it&#39;s a first draft. The real work is in the iteration.</p>
<p>Read through it and start pushing back. Tell Claude what to change and <em>why</em>. Things like:</p>
<ul>
<li>&quot;I wouldn&#39;t use the word &#39;innovative&#39;, swap it for something more specific&quot;</li>
<li>&quot;This reads like a press release, make it more conversational&quot;</li>
<li>&quot;I use shorter sentences than this&quot;</li>
<li>&quot;Drop the exclamation marks, I don&#39;t write like that&quot;</li>
</ul>
<p>Each correction teaches Claude something about how you communicate. After a few rounds, the output starts to sound less like AI and more like you.</p>
<h2 id="extract-your-tone-of-voice">Extract your tone of voice</h2>
<p>Once you&#39;ve got a piece of writing you&#39;re happy with, ask Claude to extract your tone of voice from it. Get it to describe how you write. Sentence length, vocabulary, structure, what you avoid.</p>
<p>This is the part most people skip, and it&#39;s probably the most valuable step. You end up with a reference document that captures how you actually communicate. Things like whether you use colons or semicolons, whether you lean on specific verbs, whether you quantify things or keep it vague.</p>
<p>Use that as a reference for everything you write after. Each new piece gets easier because Claude isn&#39;t starting from zero. It&#39;s starting from <em>you</em>.</p>
<h2 id="build-up-context-over-time">Build up context over time</h2>
<p>Claude doesn&#39;t remember your previous conversations by default. But you can fix that.</p>
<p>If you&#39;re using Claude Pro, you can save your tone of voice document and key preferences as project knowledge. Every new conversation in that project starts with that context already loaded.</p>
<p>If you&#39;re using Claude Code, the terminal-based version, it has a built-in memory system. I use a <code>CLAUDE.md</code> file in every project that acts as institutional memory. A plain markdown doc that tells Claude how the project works, what conventions to follow, and what to avoid. Each conversation picks up where the last one left off.</p>
<p>The pattern is the same either way. Stop treating each conversation as a blank slate.</p>
<h2 id="the-compound-effect">The compound effect</h2>
<p>The first post took me a while. Lots of back and forth, lots of corrections. The second one was noticeably faster. By the third or fourth, I was mostly just providing the subject matter and making minor tweaks.</p>
<p>That&#39;s the real payoff. Not any single conversation, but the fact that Claude gets better at being <em>your</em> writing partner the more you invest in teaching it. The same applies to code, documentation, planning. Anything where your specific style and preferences matter.</p>
<h2 id="where-to-go-from-here">Where to go from here</h2>
<p>If you&#39;re just getting started:</p>
<ol>
<li><strong>Pick one real thing to write about.</strong> Not a test, something you&#39;d actually publish.</li>
<li><strong>Give Claude the raw material.</strong> Articles, notes, whatever context you have.</li>
<li><strong>Iterate on the output.</strong> Push back on anything that doesn&#39;t sound like you.</li>
<li><strong>Extract your tone of voice.</strong> Save it somewhere you can reuse it.</li>
<li><strong>Use it as a starting point next time.</strong> The gap between first draft and final version gets smaller every time.</li>
</ol>
<p>That&#39;s it. No special tools required, no complex setup. Just a conversation with a bit more intention behind it.</p>
]]></content:encoded>
      <category>AI</category>
      <category>Claude</category>
      <category>Productivity</category>
      <category>Workflow</category>
    </item>
    <item>
      <title>What I&apos;ve Learned Building Projects with Claude Code</title>
      <link>https://iammattl.com/blog/what-ive-learned-building-projects-with-claude-code</link>
      <guid isPermaLink="true">https://iammattl.com/blog/what-ive-learned-building-projects-with-claude-code</guid>
      <description>After building multiple projects with Claude Code — from Terraform modules managing 400+ Cloudflare zones to a full-stack price comparison site — here&apos;s what actually works when using AI as a pair programmer.</description>
      <pubDate>Mon, 16 Mar 2026 14:44:31 GMT</pubDate>
      <enclosure url="https://iammattl.com/images/blog/what-ive-learned-building-projects-with-claude-code/1773672237883.webp?w=600" type="image/webp" length="0"/>
      <content:encoded><![CDATA[<p>I&#39;ve been using <a href="https://claude.ai/code" target="_blank" rel="noopener noreferrer">Claude Code</a> as my daily pair programmer for about a year now. Across multiple projects — Terraform modules managing 400+ Cloudflare zones, a full-stack price comparison platform with 4,800+ tests, automated workflows, a mobile file converter, and this website — it&#39;s become a core part of how I build software.</p>
<p>Here&#39;s what I&#39;ve picked up along the way.</p>
<h2 id="it-won-t-replace-your-thinking">It Won&#39;t Replace Your Thinking</h2>
<p>The biggest misconception is that you hand over a problem and get a solution back. It doesn&#39;t work like that. The better mental model is a pair programmer who&#39;s read every Stack Overflow answer but has never worked at your company.</p>
<p>Claude doesn&#39;t know your system&#39;s quirks. It doesn&#39;t know that your n8n Code node doesn&#39;t have <code>fetch</code>, or that TrueNAS will silently fail if you use <code>update_custom_app</code> instead of <code>update_compose_config</code>. You learn that through building — and then you teach it.</p>
<p>That&#39;s where the real workflow starts.</p>
<h2 id="the-claude-md-file-changed-everything">The CLAUDE.md File Changed Everything</h2>
<p>Early on, I was repeating myself every conversation. &quot;Don&#39;t use that API, it&#39;s deprecated.&quot; &quot;Test coverage must stay above 80%.&quot; &quot;The CI pipeline works like this.&quot; Every new chat started from zero.</p>
<p>So I started maintaining a CLAUDE.md file in each project — a plain markdown doc that acts as institutional memory. The rules, the gotchas, the architecture decisions, the hard-won lessons from previous sessions.</p>
<p>My Terraform project at work has a 500+ line CLAUDE.md. TechPartPrices has 860+ lines. They grow organically — every time I hit a wall and solve it, the fix goes into CLAUDE.md so I never hit it again.</p>
<p>The result — a single engineer delivering what would normally need a platform team. Not because AI wrote all the code, but because it remembered all the context.</p>
<h2 id="it-s-like-onboarding-a-developer-every-morning">It&#39;s Like Onboarding a Developer Every Morning</h2>
<p>The best way I can describe it is onboarding a very skilled but brand-new hire — every single morning. They&#39;re talented, they learn fast, but they don&#39;t know your codebase yet.</p>
<p>CLAUDE.md is their onboarding doc. The better you write it, the faster they&#39;re productive. I structure mine with:</p>
<ul>
<li><strong>Project overview</strong> — what this is, how it&#39;s deployed</li>
<li><strong>Commands</strong> — how to build, test, deploy</li>
<li><strong>Architecture</strong> — key patterns and why they exist</li>
<li><strong>Gotchas</strong> — the things that&#39;ll waste your afternoon if you don&#39;t know them</li>
<li><strong>Rules</strong> — test coverage thresholds, commit conventions, things that must not break</li>
</ul>
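<p>A skeleton following that structure, with placeholder entries borrowed from examples elsewhere in this post, looks something like:</p>
<pre><code class="language-markdown"># TechPartPrices

## Overview
Next.js app on Cloudflare Workers. Deploys on merge to main.

## Commands
- `npm run dev`: local dev server
- `npm test`: full suite; coverage must stay above 80%

## Architecture
- Drizzle ORM over Cloudflare D1; all queries go through the db layer

## Gotchas
- The n8n Code node has no `fetch`
- TrueNAS: use `update_compose_config`, not `update_custom_app`

## Rules
- Never force-push to main
- Follow the existing commit message conventions
</code></pre>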
<p>Sounds like effort, but it pays for itself within a day.</p>
<h2 id="push-complexity-into-deterministic-code">Push Complexity Into Deterministic Code</h2>
<p>Here&#39;s a pattern I learned the hard way — AI is roughly 90% accurate per step. Sounds great until you chain 5 steps together and you&#39;re at 59% accuracy. For 10 steps? 35%.</p>
<p>The solution is to push complexity out of the AI&#39;s decision-making and into deterministic code. I use a three-layer approach:</p>
<ol>
<li><strong>Directive layer</strong> — SOPs written in markdown that describe what should happen</li>
<li><strong>Orchestration layer</strong> — Claude reads the directives and decides what to do next</li>
<li><strong>Execution layer</strong> — Python scripts that do the actual work, reliably, every time</li>
</ol>
<p>Claude orchestrates. Code executes. The AI decides <em>what</em> to do — the scripts guarantee <em>how</em> it&#39;s done.</p>
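<p>The accuracy numbers above come straight from compounding independent per-step failure rates, which you can sanity-check in a few lines:</p>
<pre><code class="language-python">def chained_accuracy(per_step, steps):
    # If each step succeeds independently with probability per_step,
    # the whole chain succeeds only when every step does.
    return per_step ** steps

print(round(chained_accuracy(0.9, 5), 2))   # 0.59
print(round(chained_accuracy(0.9, 10), 2))  # 0.35
</code></pre>
<p>Every deterministic step you pull out of the chain is one fewer factor of 0.9 in that product.</p>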
<h2 id="verify-visually-before-testing">Verify Visually Before Testing</h2>
<p>For anything with a UI, I verify visually before writing tests. The instinct with AI-assisted development is to write the code, write the tests, and move on. The problem — you can have a green test suite and a screen that looks like it was built during a power cut.</p>
<p>My workflow for frontend tasks: implement → screenshot → verify it looks right → then write the tests. For backend work, it&#39;s the opposite — tests first. But for UI, the eyes come first.</p>
<h2 id="fetch-the-docs-every-time">Fetch the Docs Every Time</h2>
<p>One rule I don&#39;t break — before implementing anything involving a library or framework, fetch the latest documentation first. I use <a href="https://context7.com" target="_blank" rel="noopener noreferrer">Context7</a> for this. It pulls current docs so Claude isn&#39;t working from training data that might be months out of date.</p>
<p>This single habit has saved me from countless issues with outdated API patterns, deprecated methods, and missed features. <a href="https://tailwindcss.com" target="_blank" rel="noopener noreferrer">Tailwind v4</a> changed how utility classes work under the hood. Without current docs, Claude would happily write Tailwind v3 patterns that silently break.</p>
<h2 id="extending-claude-with-mcp-servers-plugins-skills-and-agents">Extending Claude with MCP Servers, Plugins, Skills, and Agents</h2>
<p>Out of the box, Claude Code is a capable pair programmer. The real shift came when I started wiring it into my actual infrastructure — so it&#39;s not just writing code, it&#39;s operating systems.</p>
<p>I run multiple plugins and MCP servers. Here&#39;s how they fit together.</p>
<p><strong>MCP Servers — Connecting Claude to Your Infrastructure</strong></p>
<p>MCP (Model Context Protocol) servers give Claude direct access to external tools and services. Instead of copying error messages back and forth, Claude can just look.</p>
<p>I built my own TrueNAS MCP server — forked an existing project and extended it to fit my homelab. It connects Claude to my NAS with 22 tools for checking app status, updating Docker Compose configs, managing ZFS snapshots, and monitoring storage. When I&#39;m deploying Docker containers, Claude doesn&#39;t need me to SSH in and paste logs. It reads them directly.</p>
<p>My <a href="https://n8n.io" target="_blank" rel="noopener noreferrer">n8n</a> MCP server does the same for workflow automation. Claude can create, update, validate, and test n8n workflows without me touching the UI. The blog post topic curator for TechPartPrices — a 49-node workflow with a two-trigger state machine, Telegram integration, and DALL-E image generation — was built almost entirely through Claude talking to the n8n API.</p>
<p>Context7 fetches live documentation for any library, so Claude is never working from stale training data. Prevents more bugs than any linter.</p>
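<p>Wiring one of these up in Claude Code is a small config entry. Here&#39;s a sketch of a project-scoped <code>.mcp.json</code> for Context7; the npm package name is an assumption on my part, so verify it against Context7&#39;s own docs before copying:</p>
<pre><code class="language-json">{
  &quot;mcpServers&quot;: {
    &quot;context7&quot;: {
      &quot;command&quot;: &quot;npx&quot;,
      &quot;args&quot;: [&quot;-y&quot;, &quot;@upstash/context7-mcp&quot;]
    }
  }
}
</code></pre>
<p>Custom servers like the TrueNAS one follow the same shape; they just point at your own server process instead of an npm package.</p>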
<p><strong>Plugins — Specialised Capabilities</strong></p>
<p>Plugins add focused tools for specific tasks. Chrome DevTools MCP lets Claude inspect live pages, take screenshots, and debug CSS issues directly in the browser — I used it today to diagnose a Tailwind v4 specificity issue. Playwright handles automated UI testing and visual verification. The GitHub plugin manages PRs and issues without leaving the terminal.</p>
<p>Serena gives Claude semantic code navigation — it can find symbols, trace references, and understand architecture without reading entire files. For large codebases, that&#39;s the difference between Claude being useful and Claude being lost.</p>
<p><strong>Skills — Reusable Expertise</strong></p>
<p>Skills are like playbooks. I&#39;ve built a suite of 13 SEO skills that handle everything from technical audits to schema markup generation. Instead of explaining what an SEO audit involves every time, the skill encodes the methodology. Claude runs the audit, delegates to specialist sub-agents, and produces a scored report.</p>
<p>The pattern works for any domain expertise you find yourself repeating. Package it as a skill — it becomes a one-command operation.</p>
<p><strong>Agents — Autonomous Problem Solving</strong></p>
<p>Agents take this further. Instead of Claude doing one thing at a time, agents spin up specialised sub-processes that work in parallel. A frontend developer agent handles React and CSS. A software architect agent evaluates trade-offs. An SEO agent crawls pages and delegates to six specialist sub-agents simultaneously.</p>
<p>The key insight is delegation. When I run a full site audit, the main agent doesn&#39;t do all the work — it launches a technical SEO agent, a content quality agent, a performance agent, a schema agent, all running concurrently. What would take an hour of sequential analysis happens in minutes.</p>
<p>I&#39;ve built custom agents for specific workflows too. TechPartPrices uses agents for code review, test generation, and deployment validation. Each agent has its own tools, its own system prompt, its own area of expertise — like having a small team, each focused on what they do best.</p>
<p><strong>How It All Connects</strong></p>
<p>A typical session — I ask Claude to deploy an update. It uses the n8n MCP to check the current workflow state, the TrueNAS MCP to update the Docker Compose config, Context7 to verify the latest API patterns, and Playwright to confirm the UI still works. No tab switching. No copy-pasting. Just a conversation that drives real infrastructure.</p>
<p>That&#39;s the shift — Claude stops being a chatbot and becomes an interface to your entire development environment.</p>
<h2 id="the-projects-that-proved-it">The Projects That Proved It</h2>
<p><strong><a href="https://techpartprices.com" target="_blank" rel="noopener noreferrer">TechPartPrices</a></strong> — An Amazon price tracker following 2,400+ products. Built with Next.js, Drizzle ORM, and Cloudflare D1. 4,800+ tests at 80%+ coverage, an n8n Telegram bot for admin, and an automated blog post topic curator. The CLAUDE.md file alone documents 18 n8n gotchas I discovered through trial and error.</p>
<p><strong><a href="https://iammattl.com">This Website</a></strong> — The cyberpunk terminal UI, canvas pixelation effects, SVG cityscape, blog CMS with AI writing tools, and the deployment pipeline to Cloudflare Workers — all built with Claude Code from design to deployment.</p>
<h2 id="what-doesn-t-work">What Doesn&#39;t Work</h2>
<p>It&#39;s not all smooth.</p>
<p><strong>It forgets.</strong> Every conversation starts fresh. Without CLAUDE.md, you&#39;re re-explaining everything. Institutional memory is your workaround for the lack of persistent context.</p>
<p><strong>It&#39;s confidently wrong about edge cases.</strong> Especially newer APIs and platform-specific quirks. Claude will tell you with full confidence that an n8n node works a certain way — and it&#39;s just wrong. You learn to verify and document.</p>
<p><strong>Chained reasoning degrades.</strong> The more steps in a chain, the less reliable the output. That&#39;s why the three-layer architecture exists — you don&#39;t ask AI to do 10 things in sequence. You ask it to decide the next thing, run deterministic code, then come back.</p>
<h2 id="the-takeaway">The Takeaway</h2>
<p>Working with AI isn&#39;t about writing less code. It&#39;s about maintaining context, building institutional memory, and knowing when to let the AI think versus when to let code execute.</p>
<p>The engineers who&#39;ll get the most from these tools aren&#39;t the ones who type &quot;build me an app.&quot; They&#39;re the ones who write solid CLAUDE.md files, verify before they trust, and treat the AI like what it is — a skilled colleague with amnesia.</p>
<p>After 20+ years of building for the web, the last year with Claude Code has been the most productive stretch of my career. Not because the AI did the work for me — but because it let me operate at a scale I couldn&#39;t reach alone.</p>
]]></content:encoded>
      <category>Claude Code</category>
      <category>AI</category>
      <category>Software Engineering</category>
      <category>Pair Programming</category>
    </item>
  </channel>
</rss>