# Agent Readiness: Scoring an Agent-Friendly Site

The last post mapped where the agentic internet is heading. Cloudflare's Connect event in London, 60+ speakers, the same theme on every slide. The internet is being rebuilt around agents.
This post is the practical follow-up. Cloudflare have shipped a scoring tool at isitagentready.com that measures how agent-friendly your site is across four dimensions: Discoverability, Content, Bot Access Control, and Capabilities. I worked through each one in the weeks after Connect, then ran the score. It validated. It didn't drive.
So this isn't a write-up of the score. It's the inventory of what each category actually checks, what I shipped for it on iammattl.com, and what's still missing. Plus one category the rubric doesn't measure that I'd argue still matters.
## What the Score Measures
Cloudflare scores four dimensions. From their post:
- Discoverability. robots.txt, sitemap.xml, Link Headers (RFC 8288). Can an agent find the site's surfaces from the network signals alone?
- Content. Markdown for Agents. Can the site serve a clean machine-readable representation, not just HTML?
- Bot Access Control. Content Signals, AI bot rules in robots.txt, Web Bot Auth. Does the site take a position on who can read it and what they can do with it?
- Capabilities. Agent Skills, API Catalog, OAuth server discovery (RFC 8414 & 9728), MCP Server Card, WebMCP. Can an agent discover what actions the site exposes and call them?
Commerce standards are checked separately and don't count toward the score yet.
The score lives at isitagentready.com and is also surfaced in Cloudflare's URL Scanner as an "Agent Readiness" tab. Run a URL through either, and you get a per-category breakdown.
Here's the inventory.
## Discoverability
The HTTP-and-text layer. Three things the score looks at, all served by files agents can fetch directly without scraping pages.
robots.txt is the obvious one. /robots.txt on iammattl declares wildcard Allow: / and Disallow: /admin/, with Sitemap: pointing at /sitemap.xml. Standard shape. Score expects it. The Bot Access Control category re-reads the same file for different signals, so robots.txt earns its points in two categories.
sitemap.xml lists every indexable URL with a real <lastmod> value. The dates are sourced from the same constant that the JSON-LD dateModified, the HTTP Last-Modified header, and the markdown frontmatter all read. Five places, one source of truth. Five hand-maintained fields would drift inside a week.
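That single-constant pattern can be sketched in a few lines. This is illustrative, not the site's actual code; the `PAGE_DATES` name and the helper functions are assumptions:

```javascript
// One hand-typed date per page; every other surface derives from it.
const PAGE_DATES = { '/about': '2026-04-22' };

const sitemapLastmod = (path) => PAGE_DATES[path];                    // sitemap <lastmod>
const jsonLdDateModified = (path) => PAGE_DATES[path];                // JSON-LD dateModified
const frontmatterUpdated = (path) => `updated: ${PAGE_DATES[path]}`;  // markdown frontmatter
const lastModifiedHeader = (path) =>                                  // HTTP Last-Modified
  new Date(`${PAGE_DATES[path]}T00:00:00Z`).toUTCString();
```

Change the one constant and every representation moves together; hand-maintaining five fields is how they drift.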
Link Headers (RFC 8288) are the surprise. Every HTML response from iammattl carries a multi-relation Link header that points at every other machine-readable surface:
```
Link: </sitemap.xml>; rel="sitemap"; type="application/xml",
      </feed.xml>; rel="alternate"; type="application/rss+xml"; title="Blog RSS feed",
      </llms.txt>; rel="llms"; type="text/plain",
      </.well-known/api-catalog>; rel="api-catalog"; type="application/linkset+json",
      </.well-known/agent-skills/index.json>; rel="agent-skills"; type="application/json"
```
An agent that fetched any HTML page now knows where the sitemap, feed, llms.txt, API catalog, and agent skills live, without parsing the page body. RFC 8288 is the formal definition of HTTP Web Linking. The score weights this because it's the most efficient way to publish "here's everything else on the site" without an agent having to guess paths.
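On the consuming side, a minimal parser for that header shape might look like this. It handles the comma-separated, quoted-parameter form shown above; it is a sketch, not a full RFC 8288 implementation:

```javascript
// Split on commas that start a new "<target>" so commas inside quoted
// parameters (e.g. titles) survive, then pull out the rel for each target.
function parseLinkHeader(header) {
  const rels = {};
  for (const part of header.split(/,\s*(?=<)/)) {
    const m = part.match(/^<([^>]*)>(.*)$/);
    if (!m) continue;
    const rel = m[2].match(/rel="?([^";]+)"?/);
    if (rel) rels[rel[1]] = m[1];
  }
  return rels;
}
```

One pass over one header and the agent has a rel-to-URL map for every surface the site advertises.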
The site also serves /llms.txt, which isn't in the Discoverability rubric specifically but functions as the discoverable hub the Link header points to. Site name, one-paragraph description, grouped page links, links to every machine-readable surface, contact, content policy. 35 lines. Ten minutes of work the first time you write one.
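A 35-line llms.txt reduces to roughly this shape. The section headings and specific links here are illustrative, not a copy of the site's actual file:

```markdown
# iammattl.com

> Personal site and blog of Matt Lambert, a Cloud Engineer.

## Pages
- [About](https://iammattl.com/about)
- [Blog](https://iammattl.com/blog)

## Machine-readable
- [Sitemap](https://iammattl.com/sitemap.xml)
- [RSS feed](https://iammattl.com/feed.xml)
- [API catalog](https://iammattl.com/.well-known/api-catalog)
- [Agent skills](https://iammattl.com/.well-known/agent-skills/index.json)
```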
## Content
This category measures one thing: Markdown for Agents. Whether the site can serve its content as markdown instead of HTML, for agents that want the body without the chrome.
Every HTML page on iammattl has a markdown representation served via content negotiation. The same URL, with Accept: text/markdown in the request, returns the page body as markdown instead of HTML:
```
$ curl -s -H "Accept: text/markdown" https://iammattl.com/about | head -10
---
title: "About"
url: https://iammattl.com/about
updated: 2026-04-22
---
# About
Matt Lambert is a Cloud Engineer with...
```
Vary: Accept is set so caches don't mix representations. An agent fetching the URL with the markdown Accept header gets the markdown variant cached separately from the HTML variant.
The body opens with YAML frontmatter. Three fields: title, url (the canonical URL), and updated (YYYY-MM-DD of the last content change). One parse and an agent has structured metadata, the canonical URL, and the body. No CSS, no navigation, no analytics tags to skip.
The route is wired for conditional GET. A weak ETag derived from the body lets If-None-Match short-circuit to a 304 Not Modified. An agent polling for changes gets a 304 with no body if nothing changed since its last fetch.
There's also a parallel /md/{path} route that serves the same markdown at an explicit URL, for agents that don't do content negotiation. X-Robots-Tag: noindex on those URLs because they're alternate representations of pages already indexed under their HTML URLs, not separate documents. Each /md/* route declares a canonical pointing at its HTML twin, so a search engine that finds both knows which one to rank.
The blog index at /md/blog lists posts as date, title, link, tag blocks. Parsing it is one regex. Index plus a typical post is around 5 KB total over the wire. Long-form posts push that to 18 KB.
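Assuming an index line shape like `- 2026-04-22 [Title](/blog/slug) #tag` (the exact format on the site may differ), the one-regex parse looks like:

```javascript
// Each index line: "- <date> [<title>](<link>)" with an optional "#tag".
const LINE = /^- (\d{4}-\d{2}-\d{2}) \[([^\]]+)\]\(([^)]+)\)(?:\s+(#[\w-]+))?$/gm;

function parseBlogIndex(md) {
  return [...md.matchAll(LINE)].map(
    ([, date, title, url, tag]) => ({ date, title, url, tag })
  );
}
```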
Why bother? Because the alternative is forcing every agent to run an HTML-to-markdown converter against your pages. The converter strips noise badly. The agent sees a mess. The site can fix that in one place. Half a day of work, every downstream agent benefits.
## Bot Access Control
This is where you take a position on who can read the site and what they can do with the content. Three things the score checks: Content Signals, AI bot rules in robots.txt, and Web Bot Auth.
Content Signals are declared via contentsignals.org syntax in robots.txt. Three flags: search=yes, ai-train=no, ai-input=yes. Translation: search engines can index, agents can use the content to answer live user queries, training corpora can't have it.
That's a position. A small one, but explicit. Most sites are silent on this, and silence reads differently to different bots. Some treat it as permission. Some treat it as refusal. A declared policy makes the agreement explicit, and you can change it later without anyone being surprised.
AI bot rules sit below the wildcard rule. 31 named AI bots, each with the same Content-Signal:
```
User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /
Disallow: /admin/

User-agent: GPTBot
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /
Disallow: /admin/
```
The full list, drawn from the user-agents isitagentready.com itself recognises, covers GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-Web, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot-Extended, Bytespider, CCBot, Meta-ExternalAgent, anthropic-ai, YouBot, DuckAssistBot, and the rest. A wildcard would technically be enough. Explicit is auditable.
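One way to keep 31 identical stanzas honest is to generate them from a single list. A sketch, with the bot list truncated to a few of the names above; the generator itself is illustrative:

```javascript
const CONTENT_SIGNAL = 'search=yes, ai-train=no, ai-input=yes';
const AI_BOTS = ['GPTBot', 'OAI-SearchBot', 'ClaudeBot', 'PerplexityBot', 'CCBot'];

// The wildcard stanza plus one identical stanza per named bot.
function robotsTxt() {
  const stanza = (agent) => [
    `User-agent: ${agent}`,
    `Content-Signal: ${CONTENT_SIGNAL}`,
    'Allow: /',
    'Disallow: /admin/',
  ].join('\n');
  return ['*', ...AI_BOTS].map(stanza).join('\n\n')
    + '\n\nSitemap: https://iammattl.com/sitemap.xml\n';
}
```

Add a bot to the list and the stanza, signal, and paths all come out identical by construction.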
Web Bot Auth is the gap. It's a Cloudflare protocol for cryptographically verified bot identity. Cloudflare handles the verification at the edge. A site doesn't implement Web Bot Auth in its own code, but its Bot Management settings determine whether verified bots are recognised. I haven't tuned this on iammattl yet. The score docks the Bot Access Control category accordingly. Honest gap.
Be honest about the limit on the rest of the layer, too. Robots.txt is the declared contract. Not every bot honours it. A small AI scraper that ignores the file isn't going to volunteer that fact. Treat the policy as the public statement of intent, and the actual enforcement as Cloudflare Bot Management's job.
## Capabilities
The Capabilities category is the largest. Five items: Agent Skills, API Catalog, OAuth server discovery, MCP Server Card, and WebMCP. iammattl ships three.
Agent Skills are declared at /.well-known/agent-skills/index.json, following Cloudflare's agent-skills-discovery-rfc v0.2.0. The index lists each skill, its type, a description, the URL of its SKILL.md body, and a SHA-256 of the body for integrity. Each SKILL.md is a separate document an agent can fetch independently.
```json
{
  "$schema": "https://raw.githubusercontent.com/cloudflare/agent-skills-discovery-rfc/main/schemas/v0.2.0/index.json",
  "version": "0.2.0",
  "skills": [
    {
      "name": "discover-blog",
      "type": "data",
      "description": "How to find and fetch blog posts on iammattl.com",
      "url": "https://iammattl.com/.well-known/agent-skills/discover-blog/SKILL.md",
      "sha256": "..."
    }
  ]
}
```
API Catalog at /.well-known/api-catalog is an RFC 9727 linkset. Different shape from llms.txt. Where llms.txt groups by section ("Pages", "Machine-readable", "Contact"), the linkset groups by link relation type. Each anchor declares its service-desc link with media type and title. Useful for agents that prefer link-relation traversal over prose parsing.
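An RFC 9727 linkset has roughly this shape. The specific anchor and descriptor here are illustrative, not the site's actual catalog:

```json
{
  "linkset": [
    {
      "anchor": "https://iammattl.com/",
      "service-desc": [
        {
          "href": "https://iammattl.com/llms.txt",
          "type": "text/plain",
          "title": "Site index for agents"
        }
      ]
    }
  ]
}
```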
WebMCP (W3C draft) is the active layer. A site registers tools via navigator.modelContext.provideContext({ tools }). An agent running in or alongside the browser can call those tools the same way it'd call MCP tools on a server. iammattl registers a small set: list recent blog posts, fetch a post by slug, look up the agent-skills index. Each tool is backed by the same /md/blog endpoints the rest of the discovery surface serves, so the WebMCP tools and the rest of the agent-readable site always agree.
```javascript
{
  name: 'list_recent_posts',
  description: 'List the most recent blog posts published on iammattl.com',
  inputSchema: {
    type: 'object',
    properties: { limit: { type: 'number', default: 10, maximum: 50 } },
    additionalProperties: false,
  },
  execute: async ({ limit }) => {
    const md = await fetch('https://iammattl.com/md/blog').then((r) => r.text());
    return parseBlogIndex(md).slice(0, limit);
  },
}
```
Input validation matters. The fetch-by-slug tool runs every input through a regex that matches the site's slugify output. Lowercase letters, digits, hyphens, no leading or trailing hyphen. A tool caller can't path-traverse out of /md/blog/ because the regex rejects the input before it reaches a URL.
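The guard itself is tiny. The regex and helper name here are illustrative; the anchors are the important part:

```javascript
// Matches the slugify output: lowercase letters, digits, hyphens,
// no leading or trailing hyphen. Anchored, so nothing else gets through.
const SLUG = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

function postUrl(slug) {
  if (!SLUG.test(slug)) throw new Error(`invalid slug: ${slug}`);
  return `https://iammattl.com/md/blog/${slug}`;
}
```

Rejecting before URL construction means traversal sequences never become part of a path at all.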
Browsers without WebMCP do nothing. Today that's most of them. Chrome's early preview programme is live, the rest are watching. The API is shaped to live with that. Silent no-op when unsupported, callable when available.
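The silent no-op comes down to one feature check. A sketch: `registerTools` is an illustrative name, while `navigator.modelContext.provideContext` is the draft API surface the post names:

```javascript
// Register WebMCP tools when the browser exposes the draft API,
// return false (and do nothing) when it doesn't.
function registerTools(nav, tools) {
  if (!nav?.modelContext?.provideContext) return false; // unsupported: silent no-op
  nav.modelContext.provideContext({ tools });
  return true;
}
```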
MCP Server Card is the second gap. The rubric lists it separately from Agent Skills, so they're not the same spec, and iammattl doesn't serve one. The MCP Server Card is a discovery document specifically for MCP server endpoints. The agent-skills discovery I have today describes site-level skills, not a hosted MCP endpoint. The site doesn't host one yet. Worth building if the score weighting matters. Not pressing while the rest of the Capabilities category is covered.
OAuth server discovery (RFC 8414 and RFC 9728) is the third gap, and the only one I'm not closing. iammattl isn't an OAuth server. There's no authorization to advertise. The rubric weighs it because some agent-callable sites do issue tokens to bots; mine doesn't. The score docks the category, and that's the correct behaviour.
## What the Score Doesn't Check (But I Still Did)
The score's four categories don't cover structured data. JSON-LD with stable @id linking, BlogPosting markup, Person and Organization entities, ImageObject with width and height pinned to the actual file bytes. None of this moves the Agent Readiness number.
It still matters. Schema is what AI Overviews and Perplexity citations key off when they're choosing which content to surface, and search engines have wanted it for years. Across iammattl there are 12 top-level entity types in use: Person, Organization, WebSite, BlogPosting, BreadcrumbList, ImageObject, ItemList, Occupation, EducationalOccupationalCredential, WebPage, ProfilePage, CollectionPage.
The connecting trick is stable @id values across pages. The Person node has @id: "https://iammattl.com/#matt". Every page that mentions Matt references the same @id rather than embedding a fresh Person inline. A consumer that resolves identifiers gets one canonical entity, not five copies that look similar. Same idea for WebSite. The ImageObject for Matt's profile photo lives in a single TypeScript constant, inlined into every Person reference so the width and height stay in lockstep with the actual /profile.webp bytes.
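In JSON-LD terms, the pattern looks like this. The #matt @id is the one the post names; the trimmed BlogPosting fields around it are illustrative:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://iammattl.com/#matt",
      "name": "Matt Lambert"
    },
    {
      "@type": "BlogPosting",
      "headline": "Agent Readiness: Scoring an Agent-Friendly Site",
      "author": { "@id": "https://iammattl.com/#matt" }
    }
  ]
}
```

Every other page that mentions Matt repeats only the `{ "@id": ... }` reference, never a second inline Person.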
It's not in the Agent Readiness score. It's still worth doing.
## What I Got Wrong
Built well-known endpoints before llms.txt. The agent-skills and api-catalog routes shipped weeks before I wrote llms.txt. During that gap, nothing on the site advertised them. The Link header would have advertised them, but it shipped later still. A crawler that knew the conventions could find the endpoints by guessing paths; a crawler that didn't, couldn't. The Link header plus llms.txt is the discovery layer that ties everything together. Build them before the endpoints they point at, not after.
Treated robots.txt as the enforcement layer. Search engines mostly honour it. AI-specific bots are inconsistent. Smaller scrapers ignore it entirely. Robots.txt with Content Signals is the public position. Bot Management is the enforcement. Treating them as one layer was a mistake. They do different jobs, and only one of them blocks anything.
Shipped the /md/* route before adding Accept-based negotiation. Both ended up live, which is fine, but the canonical-URL hygiene only works because each /md/* route declares a canonical pointing at its HTML twin. Without that, /md/about looked like a duplicate of the home page rather than of /about. One PR cleaned it up. The lesson was that alternate representations need explicit canonical declarations, not just sensible defaults.
Underestimated how much of agent-readiness is consistency plumbing. The same date has to appear correctly on the HTML page, the JSON-LD dateModified, the sitemap <lastmod>, the markdown frontmatter updated, and the HTTP Last-Modified header. One source of truth per page was the only practical way to keep them in lockstep.
Assumed the score would weight structured data. The four scored categories don't include JSON-LD. I built the schema graph anyway because Perplexity citations and AI Overviews still key off it, but if I'd been chasing the score number rather than the underlying agent-friendliness, the schema work was time spent outside the rubric.
## Where to Start on Your Site
In priority order. The first three are afternoon-of-work fixes that move the score the most.
Add an llms.txt. Cheapest fix, biggest single signal. Site name, one-paragraph description, grouped links to your canonical pages, links to your machine-readable surfaces. Ten minutes of work.
Add Content Signals to robots.txt and take a position. search=, ai-train=, and ai-input= values that reflect what you actually want. Silence is a worse signal than "no".
Emit a multi-relation Link header on every HTML response. Sitemap, feed, llms.txt, any well-known surfaces you have. RFC 8288 is well-defined. The Discoverability category weights it.
Ship markdown content negotiation. Half a day of work. Every downstream agent benefits. If your stack doesn't support response-shape negotiation easily, ship a parallel /md/* route and link it from llms.txt.
Build an Agent Skills index. Cloudflare's agent-skills-discovery-rfc is the format. Declare what an agent can do on the site, even if today it's "fetch the post list, fetch a post".
Skip WebMCP for now unless you specifically want to invest in early-adopter tooling. Browser support isn't there yet. The rest of the stack pays back today.
## Closing
The agentic internet from the Connect post wasn't a future thing. The score at isitagentready.com is one of the clearest indicators that it's already here. A site that wasn't built for agents three months ago can become readable by them in an afternoon. A site that wants to be callable, not just readable, can ship that in a week.
Cloudflare's score is useful because it forces a specific question for every category: did I take a position, or did I leave it silent? Silent is the default and it's almost never the right answer. The categories matter more than the number, but the number is what gets you to read the categories.
The next question, once a site is readable and callable, is how the site itself calls agents. Not the other way around. One AI Gateway in front of every model your code talks to. That's a post for another week.