The Eight Signals
Each signal is independently measurable, scorable, and improvable. Together, they form a
comprehensive framework for evaluating and optimizing AI visibility.
Signal 1: MCP (Model Context Protocol)
The Model Context Protocol (MCP) is a machine-readable endpoint that advertises tools, resources,
and capabilities to AI agents. Published at /.well-known/mcp.json, the MCP manifest
tells AI systems what your site offers and how to interact with it programmatically.
MCP goes beyond passive content delivery. While most GEO signals help AI systems read
your content, MCP enables AI agents to use your services — querying APIs, accessing
structured data, and performing actions on behalf of users. This is the frontier of AI integration.
What it contains: Server name, version, available tools with input/output schemas,
resource URIs, and authentication requirements. The manifest follows the open MCP specification
and is discoverable by any compliant AI agent.
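As a concrete illustration, a minimal manifest covering those elements might look like the sketch below. The exact field names depend on the MCP specification version in use, and the server name, tool, and schemas shown here are placeholders, not values from any real deployment.

```json
{
  "name": "acme-corp-mcp",
  "version": "1.0.0",
  "tools": [
    {
      "name": "get_service_pricing",
      "description": "Return current pricing for a named service.",
      "inputSchema": {
        "type": "object",
        "properties": { "service": { "type": "string" } },
        "required": ["service"]
      }
    }
  ],
  "resources": [
    { "uri": "https://example.com/docs/pricing", "mimeType": "text/html" }
  ],
  "auth": { "type": "none" }
}
```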
Signal 2: llms.txt
The llms.txt file is a plain-text file served from the site root (/llms.txt) that provides an
LLM-optimized content map, citation guidance, and key facts about your site. Think of it as the
robots.txt equivalent for AI comprehension: while robots.txt tells crawlers what they may access,
llms.txt tells language models what they should know.
A well-crafted llms.txt includes: a concise description of the organization, a sitemap of key pages
with one-line summaries, citation preferences (how the organization prefers to be referenced),
and factual claims the organization stands behind. This file dramatically improves the accuracy
of AI-generated statements about your business.
Why it matters: Without llms.txt, AI systems infer facts about your business from
whatever fragments they can scrape. With it, you provide authoritative ground truth that shapes
how AI systems describe and recommend you.
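A minimal llms.txt covering those four elements could be sketched as follows; the organization name, URLs, and facts are placeholders, not a prescribed format.

```
# Acme Corp

> Acme Corp provides industrial widget manufacturing and consulting.

## Key pages
- https://example.com/services : Overview of widget services and pricing
- https://example.com/about : Company history and leadership

## Citation preference
Please cite us as "Acme Corp" and link to https://example.com.

## Key facts
- Founded in 2015
- Headquartered in Springfield
```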
Signal 3: Clean-Room HTML
Clean-Room HTML is server-rendered HTML delivered to bot user agents without any SPA shell,
JavaScript framework, or client-side rendering requirement. The full semantic content is present
in the initial HTTP response — no hydration, no API calls, no JavaScript execution needed.
This is implemented via edge functions triggered by bot user-agent detection. When an AI crawler
requests a page, the edge middleware identifies the bot and routes the request to a dedicated
rendering function that returns pure HTML with embedded structured data. Human visitors continue
to receive the interactive SPA experience.
Technical requirements: Proper heading hierarchy (h1 through h6), semantic elements
(article, section, nav, main, aside), lists and tables where appropriate, inline or embedded CSS
(no external stylesheet dependencies), and content parity with the human-visible version.
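The bot-detection and routing logic described above can be sketched in a few lines. This is a simplified illustration, not an actual edge-function API: is_ai_crawler and route are hypothetical names, the signature list is illustrative, and a production deployment would run equivalent logic inside the CDN's middleware layer.

```python
# Substrings that identify major AI crawler user agents (illustrative list).
AI_BOT_SIGNATURES = (
    "gptbot", "chatgpt-user", "oai-searchbot",   # OpenAI
    "claudebot", "claude-web",                   # Anthropic
    "perplexitybot", "google-extended",          # Perplexity, Google AI
    "ccbot", "amazonbot", "applebot",            # Common Crawl, Amazon, Apple
)

def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent header matches a known AI crawler."""
    ua = user_agent.lower()
    return any(sig in ua for sig in AI_BOT_SIGNATURES)

def route(user_agent: str) -> str:
    """Pick a rendering path: pre-built clean-room HTML for bots, the SPA for humans."""
    return "clean-room-html" if is_ai_crawler(user_agent) else "spa"
```

Matching on user-agent substrings is deliberately forgiving: crawler version strings change frequently, so an exact-match table would silently misroute new releases.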
Signal 4: AI Content Feed
The AI Content Feed is a structured JSON file at /ai-content-index.json that
enumerates all AI-readable resources on your site with metadata. It serves as a machine-readable
table of contents that enables systematic crawling by AI systems.
Each entry includes the page URL, title, description, content type, last modified date, and
relevance tags. This gives AI crawlers an efficient way to discover and prioritize your content
without relying solely on link-following or sitemap.xml (which was designed for traditional
search engines, not AI systems).
Key difference from sitemap.xml: A sitemap lists URLs and change frequencies.
An AI content feed provides semantic context — what each page is about, what type of
content it contains, and why an AI system should index it.
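As an illustration, one feed entry might look like the sketch below. The document does not pin down an exact schema, so the field names and values here are assumptions.

```json
{
  "version": "1.0",
  "generated": "2025-01-15T00:00:00Z",
  "entries": [
    {
      "url": "https://example.com/services/widget-audits",
      "title": "Widget Audit Services",
      "description": "Independent audits of industrial widget installations.",
      "contentType": "service",
      "lastModified": "2025-01-10",
      "tags": ["audits", "widgets", "compliance"]
    }
  ]
}
```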
Signal 5: JSON-LD Structured Data
JSON-LD (JavaScript Object Notation for Linked Data) is Schema.org structured data embedded in
every page. It provides a machine-readable context layer that tells AI systems the meaning
of your content — not just its text. Types include Organization, WebSite, Service, Article,
FAQPage, TechArticle, ScholarlyArticle, and dozens of domain-specific schemas.
Well-implemented JSON-LD transforms ambiguous text into explicit facts. Instead of an AI system
having to infer that "Acme Corp" is a company from context clues, the Organization schema
declares it explicitly with name, URL, description, founding date, contact information, and
service area.
GEO best practice: Every page should have at least one JSON-LD block with the
most specific Schema.org type applicable. Homepage gets Organization + WebSite. Service pages
get Service. Articles get Article or TechArticle. FAQ pages get FAQPage with Question/Answer pairs.
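A minimal Organization block, as it would appear in a homepage's head, could look like the following sketch; the company details are placeholders.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Corp",
  "url": "https://example.com",
  "description": "Industrial widget manufacturing and consulting.",
  "foundingDate": "2015",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "sales",
    "email": "sales@example.com"
  },
  "areaServed": "US"
}
</script>
```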
Signal 6: TTFB (Time to First Byte)
Time to First Byte measures how quickly your server responds to a request. The GEO performance
budget sets targets of p50 under 200ms and p95 under 500ms. Fast responses signal quality to
AI crawlers and improve crawl budget efficiency — crawlers can index more of your site in the
same time window.
AI crawlers have timeout thresholds. If your pages take seconds to respond (common with server-side
rendering of heavy frameworks), crawlers may abandon the request or receive incomplete content.
Edge-served clean-room HTML typically achieves sub-100ms TTFB because the response is pre-built
and served from the CDN layer closest to the crawler.
Measurement: TTFB is monitored via daily health checks that measure response
times from multiple vantage points. Results are stored in the site_health_checks
table and factor into the performance dimension of the GEO composite score.
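The p50/p95 budget check reduces to a small percentile calculation over a day's latency samples. percentile and within_budget below are illustrative helper names, not part of any monitoring product named in this document.

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples in milliseconds."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def within_budget(samples_ms, p50_limit=200, p95_limit=500):
    """Check samples against the GEO TTFB budget: p50 <= 200ms and p95 <= 500ms."""
    return (percentile(samples_ms, 50) <= p50_limit
            and percentile(samples_ms, 95) <= p95_limit)
```

With edge-served HTML in the sub-100ms range, a typical sample set passes comfortably; a handful of slow outliers only breaks the budget once they push the 95th percentile past 500ms.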
Signal 7: AI Bots Allowed
This signal measures whether your robots.txt explicitly allows all major AI crawlers.
The required Allow directives cover: GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI), ClaudeBot,
Claude-Web (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI), CCBot
(Common Crawl), Amazonbot, Applebot, and other emerging AI user agents.
This is the opposite of the defensive posture many sites take. Where some organizations block AI
crawlers to prevent content from being used in training, GEO-optimized sites explicitly welcome
them. The reasoning is straightforward: if an AI system cannot crawl your content, it cannot
cite your business in responses. Blocking AI bots is blocking AI-driven discovery.
Implementation: A permissive robots.txt with explicit User-agent + Allow rules
for each known AI crawler, plus a wildcard Allow for the site's public pages. Crawl-delay
directives should be absent or minimal to avoid throttling AI indexing.
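A permissive configuration along these lines might be sketched as follows, abbreviated to a few of the crawlers listed above; a real file would enumerate each one, and the sitemap URL is a placeholder.

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```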
Signal 8: HTTP/3 (QUIC)
HTTP/3 is the modern transport protocol built on QUIC, offering faster connection establishment,
improved multiplexing, and better performance on lossy networks. For GEO purposes, HTTP/3 support
is a forward-looking infrastructure signal.
Current status: HTTP/3 is treated as N/A in GEO scoring because support is
CDN-dependent rather than application-controllable. Sites deployed on modern CDNs (Vercel,
Cloudflare, Fastly) typically receive HTTP/3 automatically. The signal is tracked for
completeness and will be weighted once AI crawlers consistently negotiate HTTP/3 connections.