The Eight Signals
Each signal is independently measurable, scorable, and improvable. Together, they form a
comprehensive framework for evaluating and optimizing AI visibility.
Signal 1: MCP (Model Context Protocol)
The Model Context Protocol (MCP) is a machine-readable endpoint that advertises tools, resources,
and capabilities to AI agents. Published at /.well-known/mcp.json, the MCP manifest
tells AI systems what your site offers and how to interact with it programmatically.
MCP goes beyond passive content delivery. While most GEO signals help AI systems read
your content, MCP enables AI agents to use your services — querying APIs, accessing
structured data, and performing actions on behalf of users. This is the frontier of AI integration.
What it contains: Server name, version, available tools with input/output schemas,
resource URIs, and authentication requirements. The manifest follows the open MCP specification
and is discoverable by any compliant AI agent.
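As a concrete illustration, a minimal manifest covering those elements might look like the sketch below. The exact field names depend on the MCP specification version in use, and the server name, tool, and schemas shown here are placeholders, not values from any real deployment.

```json
{
  "name": "acme-corp-mcp",
  "version": "1.0.0",
  "tools": [
    {
      "name": "get_service_pricing",
      "description": "Return current pricing for a named service.",
      "inputSchema": {
        "type": "object",
        "properties": { "service": { "type": "string" } },
        "required": ["service"]
      }
    }
  ],
  "resources": [
    { "uri": "https://example.com/docs/pricing", "mimeType": "text/html" }
  ],
  "auth": { "type": "none" }
}
```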
Signal 2: llms.txt
The llms.txt file is a plain-text file served from the site root (/llms.txt) that provides an
LLM-optimized content map, citation guidance, and key facts about your site. Think of it as the
robots.txt equivalent for AI comprehension: while robots.txt tells crawlers what they may access,
llms.txt tells language models what they should know.
A well-crafted llms.txt includes: a concise description of the organization, a sitemap of key pages
with one-line summaries, citation preferences (how the organization prefers to be referenced),
and factual claims the organization stands behind. This file dramatically improves the accuracy
of AI-generated statements about your business.
Why it matters: Without llms.txt, AI systems infer facts about your business from
whatever fragments they can scrape. With it, you provide authoritative ground truth that shapes
how AI systems describe and recommend you.
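A minimal llms.txt covering those four elements could be sketched as follows; the organization name, URLs, and facts are placeholders, not a prescribed format.

```
# Acme Corp

> Acme Corp provides industrial widget manufacturing and consulting.

## Key pages
- https://example.com/services : Overview of widget services and pricing
- https://example.com/about : Company history and leadership

## Citation preference
Please cite us as "Acme Corp" and link to https://example.com.

## Key facts
- Founded in 2015
- Headquartered in Springfield
```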
Signal 3: Clean-Room HTML
Clean-Room HTML is server-rendered HTML delivered to bot user agents without any SPA shell,
JavaScript framework, or client-side rendering requirement. The full semantic content is present
in the initial HTTP response — no hydration, no API calls, no JavaScript execution needed.
This is implemented via edge functions triggered by bot user-agent detection. When an AI crawler
requests a page, the edge middleware identifies the bot and routes the request to a dedicated
rendering function that returns pure HTML with embedded structured data. Human visitors continue
to receive the interactive SPA experience.
Technical requirements: Proper heading hierarchy (h1 through h6), semantic elements
(article, section, nav, main, aside), lists and tables where appropriate, inline or embedded CSS
(no external stylesheet dependencies), and content parity with the human-visible version.
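The bot-detection and routing logic described above can be sketched in a few lines. This is a simplified illustration, not an actual edge-function API: is_ai_crawler and route are hypothetical names, the signature list is illustrative, and a production deployment would run equivalent logic inside the CDN's middleware layer.

```python
# Substrings that identify major AI crawler user agents (illustrative list).
AI_BOT_SIGNATURES = (
    "gptbot", "chatgpt-user", "oai-searchbot",   # OpenAI
    "claudebot", "claude-web",                   # Anthropic
    "perplexitybot", "google-extended",          # Perplexity, Google AI
    "ccbot", "amazonbot", "applebot",            # Common Crawl, Amazon, Apple
)

def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent header matches a known AI crawler."""
    ua = user_agent.lower()
    return any(sig in ua for sig in AI_BOT_SIGNATURES)

def route(user_agent: str) -> str:
    """Pick a rendering path: pre-built clean-room HTML for bots, the SPA for humans."""
    return "clean-room-html" if is_ai_crawler(user_agent) else "spa"
```

Matching on user-agent substrings is deliberately forgiving: crawler version strings change frequently, so an exact-match table would silently misroute new releases.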
Signal 4: AI Content Feed
The AI Content Feed is a structured JSON file at /ai-content-index.json that
enumerates all AI-readable resources on your site with metadata. It serves as a machine-readable
table of contents that enables systematic crawling by AI systems.
Each entry includes the page URL, title, description, content type, last modified date, and
relevance tags. This gives AI crawlers an efficient way to discover and prioritize your content
without relying solely on link-following or sitemap.xml (which was designed for traditional
search engines, not AI systems).
Key difference from sitemap.xml: A sitemap lists URLs and change frequencies.
An AI content feed provides semantic context — what each page is about, what type of
content it contains, and why an AI system should index it.
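As an illustration, one feed entry might look like the sketch below. The document does not pin down an exact schema, so the field names and values here are assumptions.

```json
{
  "version": "1.0",
  "generated": "2025-01-15T00:00:00Z",
  "entries": [
    {
      "url": "https://example.com/services/widget-audits",
      "title": "Widget Audit Services",
      "description": "Independent audits of industrial widget installations.",
      "contentType": "service",
      "lastModified": "2025-01-10",
      "tags": ["audits", "widgets", "compliance"]
    }
  ]
}
```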
Signal 5: JSON-LD Structured Data
JSON-LD (JavaScript Object Notation for Linked Data) is Schema.org structured data embedded in
every page. It provides a machine-readable context layer that tells AI systems the meaning
of your content — not just its text. Types include Organization, WebSite, Service, Article,
FAQPage, TechArticle, ScholarlyArticle, and dozens of domain-specific schemas.
Well-implemented JSON-LD transforms ambiguous text into explicit facts. Instead of an AI system
having to infer that "Acme Corp" is a company from context clues, the Organization schema
declares it explicitly with name, URL, description, founding date, contact information, and
service area.
GEO best practice: Every page should have at least one JSON-LD block with the
most specific Schema.org type applicable. Homepage gets Organization + WebSite. Service pages
get Service. Articles get Article or TechArticle. FAQ pages get FAQPage with Question/Answer pairs.
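A minimal Organization block, as it would appear in a homepage's head, could look like the following sketch; the company details are placeholders.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Corp",
  "url": "https://example.com",
  "description": "Industrial widget manufacturing and consulting.",
  "foundingDate": "2015",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "sales",
    "email": "sales@example.com"
  },
  "areaServed": "US"
}
</script>
```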
Signal 6: TTFB (Time to First Byte)
Time to First Byte measures how quickly your server responds to a request. The GEO performance
budget sets targets of p50 under 200ms and p95 under 500ms. Fast responses signal quality to
AI crawlers and improve crawl budget efficiency — crawlers can index more of your site in the
same time window.
AI crawlers have timeout thresholds. If your pages take seconds to respond (common with server-side
rendering of heavy frameworks), crawlers may abandon the request or receive incomplete content.
Edge-served clean-room HTML typically achieves sub-100ms TTFB because the response is pre-built
and served from the CDN layer closest to the crawler.
Measurement: TTFB is monitored via daily health checks that measure response
times from multiple vantage points. Results are stored in the site_health_checks
table and factor into the performance dimension of the GEO composite score.
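The p50/p95 budget check reduces to a small percentile calculation over a day's latency samples. percentile and within_budget below are illustrative helper names, not part of any monitoring product named in this document.

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples in milliseconds."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def within_budget(samples_ms, p50_limit=200, p95_limit=500):
    """Check samples against the GEO TTFB budget: p50 <= 200ms and p95 <= 500ms."""
    return (percentile(samples_ms, 50) <= p50_limit
            and percentile(samples_ms, 95) <= p95_limit)
```

With edge-served HTML in the sub-100ms range, a typical sample set passes comfortably; a handful of slow outliers only breaks the budget once they push the 95th percentile past 500ms.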
Signal 7: AI Bots Allowed
This signal measures whether your robots.txt explicitly allows all major AI crawlers.
The required Allow directives cover: GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI), ClaudeBot,
Claude-Web (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI), CCBot
(Common Crawl), Amazonbot, Applebot, and other emerging AI user agents.
This is the opposite of the defensive posture many sites take. Where some organizations block AI
crawlers to prevent content from being used in training, GEO-optimized sites explicitly welcome
them. The reasoning is straightforward: if an AI system cannot crawl your content, it cannot
cite your business in responses. Blocking AI bots is blocking AI-driven discovery.
Implementation: A permissive robots.txt with explicit User-agent + Allow rules
for each known AI crawler, plus a wildcard Allow for the site's public pages. Crawl-delay
directives should be absent or minimal to avoid throttling AI indexing.
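A permissive configuration along these lines might be sketched as follows, abbreviated to a few of the crawlers listed above; a real file would enumerate each one, and the sitemap URL is a placeholder.

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```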
Signal 8: HTTP/3 (QUIC)
HTTP/3 is the modern transport protocol built on QUIC, offering faster connection establishment,
improved multiplexing, and better performance on lossy networks. For GEO purposes, HTTP/3 support
is a forward-looking infrastructure signal.
Current status: HTTP/3 is treated as N/A in GEO scoring because support is
CDN-dependent rather than application-controllable. Sites deployed on modern CDNs (Vercel,
Cloudflare, Fastly) typically receive HTTP/3 automatically. The signal is tracked for
completeness and will be weighted once AI crawlers consistently negotiate HTTP/3 connections.