AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework (2509.10762v1)

Published 13 Sep 2025 in cs.AI

Abstract: AI answer engines increasingly mediate access to domain knowledge by generating responses and citing web sources. We introduce GEO-16, a 16-pillar auditing framework that converts on-page quality signals into banded pillar scores and a normalized GEO score G that ranges from 0 to 1. Using 70 product-intent prompts, we collected 1,702 citations across three engines (Brave Summary, Google AI Overviews, and Perplexity) and audited 1,100 unique URLs. In our corpus, the engines differed in the GEO quality of the pages they cited, and pillars related to Metadata and Freshness, Semantic HTML, and Structured Data showed the strongest associations with citation. Logistic models with domain-clustered standard errors indicate that overall page quality is a strong predictor of citation, and simple operating points (for example, G at least 0.70 combined with at least 12 pillar hits) align with substantially higher citation rates in our data. We report per-engine contrasts, vertical effects, threshold analysis, and diagnostics, then translate findings into a practical playbook for publishers. The study is observational and focuses on English-language B2B SaaS pages; we discuss limitations, threats to validity, and reproducibility considerations.

Summary

  • The paper introduces the GEO-16 framework, a 16-pillar system designed to quantify on-page quality signals that predict citation behavior in AI answer engines.
  • It combines logistic regression with domain-clustered standard errors and explicit threshold selection to identify an operating point (G ≥ 0.70 and ≥ 12 pillar hits) and to compare engines.
  • Findings show that strong metadata, semantic structure, and authoritative references are associated with higher citation rates, and that Brave cites the highest-quality pages of the three engines.

Empirical Analysis of AI Answer Engine Citation Behavior via the GEO-16 Framework

Introduction

This paper presents an empirical study of citation behavior in AI answer engines, focusing on which web pages get cited in generative search results. The authors introduce the GEO-16 framework, a 16-pillar page auditing system designed to quantify granular on-page quality signals relevant to citation likelihood. The study targets B2B SaaS domains, harvesting 1,702 citations from Brave Summary, Google AI Overviews, and Perplexity across 70 product-intent prompts and auditing 1,100 unique URLs. The analysis yields operational thresholds and actionable recommendations for publishers seeking to optimize their content for AI-driven discoverability.

GEO-16 Framework and Theoretical Principles

The GEO-16 framework operationalizes six core principles that link human-readable quality to machine parsability and retrieval/citation behavior in answer engines:

  • People-first content: Emphasizes answer-first summaries, clear structure, and explicit claim demarcation.
  • Structured data: Requires semantic HTML, valid JSON-LD schema, canonical URLs, and logical heading hierarchies.
  • Provenance: Prioritizes authoritative sources, inline citations, and transparency.
  • Freshness: Surfaces visible and machine-readable timestamps, revision history, and current sitemaps.
  • Risk management: Enforces editorial review and fact-checking for accuracy and regulatory compliance.
  • RAG optimization: Promotes scoped topics, dense internal/external linking, and canonicalization to facilitate retrieval.
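As one illustration of the "logical heading hierarchies" requirement, the sketch below flags two common outline problems using Python's standard `html.parser`. The two rules (exactly one `<h1>`, no skipped heading levels) and the names `HeadingCollector` and `heading_violations` are assumptions chosen for illustration; the source does not say how GEO-16 scores heading structure.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the levels of <h1>..<h6> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "<H2>" arrives as "h2".
        # The length check excludes tags such as "hr", "head", "header".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_violations(html: str) -> list[str]:
    """Return readable problems with a page's heading outline.

    Hypothetical rules, not taken from GEO-16:
      * the page has exactly one <h1>;
      * a heading goes at most one level deeper than the previous heading.
    """
    parser = HeadingCollector()
    parser.feed(html)
    levels = parser.levels
    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"skipped level: <h{prev}> followed by <h{cur}>")
    return problems


page = "<h1>Pricing</h1><h3>Plans</h3><h2>FAQ</h2>"
print(heading_violations(page))
```

For the example page, the checker reports one problem: the jump from `<h1>` straight to `<h3>`. Going back up a level (`<h3>` to `<h2>`) is treated as fine under these assumed rules.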

Each page is scored 0–3 on each pillar. The pillar scores are aggregated into a normalized GEO score G ∈ [0, 1], and a pillar counts as a "hit" when it scores ≥ 2. The framework is designed to be reproducible and interpretable, enabling consistent benchmarking across engines and domains.
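A minimal sketch of the scoring arithmetic, assuming G is the unweighted sum of the 16 banded scores divided by the maximum possible total of 16 × 3 = 48. The source only says the scores are aggregated and normalized, so the paper's exact formula (including any pillar weights) may differ; `geo_summary` is a hypothetical helper name.

```python
from typing import Sequence

NUM_PILLARS = 16
MAX_PILLAR_SCORE = 3


def geo_summary(pillar_scores: Sequence[int]) -> tuple[float, int]:
    """Return (G, pillar_hits) for one page.

    Assumption: G = sum of the 16 banded scores / 48 (no pillar weights).
    A pillar "hit" is a score of 2 or 3, as described in the text.
    """
    if len(pillar_scores) != NUM_PILLARS:
        raise ValueError(f"expected {NUM_PILLARS} pillar scores, got {len(pillar_scores)}")
    if any(s not in (0, 1, 2, 3) for s in pillar_scores):
        raise ValueError("each pillar score must be an integer in 0..3")
    g = sum(pillar_scores) / (NUM_PILLARS * MAX_PILLAR_SCORE)
    hits = sum(1 for s in pillar_scores if s >= 2)
    return g, hits


scores = [3] * 6 + [2] * 6 + [1] * 4  # toy page, not from the paper
g, hits = geo_summary(scores)
print(f"G = {g:.3f}, pillar hits = {hits}")
```

Under this assumed normalization, the toy page totals 34 of 48 points, giving G ≈ 0.708 with 12 hits, which would just clear the G ≥ 0.70 and ≥ 12-hit operating point reported later in the summary.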

Methodology

The paper employs a cross-sectional, multi-engine audit. Prompts are crafted to elicit vendor citations across 16 B2B SaaS verticals. Citations are collected from Brave, Google AIO, and Perplexity, with strict URL normalization and deduplication. Each URL is fully rendered and scored using the GEO-16 framework. Statistical analyses include correlation, permutation tests, and logistic regression with domain-clustered standard errors. Thresholds for GEO score and pillar hits are selected via Youden's J and micro-averaged F1.
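Youden's J for a cut-off is sensitivity + specificity − 1, which equals the true-positive rate minus the false-positive rate. A minimal sketch of picking the cut-off that maximizes J for a single score; the toy data are invented, and the F1-based selection and any tie-breaking rule used in the paper are not shown.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    scores: one score per page (e.g. GEO score G or pillar-hit count)
    labels: 1 if the page was cited, 0 otherwise
    A page is predicted "cited" when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both cited and uncited pages")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:  # strict ">" keeps the lowest cut-off among ties
            best_t, best_j = t, j
    return best_t, best_j


# Invented toy data: GEO scores and cited flags for ten pages.
scores = [0.45, 0.55, 0.60, 0.62, 0.66, 0.71, 0.74, 0.78, 0.80, 0.90]
cited = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(youden_threshold(scores, cited))
```

On this toy data the best cut-off is 0.71, with J of about 0.6 (TPR 0.8, FPR 0.2). The same function could be run on pillar-hit counts to choose a hit threshold.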

Results

Engine Citation Behavior

Brave Summary consistently cites higher-quality pages (mean G = 0.727), followed by Google AIO (G = 0.687), while Perplexity lags (G = 0.300). Citation rates mirror this trend: Brave (78%), Google AIO (72%), Perplexity (45%). Figure 1 visualizes the distribution of citations and GEO scores across engines and domains.
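The engine contrasts above are differences in mean G. The methodology lists permutation tests among its analyses, and a two-sided permutation test on a difference in means can be sketched as below. This simplified version treats the two samples as independent pages and ignores domain clustering and URLs cited by both engines, so it is not the paper's exact procedure; the scores are invented.

```python
import random
from statistics import mean


def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for mean(a) - mean(b).

    a, b: GEO scores of pages cited by two engines (treated as independent).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign pages to the two engines
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


brave = [0.70, 0.75, 0.72, 0.78, 0.69, 0.74]       # invented toy scores
perplexity = [0.28, 0.31, 0.35, 0.25, 0.30, 0.33]  # invented toy scores
diff, p = permutation_test_mean_diff(brave, perplexity)
print(f"difference in mean G = {diff:.3f}, p = {p:.4f}")
```

Permuting labels rather than assuming normality makes no distributional assumption about G, which is bounded in [0, 1]. A cluster-aware variant would shuffle whole domains instead of individual pages.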

Figure 1: Citation analysis across engines and domains, including average citations per answer, government/education link ratios, citation density, and GEO score versus citation count.

Brave and Google AIO favor authoritative domains, with a higher ratio of government/education links, while Perplexity exhibits lower citation density and GEO scores.

Domain and Vertical Analysis

Cloud and insurance domains achieve the highest average GEO scores and pillar hits, whereas customer service and HR domains trail (Figure 2).

Figure 2: Top domains by average GEO score and pillar hits, highlighting sectoral disparities in page quality and citation likelihood.

A heatmap of GEO scores across domains and engines shows that pages cited by Brave consistently score higher, while those cited by Perplexity trail markedly (Figure 3).

Figure 3: Heatmap of average GEO scores across domains and engines, illustrating engine-specific citation preferences.
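A heatmap like Figure 3 is a grid of mean G per (vertical, engine) cell. Below is a standard-library sketch of that aggregation, rendered as a plain-text table instead of a plotted heatmap. The row format, function names, and toy values are hypothetical.

```python
from collections import defaultdict


def mean_geo_grid(rows):
    """rows: iterable of (vertical, engine, g). Returns {(vertical, engine): mean G}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vertical, engine, g in rows:
        sums[(vertical, engine)] += g
        counts[(vertical, engine)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def format_grid(grid):
    """Render the grid as a plain-text table; '-' marks cells with no pages."""
    verticals = sorted({v for v, _ in grid})
    engines = sorted({e for _, e in grid})
    lines = ["vertical".ljust(14) + "".join(e.ljust(12) for e in engines)]
    for v in verticals:
        cells = [f"{grid[(v, e)]:.3f}".ljust(12) if (v, e) in grid else "-".ljust(12)
                 for e in engines]
        lines.append(v.ljust(14) + "".join(cells))
    return "\n".join(lines)


rows = [  # invented toy audit rows
    ("Cloud", "Brave", 0.80),
    ("Cloud", "Brave", 0.70),
    ("Cloud", "Perplexity", 0.35),
    ("HR", "Brave", 0.60),
]
print(format_grid(mean_geo_grid(rows)))
```

Keeping empty cells visible (as "-") matters in a sparse audit, since a missing cell means no cited pages rather than a score of zero.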

Pillar-Level Insights

Correlation analysis identifies Metadata & Freshness (r = 0.68), Semantic HTML (r = 0.65), and Structured Data (r = 0.63) as the pillars most strongly associated with citation likelihood. Evidence & Citations and Authority & Trust also show substantial associations. Pillar breakdowns indicate that UX & Readability and Metadata & Freshness score highest, while Transparency & Ethics and Visuals & Media score lowest (Figure 4).

Figure 4: Average performance across all GEO-16 pillars and pillar performance by engine, with Perplexity exhibiting lower scores across most dimensions.

Thresholds and Predictive Modeling

A practical operating point emerges: pages with G ≥ 0.70 and ≥ 12 pillar hits achieve a 78% cross-engine citation rate. Logistic regression yields an odds ratio of 4.2 for GEO score (interval [3.1, 5.7]), indicating strong predictive power. Cross-engine citations (URLs cited by multiple engines) have 71% higher quality scores than single-engine citations.
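The operating point is a simple conjunctive rule. The sketch below applies it and compares citation rates inside and outside the rule. The `(g, pillar_hits, cited)` tuple format and the toy pages are assumptions, and the 78% figure in the text comes from the paper's data, not from this code.

```python
def meets_operating_point(g, pillar_hits, g_min=0.70, hits_min=12):
    """True if a page clears both the GEO-score and pillar-hit thresholds."""
    return g >= g_min and pillar_hits >= hits_min


def citation_rates(pages):
    """pages: iterable of (g, pillar_hits, cited) with cited in {0, 1}.

    Returns (rate among pages meeting the rule, rate among the rest);
    a rate is None when its group is empty.
    """
    inside, outside = [], []
    for g, hits, cited in pages:
        (inside if meets_operating_point(g, hits) else outside).append(cited)

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(inside), rate(outside)


pages = [  # invented toy pages: (G, pillar hits, cited?)
    (0.78, 14, 1),
    (0.72, 12, 1),
    (0.71, 13, 0),
    (0.74, 11, 0),  # high G but too few hits, so outside the rule
    (0.55, 9, 0),
    (0.40, 6, 1),
]
print(citation_rates(pages))
```

Requiring both conditions means a page with a high average score but gaps in several pillars (like the fourth toy page) does not qualify.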

A comprehensive engine performance analysis (Figure 5) shows Brave leading on GEO score, pillar hits, source diversity, and citation density.

Figure 5: Complete engine performance analysis, summarizing GEO score, pillar hits, source diversity, and citation density for each engine.

Discussion and Implications

The findings suggest that on-page quality signals, especially metadata freshness, semantic structure, and structured data, are closely tied to AI engine discoverability. However, generative engines systematically favor earned media and authoritative third-party domains, often excluding brand-owned and social content. This creates a dual optimization challenge: to maximize citation likelihood, publishers must both meet GEO-16 thresholds and secure coverage on authoritative domains.

Actionable recommendations include:

  • Exposing both human- and machine-readable recency signals (visible dates, JSON-LD).
  • Enforcing semantic hierarchy and schema completeness.
  • Providing diverse references to authoritative sources.
  • Maintaining accessible, well-structured page layouts.
  • Cultivating earned media relationships and diversifying content distribution.
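For the first recommendation, one common way to expose machine-readable recency is a schema.org `Article` block in JSON-LD that carries `datePublished` and `dateModified`. Below is a minimal sketch. The helper name, the choice of `Article` with an `Organization` author, and the example URL are illustrative assumptions; the source does not specify which properties GEO-16 inspects.

```python
import json
from datetime import date


def article_jsonld(headline: str, url: str, published: date, modified: date,
                   publisher: str) -> str:
    """Return a <script type="application/ld+json"> tag for a schema.org Article.

    Hypothetical helper: the property names are standard schema.org vocabulary,
    but which of them GEO-16 checks is not stated in the source.
    """
    if modified < published:
        raise ValueError("dateModified must not precede datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": published.isoformat(),  # ISO 8601, e.g. "2025-01-10"
        "dateModified": modified.isoformat(),
        "author": {"@type": "Organization", "name": publisher},
    }
    # Escape "</" so a value such as "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape, so parsers still read the original "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld("SSO pricing guide", "https://example.com/sso-pricing",
                     date(2025, 1, 10), date(2025, 9, 1), "Example Corp"))
```

The dates in this block should match the dates shown visibly on the page, so that the human-readable and machine-readable recency signals agree.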

Limitations include the focus on English-language B2B SaaS content and the observational design, which may be subject to unobserved confounding. Future work should extend the framework to other languages, verticals, and experimental interventions (e.g., schema ablations, reference density manipulations).

Conclusion

The GEO-16 framework provides a reproducible, interpretable system for linking granular on-page quality signals to AI answer engine citation behavior. Operational thresholds (G ≥ 0.70, ≥ 12 pillar hits) align with substantial gains in citation likelihood. Engine comparisons reveal distinct signal preferences, underscoring the need for both on-page optimization and strategic positioning on authoritative domains. The framework offers actionable benchmarks for publishers seeking to enhance their visibility in generative search, with implications for future research on cross-lingual and multimodal citation behavior.
