SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning

Published 20 Nov 2025 in cs.CL and cs.DL | (2511.16198v1)

Abstract: Effective scientific communication depends on accurate citations that validate sources and guide readers to supporting evidence. Yet academic literature faces mounting challenges: semantic citation errors that misrepresent sources, AI-generated hallucinated references, and traditional citation formats that point to entire papers without indicating which sections substantiate specific claims. We introduce SemanticCite, an AI-powered system that verifies citation accuracy through full-text source analysis while providing rich contextual information via detailed reasoning and relevant text snippets. Our approach combines multiple retrieval methods with a four-class classification system (Supported, Partially Supported, Unsupported, Uncertain) that captures nuanced claim-source relationships and enables appropriate remedial actions for different error types. Our experiments show that fine-tuned lightweight LLMs achieve performance comparable to large commercial systems with significantly lower computational requirements, making large-scale citation verification practically feasible. The system provides transparent, evidence-based explanations that support user understanding and trust. We contribute a comprehensive dataset of over 1,000 citations with detailed alignments, functional classifications, semantic annotations, and bibliometric metadata across eight disciplines, alongside fine-tuned models and the complete verification framework as open-source software. SemanticCite addresses critical challenges in research integrity through scalable citation verification, streamlined peer review, and quality control for AI-generated content, providing an open-source foundation for maintaining citation accuracy at scale.

Summary

The paper presents an AI-powered pipeline for full-text citation verification using a hybrid of dense semantic retrieval and sparse keyword matching.
It employs a four-category classification system to assess citation support, offering actionable editorial recommendations for improving research integrity.
Experimental results demonstrate that medium-scale fine-tuned models achieve high accuracy and efficiency, reducing computational costs.

Introduction

The paper "SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning" (2511.16198) addresses the need for a more reliable method of verifying citations in academic literature. Citation errors in scholarly publications have long undermined research integrity. They include false claims attributed to references, missing qualifications, incorrect study attributions, and selective citing of results, all of which can propagate misinformation. Such inaccuracies erode the credibility of research and waste resources.

Manual verification during peer review is insufficient due to the increasing volume of academic publications, putting strain on reviewers who lack standardized criteria or training [smith2006peer, mishra2025challenges]. Moreover, the rise of AI-generated content introduces fabricated citations that are difficult to distinguish manually [gibney2025hallucination]. Thus, there's a pressing need for automated systems capable of verifying citation accuracy through detailed analysis.

Methodology

SemanticCite presents an AI-driven approach to full-text citation verification. The system uses a multi-stage pipeline: PDF text extraction, hybrid retrieval that combines dense semantic similarity with sparse BM25 keyword matching, neural reranking with FlashRank, and analysis by LLMs (Figure 1). For each citation it verifies, the pipeline outputs a classification, supporting evidence, detailed reasoning, and a confidence score.

Figure 1: Semantic Citation Verification Pipeline: A multi-stage automated system for citation verification combining document processing, vector embedding storage, hybrid retrieval, neural reranking, and LLM-based analysis.

SemanticCite employs a nuanced four-category classification system: Supported, Partially Supported, Unsupported, and Uncertain, enabling a granular approach to citation verification. This classification captures the complexity of citation-reference relationships beyond binary models, allowing for appropriate remedial actions for different citation errors (Figure 2).

Figure 2: Four-Category Classification Scheme for Source-Claim Alignment Assessment.
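
As a minimal sketch of how the four classes and the per-citation outputs described above might be represented, consider the following. The class names, field names, and the remedial actions in `SUGGESTED_ACTION` are illustrative assumptions; the summary states that each class enables a different remedial action but does not list them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Support(Enum):
    """The four classes named in the paper."""
    SUPPORTED = "Supported"
    PARTIALLY_SUPPORTED = "Partially Supported"
    UNSUPPORTED = "Unsupported"
    UNCERTAIN = "Uncertain"


# Hypothetical remedial actions (assumption, not taken from the paper).
SUGGESTED_ACTION = {
    Support.SUPPORTED: "keep citation as is",
    Support.PARTIALLY_SUPPORTED: "qualify the claim or narrow its scope",
    Support.UNSUPPORTED: "replace the source or remove the claim",
    Support.UNCERTAIN: "escalate to manual review",
}


@dataclass
class VerificationResult:
    """One verified citation: label, confidence, reasoning, evidence."""
    claim: str
    label: Support
    confidence: float  # assumed to lie in [0, 1]
    reasoning: str
    evidence: list[str] = field(default_factory=list)  # source snippets

    def recommendation(self) -> str:
        # Map the class to its (hypothetical) editorial action.
        return SUGGESTED_ACTION[self.label]
```

A caller could then route results by label, for example sending every `Support.UNCERTAIN` result to a human reviewer instead of acting on it automatically.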

The methodology highlights the system's ability to perform deep semantic analysis on complete source documents, enabling rich contextual understanding through evidence-based reasoning. The system's efficiency is underscored by demonstrating that fine-tuned lightweight models can produce results comparable to larger commercial systems, yet with reduced computational requirements.

System Architecture

SemanticCite uses retrieval-augmented generation techniques to support citation verification. The hybrid retrieval system combines dense semantic approaches with sparse matching methods to ensure both semantic understanding and exact term correspondence. Dense retrieval represents passages as vector embeddings, so that semantically similar content lies close together in the embedding space, while sparse retrieval ensures that exact terms are matched precisely.
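
The combination of sparse and dense retrieval can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the summary does not say how the two result lists are merged, so reciprocal rank fusion (RRF) is used here as one common choice, and the embeddings are toy vectors supplied by the caller where a real system would use an embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (sparse side)."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity between two embedding vectors (dense side)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranking(scores):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def rrf(rankings, k=60):
    """Reciprocal rank fusion: an assumed way to merge the two rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc in enumerate(r, start=1):
            fused[doc] += 1.0 / (k + rank)
    return [doc for doc, _ in fused.most_common()]


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3):
    sparse = ranking(bm25_scores(query, docs))
    dense = ranking([cosine(query_vec, v) for v in doc_vecs])
    return rrf([sparse, dense])[:top_k]


docs = [
    "BM25 ranks passages by exact keyword overlap",
    "Dense embeddings capture paraphrased meaning",
    "Peer review workload keeps growing",
]
doc_vecs = [[0.9, 0.1], [0.2, 0.9], [-0.5, 0.1]]  # toy 2-d embeddings
print(hybrid_search("keyword overlap", [0.8, 0.2], docs, doc_vecs, top_k=2))
# prints [0, 1]
```

Rank-based fusion sidesteps the fact that BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores would be another reasonable option.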

Neural reranking further refines retrieval results using cross-encoders to optimize document relevance, thereby selecting the most pertinent passages for citation analysis. This hybrid approach proves effective for academic texts that often involve specialized terminology [arivazhagan2023, sawarkar2024blended].
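
The two-stage idea (retrieve candidates cheaply, then rescore each query-passage pair jointly) can be sketched as below. The function names are hypothetical, and `overlap_score` is only a toy stand-in for a cross-encoder; the real system uses FlashRank, whose API is not reproduced here.

```python
def rerank(query, passages, score_pair, top_n=3):
    """Rescore (query, passage) pairs jointly and keep the best top_n."""
    scored = [(score_pair(query, p), p) for p in passages]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_n]]


def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: Jaccard overlap of token sets."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0


candidates = ["peer review", "citation errors spread", "citation"]
best = rerank("citation errors", candidates, overlap_score, top_n=1)
```

Because the scorer sees the query and the passage together, it can be much more expensive than first-stage retrieval, which is why it runs only on the short candidate list.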

Model Evaluation

The models developed under SemanticCite utilize QLoRA fine-tuning techniques, employing Qwen3 models across varying scales (1.7B, 4B, and 8B parameters). Experimental results show that these models achieve significant performance enhancements over base models in both citation preprocessing and classification tasks.
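
For readers unfamiliar with QLoRA, the standard formulation is summarized below. This is the general technique, not a statement of the paper's settings: the summary does not report the rank $r$, the scaling factor $\alpha$, or which layers were adapted.

```latex
% LoRA: the pretrained weight W_0 is frozen; only the low-rank factors are trained.
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% QLoRA: W_0 is stored in 4-bit precision and dequantized on the fly.
h = \operatorname{dequant}\!\left(W_0^{\text{4-bit}}\right) x + \frac{\alpha}{r}\, B A x
```

Each adapted layer then trains $r(d + k)$ parameters instead of $dk$, which is what makes fine-tuning models of this size feasible on modest hardware.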

For instance, the fine-tuned Qwen3 4B model reaches 83.64% weighted accuracy together with high character similarity in its generated output. This suggests that well-tuned medium-scale models can balance performance with computational efficiency, providing viable solutions for institutions with diverse resource constraints.

Practical and Theoretical Implications

SemanticCite provides actionable guidance for researchers and reviewers by transforming classifications into concrete editorial recommendations. This can significantly enhance the quality of peer review processes, allowing reviewers to focus on scholarly assessments instead of time-consuming verification tasks. Additionally, the system's capability extends beyond citation accuracy to AI-generated content verification, addressing the challenge of AI hallucinations in generated reports and summaries.

The system promotes institutional quality control by enabling retrospective analysis of published work and identifying citation patterns that need corrective action. It also advances the understanding of citations by making explicit the nuanced relationships between claims and the supporting evidence extracted from source materials.

Conclusion

SemanticCite represents a substantial advancement in citation verification, offering a scalable approach to maintaining research integrity. By combining retrieval-augmented generation with a four-class classification framework, the system provides evidence-based verification that connects automated analysis with practical editorial recommendations.

Further research efforts may explore expansion into multimodal and multilingual domains, enhanced AI-assisted improvement suggestions, and integration with existing scholarly communication processes to solidify its place as a critical tool in promoting research transparency and integrity. The system's implications extend into broader applications in automated content generation and verification, serving as a foundation for addressing persistent challenges in AI-produced scholarly material.
