FakeSV: Multimodal Benchmark for Fake News
- FakeSV is a comprehensive multimodal benchmark dataset designed to detect fake news on Chinese short-video platforms by integrating video, audio, text, and social signals.
- It employs rigorous annotation protocols with high inter-annotator agreement (Cohen’s κ = 0.89) by cross-referencing fact-checking sources to ensure reliable labeling.
- The dataset supports diverse splitting strategies and fusion models, providing actionable insights for advancing research in multimodal misinformation detection.
FakeSV is a large-scale, multimodal benchmark dataset constructed for the study and evaluation of fake news detection on Chinese short-video platforms. Designed to systematically capture the complex interplay between content signals and social context, FakeSV supports fine-grained analysis and algorithmic innovation for the detection of misinformation in rich media environments. The dataset is notable for its breadth—comprising video, audio, text, user comments, and publisher metadata—and for rigorous annotation protocols that enable both content-based and context-aware modeling.
1. Dataset Construction and Scope
FakeSV was introduced to address fundamental gaps in multimodal fake-news detection research, specifically the scarcity of public short-video benchmarks and the need for datasets integrating diverse content and extensive social signals (Qi et al., 2022). Data collection targeted two of the largest Chinese short-video platforms, Douyin and Kuaishou, focusing on content spanning January 2011 to early 2022. Fact-checking websites provided event keywords and debunked-claim seeds; these informed the crawling and annotation of video samples. The resulting corpus consists of:
- Videos: Raw short video clips (≤5 minutes), sampled keyframes, and cover images.
- Audio: Complete original track for each video.
- Textual modalities: Titles, on-screen text via OCR, and ASR speech transcripts.
- Social context: Up to 100 top user comments per video (with like-counts and reply statistics), plus publisher profiles featuring verification status, fan/follower counts, self-introduction, and historical publishing activity.
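For concreteness, the sketch below shows how one such record might be represented in code; the class and field names are hypothetical illustrations of the modalities listed above, not FakeSV's actual release schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Comment:
    text: str
    likes: int
    replies: int

@dataclass
class Publisher:
    verified: bool
    follower_count: int
    self_intro: str

@dataclass
class FakeSVRecord:
    """Hypothetical container mirroring the modalities described above."""
    video_id: str
    title: str
    keyframe_paths: List[str]          # sampled keyframes plus cover image
    audio_path: str                    # complete original audio track
    ocr_text: str                      # on-screen text extracted via OCR
    asr_transcript: str                # speech transcript from ASR
    comments: List[Comment] = field(default_factory=list)  # up to 100 top comments
    publisher: Optional[Publisher] = None
    label: str = "Real"                # one of {"Fake", "Real", "Debunked"}
```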
FakeSV includes three label categories: Fake, Real, and Debunked, with the binary Fake vs. Real task being canonical for most detection experiments.
| Label | Count |
|---|---|
| Fake | 1,827 |
| Real | 1,827 |
| Debunked | 1,884 |
| Total | 5,538 |
2. Annotation Protocol and Quality Control
Label assignment in FakeSV is performed by cross-referencing video content with established fact-checking portals (e.g., Weibo Community Management, Tencent Jiaozhen, China Fact Check) (Qi et al., 2022). For the primary release, nine postgraduate annotators underwent protocol training followed by a two-pass review involving first and second authors. Consensus mechanisms resolved ambiguous cases; “Other” labels were assigned for irreducible uncertainty. Inter-annotator agreement, as quantified by Cohen’s κ, reached 0.89—indicative of “almost perfect” agreement.
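As an illustration of this statistic, Cohen's κ between two annotators can be computed with scikit-learn's cohen_kappa_score; the label vectors below are invented toy data, not FakeSV annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Toy example: labels from two annotators over ten videos (invented data).
annotator_a = ["Fake", "Real", "Fake", "Debunked", "Real",
               "Fake", "Real", "Real", "Fake", "Debunked"]
annotator_b = ["Fake", "Real", "Fake", "Debunked", "Real",
               "Fake", "Real", "Fake", "Fake", "Debunked"]

# kappa corrects raw agreement for the agreement expected by chance.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```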
Labels follow strict criteria:
- Fake: Video and title jointly present a claim previously debunked and not supported by authoritative news sources.
- Real: Verified by independent, reputable news reports.
- Debunked: Videos concerning claims that have already been debunked; these are kept as a separate category rather than reused directly as positive Fake examples.
3. Multimodal Structure and Feature Representation
Each FakeSV instance incorporates multiple modalities:
- Visual: Video keyframes (sampled, cover), resized and normalized for feature extraction (e.g., VGG-19).
- Audio: Original track, resampled and converted into log-mel spectrograms; encoded with models such as Wav2Vec.
- Text: Tokenized and embedded using pre-trained language models (e.g., BERT-base-Chinese), encompassing both OCR outputs and human-written titles/captions.
- Comments: Each comment embedded using BERT, weighted by like-count, and aggregated, e.g., via a like-weighted average $\mathbf{c} = \sum_{i=1}^{N} w_i \mathbf{c}_i$ with $w_i = (l_i + 1) / \sum_{j=1}^{N} (l_j + 1)$, where $l_i$ is the like-count of comment $i$ (a sketch follows this list).
- Publisher Metadata: Quantitative fields (fan counts, etc.) are min-max normalized.
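A minimal sketch of the like-weighted aggregation above, assuming comments have already been embedded; the add-one smoothing is an assumption, and the published SV-FEND model additionally applies co-attention rather than this plain average.

```python
import numpy as np

def aggregate_comments(embeddings: np.ndarray, likes: np.ndarray) -> np.ndarray:
    """Like-weighted average of comment embeddings.

    embeddings: (N, d) array of BERT comment embeddings.
    likes:      (N,) array of like-counts.
    Add-one smoothing keeps zero-like comments from being dropped (assumption).
    """
    weights = (likes + 1) / (likes + 1).sum()
    return weights @ embeddings  # (d,)

# Toy usage: 3 comments with 768-dim embeddings.
emb = np.random.randn(3, 768)
agg = aggregate_comments(emb, np.array([10, 0, 3]))
print(agg.shape)  # (768,)
```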
These modalities are supplied to models either as individual streams or through hierarchical fusion networks that employ cross-modal attention mechanisms (Yan et al., 12 Jan 2025, Qi et al., 2022).
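The cross-modal attention pattern referenced above can be sketched with a standard multi-head attention layer in PyTorch. This is an illustrative pattern only, with placeholder dimensions, not the exact SV-FEND or MTPareto architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text queries attend over visual tokens (illustrative, not the papers' exact design)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, visual_tokens):
        # text_tokens: (B, T, dim); visual_tokens: (B, V, dim)
        fused, _ = self.attn(query=text_tokens, key=visual_tokens, value=visual_tokens)
        return self.norm(text_tokens + fused)  # residual connection + layer norm

# Toy usage: batch of 2, 16 text tokens attending over 8 visual tokens.
layer = CrossModalAttention()
out = layer(torch.randn(2, 16, 256), torch.randn(2, 8, 256))
print(out.shape)  # torch.Size([2, 16, 256])
```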
4. Dataset Splits, Evaluation Protocol, and Statistics
FakeSV supports several data splitting strategies:
- Chronological/Timestamped: 70% train, 15% validation, 15% test, with strict temporal hold-out to mimic real-world deployment (Yan et al., 12 Jan 2025, Bu et al., 23 Jul 2024).
- Event-level K-fold: Five-fold cross-validation at the event description level, ensuring that test sets comprise unseen events (Qi et al., 2022).
Rounded example split counts (for the 70/15/15 chronological split; see the arithmetic sketch after the table):
| Split | #Videos |
|---|---|
| Train | 3,876 |
| Val | 831 |
| Test | 831 |
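The table's counts follow from applying the 15% validation/test ratios to the 5,538 videos and assigning the remainder to training; the sketch below reproduces the arithmetic and the temporal hold-out idea (the stand-in timestamps are illustrative).

```python
total = 5_538
val = test = round(total * 0.15)   # 831 each
train = total - val - test         # 3,876
print(train, val, test)            # -> 3876 831 831

# Chronological hold-out: order items oldest-to-newest, then slice,
# so the test set is strictly later in time than the training data.
timestamps = list(range(total))    # stand-in for real publish times
ordered = sorted(timestamps)
train_ids = ordered[:train]
val_ids   = ordered[train:train + val]
test_ids  = ordered[train + val:]
assert max(train_ids) < min(val_ids) < min(test_ids)
```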
Key observed data statistics (Fake vs. Real sets):
| Statistic | Fake | Real |
|---|---|---|
| Avg. title length | 22 chars | 35 chars |
| % Empty titles | 12% | 5% |
| % with comments | 68% | 75% |
| Avg. #comments (top100) | 58 | 62 |
| Publisher verified | 15% | 75% |
| Median fan count | 1,200 | 12,000 |
| Video length ≤ 5 min | 100% | 100% |
The canonical Fake vs. Real task is class-balanced (1,827 videos per class).
5. Analytical Insights and Modality-Specific Patterns
FakeSV’s layered annotation enables exploratory analyses revealing modality-specific trends (Qi et al., 2022, Bu et al., 23 Jul 2024):
- Textual features: Fake videos have shorter, more colloquial and emotional titles (exclamations such as "OMG," frequent questions). Emotion-lexicon scoring reveals higher "like" and "surprise" scores in fakes.
- Visual features: NIQE scores indicate significantly lower visual quality in fake videos (the difference is statistically significant).
- Audio features: Increased concentration of high-arousal emotion classes in fake speech.
- Publisher and social context: Fake videos disproportionately originate from unverified publishers (15% verified vs. 75% for reals). Fakes evoke a higher rate of “doubtful” comments (18% vs. 4% for real). Publisher profiles for fakes tend toward higher consumption and lower production metrics.
- Temporal/propagation patterns: 39% of fake videos appear after official debunking events; cover-image duplication rates are elevated in fakes.
Creative-process based analysis (Bu et al., 23 Jul 2024) further exposes that fake news videos in FakeSV exhibit:
- Higher variance in audio-emotion logits.
- Lower text–visual semantic alignment, as measured by JS divergence between CLIP-encoded frame and text distributions (see the sketch after this list).
- Less color-rich and spatially refined on-screen text.
- More monotonous temporal text-exposure patterns.
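A sketch of this alignment measure, assuming frame and text similarity logits are softened into probability distributions before comparison; the random vectors stand in for CLIP outputs, and the papers' exact construction of the distributions may differ.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-ins for CLIP similarity logits of frames and text against a shared
# concept vocabulary (random here; real usage would use CLIP embeddings).
rng = np.random.default_rng(0)
frame_logits = rng.normal(size=128)
text_logits = rng.normal(size=128)

p, q = softmax(frame_logits), softmax(text_logits)
# SciPy's jensenshannon returns the JS *distance*; square it for the divergence.
js_div = jensenshannon(p, q) ** 2
print(f"JS divergence: {js_div:.4f}")  # higher => weaker text-visual alignment
```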
6. Benchmark Tasks, Methodologies, and Performance
FakeSV serves as a benchmark for multiple detection methodologies (Qi et al., 2022, Yan et al., 12 Jan 2025, Bu et al., 23 Jul 2024), supporting tasks such as multimodal fake news detection, social-context analysis, and creative-process-aware modeling.
Models ingest all available modalities (video, audio, text, comments, publisher profiles). Standard feature extraction pipelines include BERT for text, VGG-19 for images, and Wav2Vec for audio. Hierarchical and co-attention fusion mechanisms are employed to exploit cross-modal interactions. Losses are computed at multiple fusion levels in architectures such as MTPareto (Yan et al., 12 Jan 2025). Ablation studies and cross-modal selection strategies demonstrate non-trivial accuracy improvements when all modalities are employed.
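The multi-level loss idea can be sketched as classification heads attached at successive fusion depths, with the training objective a weighted sum of per-level losses; the weights and layer shapes below are placeholders, and this mirrors only the general pattern, not MTPareto's Pareto-based weighting.

```python
import torch
import torch.nn as nn

class MultiLevelFusionNet(nn.Module):
    """Illustrative: a classifier head at each fusion level, losses combined."""
    def __init__(self, dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.fuse1 = nn.Linear(dim * 2, dim)   # e.g., text + audio
        self.fuse2 = nn.Linear(dim * 2, dim)   # e.g., (text+audio) + visual
        self.head1 = nn.Linear(dim, num_classes)
        self.head2 = nn.Linear(dim, num_classes)

    def forward(self, text, audio, visual):
        h1 = torch.relu(self.fuse1(torch.cat([text, audio], dim=-1)))
        h2 = torch.relu(self.fuse2(torch.cat([h1, visual], dim=-1)))
        return self.head1(h1), self.head2(h2)

model = MultiLevelFusionNet()
text, audio, visual = (torch.randn(4, 256) for _ in range(3))
logits1, logits2 = model(text, audio, visual)
labels = torch.randint(0, 2, (4,))
ce = nn.CrossEntropyLoss()
# Total loss combines intermediate and final fusion levels (weights are placeholders).
loss = 0.5 * ce(logits1, labels) + 1.0 * ce(logits2, labels)
loss.backward()
```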
Performance on the core binary task (Fake vs. Real), as reported across various methods and splits:
| Model | Acc (%) | F1 (%) |
|---|---|---|
| BERT (text) | 76.8 | 76.8 |
| MyVC (text+img+comments) | 75.1 | 75.0 |
| TT (video+audio) | 75.0 | 75.0 |
| SV-FEND (all modals) | 79.3 | 79.2 |
| MTPareto | 84.50 | 84.15 |
| FakingRecipe | 85.35 | 84.83 |
Reported metrics: Accuracy, Precision, Recall, and F₁-score, as well as macro-F1 for imbalanced or multiclass tasks.
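These metrics can be computed directly with scikit-learn; the prediction vectors below are invented toy data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, f1_score

# Toy predictions for the binary Fake (1) vs. Real (0) task (invented data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
macro_f1 = f1_score(y_true, y_pred, average="macro")  # for imbalanced/multiclass tasks
print(f"Acc={acc:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f} macro-F1={macro_f1:.3f}")
```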
7. Challenges, Limitations, and Future Directions
FakeSV’s real-world focus, multimodal richness, and fine-grained labels establish it as a challenging benchmark for fake news detection research. However, several open challenges persist:
- Domain adaptation: Short-video specific artifacts, linguistic features (Chinese), and social context differ from text/news-image datasets and from Western media.
- Generalization: Emerging event types and new content forms may induce "concept drift." Chronological evaluation mitigates but does not eliminate this concern.
- Annotation: Although inter-annotator agreement is high, some cases ("Other" or "Debunked") remain under-characterized for certain downstream uses.
- Creative process cues: The importance of cross-modal semantic alignment, editing traces, and emotional inference in fakes suggests new detection paradigms and potential for transfer learning.
- Integration with singing-voice deepfake (Fake Song Detection): While not the same domain, insights from FSD regarding domain-specific model training (Xie et al., 2023) highlight the inadequacy of speech-trained audio deepfake detection (ADD) baselines for genre-adapted tasks, a consideration that may carry into future FakeSV expansions.
A plausible implication is that future benchmarks will further increase modality diversity, introduce crosslingual evaluation, and provide annotation for nuanced video edits and composition processes. The empirical superiority of process-aware detection (as shown by FakingRecipe's ~5% accuracy increase over previous SOTA (Bu et al., 23 Jul 2024)) underscores the ongoing need for both dataset and methodological innovation.