InfoAffect Dataset: Multimodal Affective Analysis
- The InfoAffect dataset is a corpus of 3,500 infographic–text pairs annotated for affect (positive, neutral, negative) using state-of-the-art multimodal LLMs.
- It employs rigorous preprocessing, an accompanied-text-priority method, and model ensemble fusion to ensure high-quality, consistent affect annotations.
- The resource provides actionable insights for affective computing, enhancing visual communication and multimodal learning applications.
The InfoAffect dataset is a corpus of 3,500 real-world infographic–text pairs, annotated for affective content across positive, neutral, and negative dimensions. InfoAffect is designed to facilitate affective analysis of infographics by integrating multimodal information from both visual design and accompanied textual descriptions. The construction process employs modern multimodal LLMs (MLLMs), rigorous preprocessing, quality control, and a supervised affect-annotation scheme to deliver highly consistent affect ratings. The resultant dataset serves as a resource for research at the intersection of affective computing, visual communication, and multimodal learning, with demonstrated high agreement between model-based annotations and human affect judgments (Fu et al., 9 Nov 2025).
1. Data Composition and Domain Coverage
InfoAffect comprises 3,500 infographic–text pairs curated from six distinct domains to maximize topical diversity and represent a broad spectrum of real-world designs. The source domains are:
- Data Journalism (media outlets, online news)
- Government Portals (public-policy reports)
- Non-Governmental Organizations (white papers, policy briefs)
- Academic & Institutional Repositories (scientific archives)
- Corporate & Institutional Reports (annual, sustainability reports)
- Blogs, Documentation & Social Media (technical guides, influencer-shared infographics)
Preprocessing procedures include canonicalizing images to RGB, enforcing maximum image size, discarding corrupted or low-resolution files, and de-duplicating instances using perceptual hashing. For infographics embedded in PDF or HTML, the pipeline extracts the focal figure to ensure alignment between the image and the accompanying text.
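A minimal sketch of this preprocessing stage in Python, assuming Pillow and the imagehash package; the resolution cutoff, size cap, and near-duplicate distance are illustrative assumptions, not values reported for InfoAffect.

```python
from PIL import Image
import imagehash

MIN_SIDE = 300    # illustrative low-resolution cutoff (assumption)
MAX_SIDE = 2048   # illustrative maximum image size (assumption)

def preprocess(paths):
    """Canonicalize to RGB, filter corrupted/low-res files, de-duplicate by perceptual hash."""
    seen_hashes, kept = [], []
    for path in paths:
        try:
            with Image.open(path) as probe:
                probe.verify()                      # drop corrupted files
            img = Image.open(path).convert("RGB")   # canonicalize to RGB
        except Exception:
            continue
        if min(img.size) < MIN_SIDE:                # drop low-resolution files
            continue
        if max(img.size) > MAX_SIDE:                # enforce maximum image size
            img.thumbnail((MAX_SIDE, MAX_SIDE))
        phash = imagehash.phash(img)                # perceptual hash
        if any(phash - h <= 4 for h in seen_hashes):  # near-duplicate instance
            continue
        seen_hashes.append(phash)
        kept.append((path, img))
    return kept
```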
A central feature is the accompanied-text-priority method: when a headline, standfirst, or caption is present, it is utilized verbatim; otherwise, GPT-4o generates a one-sentence, factually faithful description. Quality control is enforced through consistency checks between visual and textual content, journalistic style compliance, and stratified human review to exclude speculative and sensitive material.
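The decision rule itself is simple; the sketch below makes it explicit, with `generate_caption` standing in as a hypothetical wrapper around the GPT-4o call that produces a one-sentence, factually faithful description.

```python
def accompanied_text(metadata, image, generate_caption):
    """Accompanied-text-priority: prefer existing editorial text, else generate one sentence."""
    for field in ("headline", "standfirst", "caption"):
        text = (metadata.get(field) or "").strip()
        if text:
            return text, "verbatim"                    # use the editorial text as-is
    return generate_caption(image), "generated"        # hypothetical GPT-4o captioning call
```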
2. Affect Annotation Framework
The affect annotation process centers around a strictly bounded Affect Table. To construct this lexicon, the procedure samples 1,000 infographics from five preexisting datasets and prompts GPT-4o to extract affective words. These terms undergo a two-stage clustering:
- First-level: Polarity labeling as Positive, Neutral, or Negative.
- Second-level: Fine-grained clustering using Word2Vec embeddings and HDBSCAN. Core distances and mutual-reachability distances are used to construct a minimum-spanning tree, with clusters pruned to yield stable and semantically coherent groups, each represented by a centroid vector termed a canonical "affect label".
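As a sketch of this second-level step, the snippet below embeds affect words with pretrained vectors (loaded via gensim as a stand-in for the paper's Word2Vec model) and clusters them with HDBSCAN, whose implementation builds the core-distance and mutual-reachability minimum-spanning tree internally; the `min_cluster_size` value is an assumption.

```python
import numpy as np
import gensim.downloader as api
import hdbscan

# Pretrained vectors stand in for the Word2Vec embeddings used for the Affect Table.
wv = api.load("word2vec-google-news-300")

def build_affect_clusters(affect_words, min_cluster_size=5):
    """Second-level clustering: embed affect words and group them with HDBSCAN."""
    words = [w for w in affect_words if w in wv]
    vecs = np.stack([wv[w] for w in words])

    # HDBSCAN computes core and mutual-reachability distances and a
    # minimum-spanning tree internally, then prunes it into stable clusters.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(vecs)

    clusters = {}
    for word, vec, label in zip(words, vecs, labels):
        if label == -1:                               # noise points are dropped
            continue
        clusters.setdefault(label, []).append((word, vec))

    # Each cluster's centroid vector serves as the canonical affect label.
    return {
        label: {
            "words": [w for w, _ in members],
            "centroid": np.mean([v for _, v in members], axis=0),
        }
        for label, members in clusters.items()
    }
```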
Representative second-level affect labels include:
- Positive: content, hopeful, surprised, proud, awestruck, excited, grateful, etc.
- Neutral: reflective, calm, neutral, attentive, indifferent, uncertain, curious, etc.
- Negative: bored, frustrated, shocked, annoyed, sad, concerned, overwhelmed, etc.
Annotators (the five selected MLLMs) must select between 5 and 8 affect labels per instance from the Affect Table, assigning each a confidence score in the range [0, 100]. The annotation protocol is formalized via the TIDD-EC prompt framework, which specifies the task, provides step-wise instructions, restricts outputs to a JSON schema, and appends the full Affect Table for reference. This scheme enforces label consistency and prevents label drift in affect assignment.
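The exact TIDD-EC prompt is not reproduced here; the snippet below illustrates the implied output schema and a simple validator that enforces the 5–8 label count, Affect Table membership, and the [0, 100] confidence range. The example labels and scores are invented for demonstration.

```python
import json

# Illustrative (not verbatim) output shape implied by the protocol:
# 5-8 labels from the Affect Table, each with a confidence in [0, 100].
EXAMPLE_OUTPUT = """
{
  "affective_words": [
    {"word": "hopeful",   "confidence": 86},
    {"word": "curious",   "confidence": 74},
    {"word": "content",   "confidence": 63},
    {"word": "attentive", "confidence": 55},
    {"word": "calm",      "confidence": 41}
  ]
}
"""

def validate_annotation(raw, affect_table):
    """Reject outputs that violate the protocol: wrong label count,
    labels outside the Affect Table, or confidences outside [0, 100]."""
    entries = json.loads(raw)["affective_words"]
    assert 5 <= len(entries) <= 8, "must select 5-8 affect labels"
    for e in entries:
        assert e["word"] in affect_table, f"label not in Affect Table: {e['word']}"
        assert 0 <= e["confidence"] <= 100, "confidence must lie in [0, 100]"
    return entries

validate_annotation(EXAMPLE_OUTPUT, {"hopeful", "curious", "content", "attentive", "calm"})
```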
3. Multimodal Model Ensemble and Fusion Methodology
Five state-of-the-art MLLMs serve as annotators:
- GPT-4o
- Claude Sonnet 4
- Doubao-1.5-Pro
- Gemini 2.5
- Qwen3-VL-Plus
Each model receives both the infographic image and the corresponding text in a unified multimodal prompt, jointly attending to visual (color, composition, iconography) and textual semantic features. The model output is a structured JSON object listing "affective_words": an array of {word, confidence} entries.
To aggregate annotations across models, the Reciprocal Rank Fusion (RRF) method is applied. For each candidate affect label $a$ and each model index $m$:
- $r_m(a)$ = rank of $a$ in model $m$'s output (1 = highest)
- $c_m(a)$ = confidence score assigned to $a$ by model $m$
- Fused score: $S(a) = \sum_{m} \frac{c_m(a)}{k + r_m(a)}$
where $k$ is a rank-smoothing constant. If $a$ is not returned by model $m$, then $c_m(a) = 0$, so its contribution is 0.
Labels are sorted by $S(a)$, yielding aggregated affect distributions with robust confidence values that balance the complementary strengths of the individual models.
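A compact sketch of confidence-weighted RRF consistent with the description above; `k = 60` is the conventional RRF constant and an assumption here, and the example inputs are invented.

```python
from collections import defaultdict

def rrf_fuse(model_outputs, k=60):
    """Confidence-weighted Reciprocal Rank Fusion over per-model annotations.

    model_outputs: one list per MLLM of (label, confidence) pairs, each list
    sorted by that model's own confidence so that rank 1 comes first.
    k is a rank-smoothing constant (60 is conventional for RRF; assumption here).
    """
    fused = defaultdict(float)
    for output in model_outputs:
        for rank, (label, confidence) in enumerate(output, start=1):
            fused[label] += confidence / (k + rank)   # omitted labels contribute 0
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Two invented model outputs for illustration:
print(rrf_fuse([
    [("hopeful", 90), ("curious", 70), ("calm", 55)],
    [("curious", 85), ("hopeful", 80), ("content", 40)],
]))
```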
4. Evaluation and Human Agreement
InfoAffect employs the Composite Affect Consistency Index (CACI) to quantify alignment between system-extracted affects and human raters. For each infographic, human and system outputs are reduced to three-dimensional vectors (Negative, Neutral, Positive counts). The measurement consists of:
- Dominant-Match Indicator $D \in \{0, 1\}$: $D = 1$ if the dominant (arg-max) class of the system vector matches that of the human vector, else $D = 0$.
- Cosine Similarity $C$: $C = \frac{\mathbf{h} \cdot \mathbf{s}}{\lVert \mathbf{h} \rVert \, \lVert \mathbf{s} \rVert}$, where $\mathbf{h}$ and $\mathbf{s}$ are the human and system count vectors.
- Composite Index: CACI combines $D$ and $C$ into a single per-instance score, averaged over the evaluated infographics.
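A sketch of the per-infographic computation under these definitions; since the exact rule for combining $D$ and $C$ into the composite index is not restated here, the equal-weight average in `caci` is an explicit assumption.

```python
import numpy as np

def caci_components(human_counts, system_counts):
    """Per-infographic components: inputs are (Negative, Neutral, Positive) count vectors."""
    h = np.asarray(human_counts, dtype=float)
    s = np.asarray(system_counts, dtype=float)
    D = float(np.argmax(h) == np.argmax(s))                     # dominant-match indicator
    C = float(h @ s / (np.linalg.norm(h) * np.linalg.norm(s)))  # cosine similarity
    return D, C

def caci(human_counts, system_counts):
    # Assumption: equal-weight combination of D and C; the paper's exact rule
    # for forming the composite index is not restated here.
    D, C = caci_components(human_counts, system_counts)
    return 0.5 * (D + C)
```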
User studies involve two experimental groups: 12 subject-matter experts (faculty, students, practitioners) for usability, and 44 raters in total (including 32 MTurk contributors) for accuracy. In the accuracy study (305 randomly sampled infographics), human raters categorize affects and CACI values are computed:
| Class | Mean CACI |
|---|---|
| Positive | 0.992 |
| Neutral | 0.982 |
| Negative | 0.985 |
The overall mean CACI for InfoAffect is 0.986, denoting extremely high agreement between automated extraction and crowd annotations.
5. Key Insights, Limitations, and Applications
Analysis of InfoAffect supports several findings:
- Pairing the visual and textual modalities produces more faithful affect predictions than image-only baselines.
- Constraining annotation to a fixed Affect Table via prompt-based extraction eliminates uncontrolled label drift.
- Model ensemble fusion (RRF) increases robustness versus single-model annotation.
Core applications include:
- Providing affective feedback to infographic designers for communication impact tuning.
- Fine-tuning multimodal LLMs for sentiment/emotion analysis in graphic media.
- Investigating design guidelines that modulate affect through layout, color, and typography.
Primary limitations noted are:
- Dataset scale is moderate, with rare affects under-represented and label frequencies long-tailed.
- Clustering granularity produces imbalanced second-level clusters, complicating downstream modeling of rare affective states.
6. Recommendations and Future Directions
Recommendations include dataset expansion to increase coverage of rare affects and topical breadth. The development of standardized human rating protocols (e.g., anchor stimuli, controlled exposure) is advised to supplement model-based annotation. Element-wise interventions (e.g., modifying color ramps, iconography, or headline framing) could be studied to establish causal links between design features and affective response. Providing a lightweight affective evaluation kit for reproducible human-in-the-loop annotation is identified as a desirable extension.
A plausible implication is that large-scale, systematically annotated multimodal datasets such as InfoAffect will support the evolution of both affective computing and the design sciences, especially as automated, scalable methodologies supplant exclusively manual annotation workflows (Fu et al., 9 Nov 2025).