Attribution behavior of smaller variants within vision-language model families

Determine whether smaller variants within the same vision-language model families exhibit attribution behavior comparable to that of the largest family variants evaluated in this study. This question is motivated by the observed negative correlation between model parameter count and the Error Sensitivity Score (ESS) when attributing rhetorical techniques and authorial intents in misleading visualizations.

Background

The study evaluates 16 state-of-the-art multimodal models and analyzes how they attribute rhetorical techniques and authorial intents behind misleading visualizations. Across experiments, the authors report a negative correlation between total parameters and their Error Sensitivity Score (ESS), indicating that larger models often produce less discriminating attribution profiles.

Because only the largest variant of each model family was tested, and given the growing emphasis on small, efficient models for edge deployment (SLMs), the authors highlight the unresolved question of whether smaller variants within the same families would demonstrate similar attribution behavior. Establishing this would clarify how model scale and architecture influence sensitivity and alignment with human expert judgments in these tasks.

References

We evaluated the largest variant of each model family; however, the field is increasingly oriented toward small, efficient models for edge deployment (SLMs), and it remains an open question whether smaller variants within the same family would exhibit comparable attribution behavior, especially given the negative correlation between model parameters and ESS.

True (VIS) Lies: Analyzing How Generative AI Recognizes Intentionality, Rhetoric, and Misleadingness in Visualization Lies  (2604.01181 - Blasilli et al., 1 Apr 2026) in Section: Limitations and Conclusion