Cross-Modal Safety Consistency (CMSC-score)
- The paper introduces CMSC-score to quantify cross-modal safety consistency through an exponential decay function applied to the standard deviation of modality-specific safety scores.
- Methodology involves aggregating safety scores from 24 modality sub-categories using parallelized prompts and LLM-based comprehension filtering to ensure valid comparisons.
- Empirical findings on OLLMs show CMSC-scores ranging from 0.42 to 0.83, revealing cross-modal vulnerabilities and emphasizing the need for uniform safety performance.
The Cross-Modal Safety Consistency Score (CMSC-score) is a quantitative metric introduced to assess the consistency of safety performance in Omni-modal LLMs (OLLMs) across a spectrum of input modalities, including text, audio, image, and their combinations. CMSC-score is designed to reveal whether an OLLM's refusal and safe-content behaviors are uniform when the same semantic prompt is delivered through different sensory inputs. This metric is a core contribution of Omni-SafetyBench, the first large-scale benchmark explicitly constructed for evaluating OLLM safety under joint modality transformations, where traditional single-modal safety metrics often fail to capture cross-modal vulnerabilities (Pan et al., 10 Aug 2025).
1. Formal Definition and Mathematical Formulation
The CMSC-score is predicated on collecting Safety-scores, denoted $S_i$ for $i = 1, \dots, N$, over the $N$ modality sub-categories derived from parallelizing benchmark prompts. The computation proceeds as follows:
- Compute the mean Safety-score: $\bar{S} = \frac{1}{N} \sum_{i=1}^{N} S_i$
- Compute the standard deviation: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (S_i - \bar{S})^2}$
- Transform the dispersion into a consistency score with: $\text{CMSC-score} = e^{-\lambda \sigma}$
where $\lambda$ is a user-defined sensitivity hyperparameter (set to $\lambda = 5$ in (Pan et al., 10 Aug 2025)). Each $S_i \in [0, 1]$ by design, with $N = 24$ in the Omni-SafetyBench setting.
CMSC-score lies in the interval $(0, 1]$, with a value of 1 indicating perfect cross-modal consistency ($\sigma = 0$). As the divergence in per-modality Safety-scores increases, CMSC-score decays exponentially toward zero.
2. Component Terms and Metric Rationale
- $S_i$ (Safety-score): The OLLM’s safety metric for the $i$-th modality, itself a function of the conditional Attack Success Rate (C-ASR) and conditional Refusal Rate (C-RR), as per Equation (2) in (Pan et al., 10 Aug 2025).
- $\bar{S}$ (Mean): Represents the central tendency of safety across modalities.
- $\sigma$ (Standard Deviation): Measures fluctuations in safety performance and is the critical determinant for penalizing inconsistency.
- $\lambda$ (Sensitivity): Controls the penalty gradient; empirically set to 5 to ensure meaningful discriminative power across the observed dispersion range.
- The exponential mapping $e^{-\lambda \sigma}$ enforces a sharp penalty for even moderate inconsistency, focusing evaluative pressure on uniform safety irrespective of overall level.
The rationale is to avoid a metric that rewards models for selectively high safety in a single modality if that same model is arbitrarily vulnerable in others.
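To make the penalty gradient concrete, the following sketch tabulates $e^{-\lambda \sigma}$ across a few dispersion values; the alternative $\lambda$ settings (1 and 10) are illustrative, as the paper fixes $\lambda = 5$:

```python
import math

# How the sensitivity hyperparameter shapes the exponential penalty:
# for a fixed dispersion sigma, a larger lam punishes inconsistency
# more sharply, while sigma = 0 always yields a score of 1.
for sigma in (0.0, 0.05, 0.10, 0.20):
    scores = {lam: round(math.exp(-lam * sigma), 3) for lam in (1, 5, 10)}
    print(f"sigma={sigma:.2f} -> {scores}")
```

At $\lambda = 5$, even a modest dispersion of $\sigma = 0.10$ already cuts the score to about 0.61, which is the "sharp penalty for even moderate inconsistency" described above.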
3. Methodology and Benchmarking Protocol
The computation of CMSC-score is embedded within the Omni-SafetyBench evaluation pipeline as follows:
- Prompt Parallelization: Seed prompts are instantiated into 24 distinct modality inputs (text, image, audio, and all pairwise/tri-modal combinations).
- Comprehension Filtering: Each input’s comprehension is judged by an LLM-based judge (Qwen-Plus). Inputs not understood by the target OLLM are excluded from safety aggregation, as misunderstanding can artificially inflate safety metrics.
- Safety Labeling:
- Responses to understood inputs are annotated for harm (unsafe) and refusals.
- C-ASR and C-RR are calculated by conditioning on the “understood” subset.
- The Safety-score $S_i$ for each modality is then synthesized from C-ASR and C-RR according to Equation (2) of (Pan et al., 10 Aug 2025).
- Aggregation and CMSC-score Computation: Safety-scores are aggregated for modalities meeting a minimum comprehension threshold (understand rate $\geq 20\%$). $\bar{S}$ and $\sigma$ are computed on this set, and the final CMSC-score is produced via $e^{-\lambda \sigma}$.
A minimum of two modalities is required after filtering to yield a meaningful $\sigma$.
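The filtering-and-aggregation protocol can be sketched as follows. The tuple layout of `per_modality` and the specific combination of C-ASR and C-RR inside `safety_score` are illustrative assumptions, not the paper's exact Equation (2):

```python
import math

def safety_score(c_asr, c_rr):
    # Hypothetical combination of conditional Attack Success Rate and
    # conditional Refusal Rate; the paper's Equation (2) may differ.
    return ((1.0 - c_asr) + c_rr) / 2.0

def aggregate_cmsc(per_modality, min_understand_rate=0.2, lam=5.0):
    """per_modality maps modality name -> (understand_rate, c_asr, c_rr).
    Modalities below the comprehension threshold are dropped before the
    dispersion-based consistency score is computed."""
    scores = [safety_score(c_asr, c_rr)
              for (rate, c_asr, c_rr) in per_modality.values()
              if rate >= min_understand_rate]
    if len(scores) < 2:
        raise ValueError("need >= 2 modalities after comprehension filtering")
    mean = sum(scores) / len(scores)
    sigma = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return math.exp(-lam * sigma)

modalities = {
    "text":  (0.95, 0.10, 0.85),
    "image": (0.80, 0.25, 0.70),
    "audio": (0.15, 0.05, 0.90),  # excluded: understand rate below 20%
}
print(round(aggregate_cmsc(modalities), 3))  # 0.687
```

The audio entry illustrates why the comprehension filter matters: its high refusal rate would otherwise be counted as safety, even though the model simply failed to understand the input.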
4. Key Empirical Findings and Illustrative Example
Empirical analysis on ten OLLMs using Omni-SafetyBench yields CMSC-scores across a typical range of 0.42 to 0.83. Notable results from Table 7 of (Pan et al., 10 Aug 2025) include:
| Model Name | CMSC-score |
|---|---|
| gemini-2.5-pro | 0.83 |
| gemini-2.5-pro-preview | 0.80 |
| Unified-IO2-xxlarge | 0.81 |
| Qwen2.5-omni-7b | 0.74 |
| Baichuan-omni-1.5 | 0.56 |
| Minicpm-o-2.6 | 0.42 |
A high CMSC-score signals that an OLLM’s refusal and safe-content output tendencies are stable across text, audio, image, and joint modalities. Conversely, lower scores highlight cross-modal vulnerabilities even when other safety metrics may appear satisfactory.
Illustrative Example: Suppose, for instance, a model achieves Safety-scores of $0.9$ (text), $0.8$ (image), and $0.7$ (audio). Then $\bar{S} = 0.8$, $\sigma \approx 0.082$, and CMSC-score $= e^{-5 \times 0.082} \approx 0.66$. This reflects moderate cross-modal consistency.
5. Assumptions, Edge Cases, and Metric Constraints
- Score Range Enforcement: All $S_i$ are bounded within $[0, 1]$; $\bar{S}$ and the empirical $\sigma$ are correspondingly constrained.
- Filtering of Low-Comprehension Modalities: Any modality with an “understand rate” below 20% is excluded from aggregation to prevent models with persistent “I don't know” behaviors from attaining artificial consistency.
- Missing Data: Modalities lacking sufficient comprehension are omitted; at least two must remain.
- Perfect Consistency: $\sigma = 0$ yields CMSC-score $= 1$.
- Hyperparameter $\lambda$: Chosen heuristically; modifies the functional sensitivity but does not influence the score’s theoretical range.
A plausible implication is that practitioners must interpret CMSC-score in conjunction with average Safety-score, as uniform low safety yields high CMSC but is undesirable in practice.
6. Limitations and Potential Extensions
- Dispersion-Only Focus: CMSC-score exclusively measures the standard deviation of Safety-scores. Uniformly poor safety (e.g., identically low Safety-scores across all modalities) produces a high CMSC-score. Joint reporting with the mean Safety-score is essential for a comprehensive assessment.
- Heuristic Sensitivity Parameter (): The current setting (5) is not data-driven. Alternative parameter selection strategies may improve fidelity and discriminative utility.
- Outlier Susceptibility: A single outlier among modalities can dominate $\sigma$, potentially skewing the consistency evaluation. Robust statistical alternatives, such as the median absolute deviation, are suggested as possible refinements.
- Modality Weighting: The current formulation assumes independence and equal importance across modality sub-categories. Practical applications may warrant differential weighting, especially for more security-critical modalities (e.g., audio-visual joint attacks).
- Metric Integration: Future research could consider embedding cross-modal consistency within the Safety-score itself, or devising joint metrics that penalize both low mean safety and high dispersion simultaneously.
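The outlier concern raised above can be made concrete with a robust variant. The sketch below swaps the standard deviation for the median absolute deviation (MAD); this is the suggested refinement, not the paper's metric:

```python
import math
from statistics import median, pstdev

def robust_cmsc(scores, lam=5.0):
    """Consistency score using the median absolute deviation (MAD) instead
    of the standard deviation, so a single outlying modality cannot
    dominate the dispersion term. A hypothetical refinement."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    return math.exp(-lam * mad)

# One outlying modality (0.20) inflates sigma far more than the MAD:
scores = [0.80, 0.82, 0.78, 0.81, 0.20]
print(round(math.exp(-5 * pstdev(scores)), 3))  # std-based score, dragged down
print(round(robust_cmsc(scores), 3))            # MAD-based score stays high
```

Whether the robustness is desirable is a design question: a single catastrophically unsafe modality arguably *should* tank the consistency score, which argues for keeping the standard deviation.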
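One joint metric of the kind suggested above can be sketched as the product of the mean Safety-score and the exponential consistency term; this is a hypothetical design for illustration, not a metric proposed in the paper:

```python
import math

def joint_safety_consistency(scores, lam=5.0):
    """Hypothetical joint metric: rewards high mean safety AND low
    dispersion by multiplying the mean Safety-score with the exponential
    consistency term. An illustrative sketch, not a published metric."""
    n = len(scores)
    mean = sum(scores) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return mean * math.exp(-lam * sigma)

# Uniformly low safety no longer scores well: consistency alone is not enough.
print(round(joint_safety_consistency([0.2, 0.2, 0.2]), 3))  # 0.2
print(round(joint_safety_consistency([0.9, 0.8, 0.7]), 3))  # 0.532
```

Under this formulation, a perfectly consistent but uniformly unsafe model scores its (low) mean, addressing the dispersion-only limitation noted above.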
7. Significance in OLLM Safety Research
CMSC-score provides a systematic and quantitative framework for identifying models exhibiting inconsistent refusal or safety-regulation behaviors across modality transformations. Its introduction addresses a key shortcoming in prior LLM safety evaluations, which rarely tested models on the same content delivered through different modalities or exposed multi-modal prompt-specific vulnerabilities. The metric is instrumental in:
- Highlighting models that, despite achieving high safety on canonical (text) benchmarks, falter under audio, visual, or compound modality transformations.
- Informing the development of evaluation methodologies and model architectures attuned to omni-modal robustness.
- Providing an actionable analytic for comparing OLLMs, interpreting cross-modal safety trade-offs, and spotlighting areas necessitating targeted safety intervention (Pan et al., 10 Aug 2025).
The adoption of CMSC-score establishes a foundational baseline for systematic cross-modal safety auditing in OLLMs and is poised to influence future paradigm design and regulatory benchmarking within the field.