
Omni-Preference: A Multimodal Evaluation Paradigm

Updated 7 February 2026
  • Omni-Preference is a framework that generalizes preference evaluations with flexible, multimodal criteria across text, images, audio, and more.
  • It utilizes automated data generation pipelines and teacher-model annotations to achieve scalable and high-confidence pairwise comparisons across diverse domains.
  • Mathematical models, including discriminative and generative objectives, underpin its robust alignment of system outputs with structured, rubric-based evaluations.

Omni-Preference encompasses a class of frameworks, datasets, mathematical models, and training objectives for representing and eliciting preferences across heterogeneous, often high-dimensional, input domains—spanning text, images, video, audio, and even physiological stimuli—while supporting structurally flexible, multi-criterion, or entirely free-form evaluation. Across recent research in LLMs, reward modeling, generative diffusion, and neuroeconomics, Omni-Preference solutions provide mechanisms to synthesize, annotate, and learn from preference data that capture the richness of user intent (e.g., grounded in rubrics, support for arbitrary criteria, or physiological observables) and the complexity of multimodal outputs. The aim is to enable automated, scalable, and interpretable alignment of generative and reasoning systems with nuanced and context-dependent notions of quality, faithfulness, safety, or utility.

1. Core Concepts and Definitions

At its foundation, the Omni-Preference paradigm seeks to generalize the notion of a "preference" beyond statically defined, unidimensional, or single-modality settings.

  • General form: A preference is an ordering or assignment of value to outputs or choices, possibly conditioned on a context and an axis of evaluation.
  • Multimodality: Preferences can be elicited or deployed on outputs in text, images, video, audio, 3D, or even arbitrary sensor data. Recent frameworks require support for pairwise or scalar scoring across any of these representation spaces (Kong et al., 31 Jan 2026, Jin et al., 27 Oct 2025, Chen et al., 31 Aug 2025, Liu et al., 2024).
  • Flexible criteria: Instead of optimizing for a hard-wired notion of helpfulness, harmlessness, or image realism, Omni-Preference systems provide for arbitrary, even free-form, human- or machine-specified criteria at inference or evaluation time (Jin et al., 27 Oct 2025).
  • Rubric- and facet-grounding: Preference judgments may be decomposed by rubrics (e.g., fluency, relevance, accuracy, reasoning, safety, visual grounding, acoustic fidelity), with structured rationales justifying each comparative score (Kong et al., 31 Jan 2026).

An archetypal Omni-Preference dataset consists of tuples:

(c, x, y_A, y_B, p, S_A, S_B, J)

where c is the criterion (possibly free-form), x is the input/query, y_A and y_B are candidate outputs, p is the preferred choice, S_A, S_B are scalar or multidimensional scores, and J is a structured, possibly rubric-decomposed, justification.
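The tuple above maps naturally onto a simple record type. The sketch below is illustrative only; the field names and the example values are assumptions, not part of any released dataset schema:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PreferencePair:
    """One Omni-Preference example: (c, x, y_A, y_B, p, S_A, S_B, J)."""
    criterion: str                 # c: evaluation axis, possibly free-form
    query: Any                     # x: input/prompt (text, image, audio, ...)
    candidate_a: Any               # y_A
    candidate_b: Any               # y_B
    preferred: str                 # p: "A" or "B"
    score_a: float                 # S_A: scalar (or aggregated rubric) score
    score_b: float                 # S_B
    justification: Dict[str, str]  # J: rubric-decomposed rationale

# Hypothetical example record
pair = PreferencePair(
    criterion="factual accuracy",
    query="Summarize the attached report.",
    candidate_a="Summary A ...",
    candidate_b="Summary B ...",
    preferred="A",
    score_a=8.0,
    score_b=5.0,
    justification={"accuracy": "A cites correct figures; B omits two."},
)
```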

2. Automated Omni-Preference Data Generation and Pipelines

Modern frameworks operationalize Omni-Preference via large-scale, automated pipelines:

  • Pair synthesis via capability gap: Generate responses y_A, y_B to an input x by systematically contrasting a "strong" generator model M_s with a "weak" M_w, where y_A = M_s(x) and y_B = M_w(x) (Kong et al., 31 Jan 2026).
  • Teacher model annotation: Pairs are scored by high-capacity LLMs independently, producing overall verdicts, numeric scores, and rubric-grounded rationales; reconciliation, filtering, and merging protocols ensure high-confidence supervision without human labeling (Kong et al., 31 Jan 2026).
  • Multimodal benchmark coverage: Datasets span domains, e.g., HH-RLHF prompts for text, RLAIF-V and VL-RewardBench for images, ActivityNet/ShareGPT-Video for video, Clotho-AQA and Audio-HH-RLHF for audio (Kong et al., 31 Jan 2026, Jin et al., 27 Oct 2025, Chen et al., 31 Aug 2025, Liu et al., 2024).
  • Free-form criterion augmentation: Automatic synthesis of arbitrary evaluation prompts or axes, e.g., by prompting GPT-4o to explain criteria and verifying using secondary models, facilitating instruction-tuning with wide epistemic coverage (Jin et al., 27 Oct 2025).
  • Quality filtering and reconciliation: Use of rule-based filters for duplication, JSON validity, verdict-score consistency, and low-information examples ensures strong integrity in pairwise and scalar preference data (Kong et al., 31 Jan 2026).
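The pipeline stages above (pair synthesis, independent teacher annotation, reconciliation and filtering) can be sketched as plain functions. All callables and the agreement threshold are hypothetical stand-ins, not the papers' actual implementation:

```python
def synthesize_pair(x, strong_model, weak_model):
    """Capability-gap pair: contrast a strong and a weak generator on x."""
    return strong_model(x), weak_model(x)

def annotate(x, y_a, y_b, teachers):
    """Each teacher independently returns (verdict, score_a, score_b)."""
    return [teacher(x, y_a, y_b) for teacher in teachers]

def reconcile(annotations, min_agreement=1.0):
    """Keep a pair only if teacher verdicts agree and match their scores.

    Verdict-score consistency: a teacher preferring "A" must also have
    scored A higher than B (and symmetrically for "B").
    """
    verdicts = [v for v, _, _ in annotations]
    agreement = verdicts.count(verdicts[0]) / len(verdicts)
    consistent = all((v == "A") == (sa > sb) for v, sa, sb in annotations)
    if agreement >= min_agreement and consistent:
        return verdicts[0]
    return None  # filtered out: low-confidence supervision

# Toy usage with stand-in models
y_a, y_b = synthesize_pair("q", lambda x: "strong:" + x, lambda x: "weak:" + x)
```

Pairs surviving `reconcile` become high-confidence supervision without any human labeling, mirroring the reconciliation-and-filtering step described above.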

3. Mathematical Formulations and Training Objectives

The mathematical underpinnings of Omni-Preference frameworks span both classical and novel preference modeling approaches.

  • Discriminative objectives: A criterion-conditioned Bradley–Terry loss trains a scalar reward model on pairwise comparisons:

\mathcal{L}_\mathrm{BT} = -\log \frac{\exp(r(c, x, y_p))}{\exp(r(c, x, y_1)) + \exp(r(c, x, y_2))}

where r scores the candidate conditioned on all inputs and the criterion (Jin et al., 27 Oct 2025).

  • Generative objectives: Models may generate chain-of-thought explanations ee and a preference decision pp', using RL (e.g., Group Relative Policy Optimization/GRPO) to maximize expected agreement with reference preferences, with explicit KL regularization (Jin et al., 27 Oct 2025, Kong et al., 31 Jan 2026).
  • Rubric-decomposed reward: Composite reward signals incentivize correct verdicts, score consistency, and rubric justification coverage, as in:

r_i = w_\mathrm{fmt} R_\mathrm{fmt}(y_i) + w_\mathrm{pref} R_\mathrm{pref}(y_i) + w_\mathrm{rub} R_\mathrm{rub}(y_i)

for each output y_i, with advantages normalized within sampled groups (Kong et al., 31 Jan 2026).

  • DPO for multimodal diffusion: In text-to-video settings, the DPO contrastive loss is extended to pairs generated by maximizing a composite "OmniScore"—a weighted aggregate of visual (intra- and inter-frame) and semantic alignment sub-scores (Liu et al., 2024).
  • Neuroeconomic hyperfunction: Individual omni-preference can be represented as a hyperfunction F(ξ), defined by boundary values of complex-analytic functions, mapping all possible stimuli ξ to real values V = F(ξ), fully reconstructible from physiological data such as neuron interspike intervals (Shapiro, 2011).
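The rubric-decomposed reward and group-wise advantage normalization described above can be sketched as follows; the specific weight values are assumptions for illustration, not the paper's configuration:

```python
import statistics

def composite_reward(r_fmt, r_pref, r_rub,
                     w_fmt=0.1, w_pref=0.6, w_rub=0.3):
    """r_i = w_fmt*R_fmt(y_i) + w_pref*R_pref(y_i) + w_rub*R_rub(y_i).

    Weights here are illustrative; they trade off format validity,
    verdict correctness, and rubric justification coverage.
    """
    return w_fmt * r_fmt + w_pref * r_pref + w_rub * r_rub

def group_advantages(rewards):
    """GRPO-style normalization: advantages relative to the sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard degenerate groups
    return [(r - mu) / sigma for r in rewards]
```

Normalizing within each sampled group means an output is rewarded for being better than its siblings, not for the absolute scale of its reward.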

4. Rubrics, Criteria, and Multi-Dimensional Supervision

In contrast to scalar reward or unidimensional preference, Omni-Preference approaches impose or elicit structured, rubric-driven supervision:

| Criterion | Definition | Example Modality-Specific Facet |
|-----------|------------|---------------------------------|
| Fluency | Clarity, grammar, conciseness | — |
| Relevance | Prompt/scene fidelity | Visual/temporal grounding (vision/video) |
| Accuracy | Factual correctness, completeness | Acoustic fidelity (audio) |
| Reasoning | Logical structure, inference steps | — |
| Safety | No harmful or disallowed content | — |

Rubrics may be modality-agnostic or include facets specific to, e.g., frame consistency (video), object grounding (vision), or speech-text mapping (audio) (Kong et al., 31 Jan 2026). Free-form criteria allow for user- or model-specified axes at inference time, supporting fine-tuned evaluation and downstream task adaptation (Jin et al., 27 Oct 2025).
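One simple way to organize such supervision is a base rubric extended with modality-specific facets; the structure and names below are illustrative, not a published schema:

```python
# Modality-agnostic criteria, extended per modality with the facets
# mentioned above. Names are hypothetical identifiers for illustration.
BASE_RUBRIC = ["fluency", "relevance", "accuracy", "reasoning", "safety"]

MODALITY_FACETS = {
    "vision": ["object grounding"],
    "video": ["frame consistency", "temporal grounding"],
    "audio": ["acoustic fidelity", "speech-text mapping"],
}

def rubric_for(modality: str) -> list:
    """Base criteria plus any facets specific to the given modality."""
    return BASE_RUBRIC + MODALITY_FACETS.get(modality, [])
```

Free-form criteria would then enter as additional user- or model-specified axes appended to this list at inference time.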

5. Representative Datasets and Benchmarks

  • Omni-Preference: 41,000 high-confidence pairwise comparisons across image, video, audio, and text. Preference pairs annotated with rubric-based criteria by multiple LLM teachers and reconciled (Kong et al., 31 Jan 2026).
  • Omni-RewardData: 317,000 pairs (248k general, 69k instruction-tuning with synthesized criteria), supporting discriminative and generative preference learning (Jin et al., 27 Oct 2025).
  • Omni-RewardBench: Evaluation covering nine tasks, five modalities (T2T, TI2T, TV2T, TA2T, T2I, T2V, T2A, T23D, TI2I), annotated for 1–10 free-form criteria per pair (Jin et al., 27 Oct 2025).
  • VideoDPO: Automatically constructed video preference pairs using the "OmniScore," quantifying both visual and semantic quality in text-to-video generation (Liu et al., 2024).
  • Neuroeconomic datasets: Physiological measurements (e.g., spike train histograms) linked directly to value assignments under the hyperfunction framework (Shapiro, 2011).

6. Empirical Results and Analysis

Empirical evaluations demonstrate that Omni-Preference-based reward models substantially improve both accuracy and interpretability over unimodal or fixed-criterion baselines.

  • Multimodal reward modeling: Omni-RRM achieves 71.8% mean preference accuracy over image (e.g., 67.1% on VL-RewardBench), video (80.2% on ShareGPT-V), and audio (66.8% on Audio-HH-RLHF), with state-of-the-art results for open-source models and significant ablation improvements due to rubric grounding and reinforcement learning on hard pairs (Kong et al., 31 Jan 2026).
  • Generalist RM performance: Omni-RewardModel-BT achieves 73.68% overall accuracy on Omni-RewardBench, compared to 62.18% for base models, with superior robustness to mixed-modality training and free-form evaluation (Jin et al., 27 Oct 2025).
  • Preference alignment in diffusion: VideoDPO delivers marked improvements on both quality and semantic measures (e.g., 80.44% VBench-Total for the VC2 baseline vs. 81.93% for VideoDPO), validated by both synthetic and qualitative benchmarks (Liu et al., 2024).
  • Neurophysiological comprehensiveness: Hyperfunction-based omni-preference models provably admit all possible preference geometries and are experimentally reconstructible from neural statistics (Shapiro, 2011).

7. Theoretical and Foundational Interpretations

Omni-Preference carries both operational and theoretical significance:

  • No reliance on hardwired metrics: Allows arbitrary, context-aware extension to new domains, modalities, and axes of judgment (e.g., safety, trustworthiness, artistic merit).
  • Automated, scalable supervision: Enables preference elicitation at web-scale without the need for bespoke human annotation, leveraging bootstrapping from multiple model capabilities (Kong et al., 31 Jan 2026, Jin et al., 27 Oct 2025).
  • Axiomatic completeness: Hyperfunctional modeling ensures no loss of expressive power in representing infinite-dimensional or discontinuous preference sets (Shapiro, 2011).
  • Interpretability: Rubric- or chain-of-thought-based explanations provide criterion- and dimension-level insight into model decisions (Kong et al., 31 Jan 2026, Jin et al., 27 Oct 2025).
  • Future directions: Extending frameworks to additional modalities (e.g., tactile, multi-agent settings), more sophisticated end-to-end joint annotation, and parameter-efficient adaptation remain open research areas (Chen et al., 31 Aug 2025).

Omni-Preference thus forms the basis for state-of-the-art multimodal reward modeling, scalable system alignment, and mathematically rigorous representations of both machine and human valuation across arbitrary input domains.
