Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs (2507.16663v1)

Published 22 Jul 2025 in cs.CL and cs.AI

Abstract: Despite efforts to unify multimodal generation and understanding tasks in a single model, we show these MLLMs exhibit self-contradiction where generation produces images deemed misaligned with input prompts based on the model's own understanding. We define a Nonunified score that quantifies such self-contradiction. Our empirical results reveal that the self-contradiction mainly arises from weak generation that fails to align with prompts, rather than misunderstanding. This capability asymmetry indicates the potential of leveraging self-contradiction for self-improvement, where the stronger model understanding guides the weaker generation to mitigate the generation-understanding gap. Applying standard post-training methods (e.g., SFT, DPO) with such internal supervision successfully improves both generation and unification. We discover a co-improvement effect on both generation and understanding when only fine-tuning the generation branch, a phenomenon known in pre-training but underexplored in post-training. Our analysis shows improvements stem from better detection of false positives that are previously incorrectly identified as prompt-aligned. Theoretically, we show the aligned training dynamics between generation and understanding allow reduced prompt-misaligned generations to also improve mismatch detection in the understanding branch. Additionally, the framework reveals a potential risk of co-degradation under poor supervision, an overlooked phenomenon that is empirically validated in our experiments. Notably, we find intrinsic metrics like Nonunified score cannot distinguish co-degradation from co-improvement, which highlights the necessity of a data quality check. Finally, we propose a curriculum-based strategy based on our findings that gradually introduces harder samples as the model improves, leading to better unification and improved MLLM generation and understanding.

Summary

  • The paper introduces a novel framework leveraging self-contradiction in MLLMs as an internal signal to enhance generation quality.
  • It adapts post-training methods like SFT and DPO by using the model's stronger understanding branch as a reward model, leading to co-improvement of both branches.
  • A curriculum-based learning strategy is applied to progressively challenge the model, significantly boosting unification between generation and understanding.

Self-Contradiction as Self-Improvement: Analyzing the Generation-Understanding Gap in MLLMs

The paper "Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs" introduces a novel approach to improving multimodal LLMs (MLLMs) by leveraging a self-contradiction phenomenon. This method turns the generation-understanding disparity into a tool for self-improvement, allowing models to refine their performance without external supervision.

Defining and Quantifying Self-Contradiction

MLLMs, despite claims of unifying multimodal tasks, often exhibit self-contradiction, where the model's generated images do not align with the input prompts according to its own understanding branch. A Nonunified score quantifies this disparity by measuring the frequency at which the understanding branch identifies a generated image as misaligned with its respective prompt. Empirical evidence shows that self-contradiction mainly arises from weak generation rather than misunderstanding, highlighting an internal capability asymmetry.

Figure 1: Self-contradiction in MLLMs.
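To make the definition concrete, the following is a minimal sketch of how a Nonunified score could be computed over a prompt set. The interfaces `generate_image` and `judge_alignment` are hypothetical wrappers around the model's generation and understanding branches, not APIs defined in the paper.

```python
# Hedged sketch: Nonunified score as the fraction of prompts whose generated
# image the model's own understanding branch judges as misaligned.
# `generate_image` and `judge_alignment` are illustrative assumptions.

def nonunified_score(model, prompts):
    misaligned = 0
    for prompt in prompts:
        image = model.generate_image(prompt)            # generation branch
        aligned = model.judge_alignment(prompt, image)  # understanding branch -> bool
        if not aligned:
            misaligned += 1
    return misaligned / max(len(prompts), 1)
```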

Exploiting Self-Contradiction for Self-Improvement

The authors propose leveraging the stronger understanding branch as a guiding force to enhance the weaker generation branch. Standard post-training methods like supervised fine-tuning (SFT) and direct preference optimization (DPO) are adapted to treat the understanding branch as an internal reward model. This approach leads to significant improvements in both generation quality and MLLM unification, demonstrating that internal consistency can be achieved without external signals.
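As one illustration of how such internal supervision could be assembled for DPO, the sketch below samples several images per prompt, scores each with the understanding branch, and keeps the best and worst candidates as a preference pair. The scalar `alignment_score` and `sample_images` interfaces are assumptions for illustration, not the paper's exact implementation.

```python
# Hedged sketch: building DPO preference pairs with the understanding branch
# acting as an internal reward model. `sample_images` and `alignment_score`
# are illustrative assumptions.

def build_dpo_pairs(model, prompts, samples_per_prompt=4):
    pairs = []
    for prompt in prompts:
        images = model.sample_images(prompt, n=samples_per_prompt)
        # Rank candidates by the understanding branch's alignment score.
        ranked = sorted(images, key=lambda img: model.alignment_score(prompt, img))
        worst, best = ranked[0], ranked[-1]
        pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs
```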

Co-Improvement and Co-Degradation Dynamics

A novel phenomenon of co-improvement is observed: when only the generation branch is fine-tuned, understanding also benefits. This is traced back to shared training dynamics in which the understanding branch becomes better at detecting false positives, i.e., generations previously misjudged as prompt-aligned. The alignment between generation and understanding can, however, also result in co-degradation under poor supervision. Intrinsic metrics, such as the Nonunified score, cannot differentiate between these effects, necessitating a data quality check prior to post-training.

Figure 2: Co-degradation of generation and understanding.
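Since the Nonunified score alone cannot reveal whether internal supervision is pushing the model toward co-improvement or co-degradation, a practical safeguard is to spot-check a sample of the internally labeled data against an external reference before fine-tuning. The sketch below is one possible form of such a check; the `external_check` callback and the agreement threshold are illustrative assumptions, not a procedure prescribed by the paper.

```python
# Hedged sketch of a pre-training-run data quality check: compare a random
# subset of internally labeled preference pairs against an external reference
# (e.g., a held-out verifier or human labels) and gate fine-tuning on agreement.
import random

def passes_quality_check(pairs, external_check, sample_size=100, min_agreement=0.8):
    sample = random.sample(pairs, min(sample_size, len(pairs)))
    if not sample:
        return False
    agreed = sum(
        1 for p in sample
        if external_check(p["prompt"], p["chosen"], p["rejected"])
    )
    return agreed / len(sample) >= min_agreement
```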

Curriculum-based Strategy for Enhanced Self-Improvement

To further harness the synergy between generation and understanding, the authors introduce a curriculum-based learning strategy that gradually feeds harder samples to the model as its capabilities improve, using a dynamic dataset that evolves alongside the model. This curriculum significantly enhances the model's generation and understanding capabilities while promoting greater unification between the two.

Figure 3: Pipeline of the CLO-based fine-tuning method.
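One way such a curriculum round could be realized is to admit only prompts whose self-judged alignment exceeds a floor that is lowered over rounds, so harder prompts enter the training set as the model improves. The thresholds, schedule, and helper names below are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of a curriculum round: admit prompts above a self-judged
# alignment floor, and lower the floor each round so harder samples enter
# later. `generate_image` and `alignment_score` are illustrative assumptions.

def select_curriculum_batch(model, prompt_pool, round_idx, start=0.8, step=0.1):
    floor = max(0.0, start - step * round_idx)
    batch = []
    for prompt in prompt_pool:
        image = model.generate_image(prompt)
        score = model.alignment_score(prompt, image)  # understanding branch
        if score >= floor:
            batch.append(prompt)
    return batch
```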

Conclusion

The findings of this paper have significant implications for the development of more robust and unified MLLMs. By turning self-contradiction into a self-supervision mechanism, MLLMs can achieve better alignment and improved performance across multimodal tasks. This self-improvement framework opens new pathways for future research, particularly in using internal signals to mitigate gaps in model capabilities and in combining self-improvement with curriculum learning principles.
