
MLLM-Assisted Conformity Enhancement (MACE)

Updated 13 January 2026
  • MACE is a prompt-based framework that uses large multimodal language models to enforce strict alignment with visual evidence or group consensus.
  • It filters out non-grounded or hallucinated elements by retaining only tokens and aspects visually supported or inferable from input data.
  • In multi-agent settings, MACE balances self-confidence and peer influence via confidence-normalized pooling to achieve robust consensus accuracy.

MLLM-Assisted Conformity Enhancement (MACE) is a prompt-based method that leverages multimodal LLMs (MLLMs) or ensembles of LLM-based agents to enforce strict alignment of multimodal outputs with an external schema, visual evidence, or prevailing group judgments. Originally applied to the curation and grounding of noisy item listings in e-commerce, and later formalized in the context of collective LLM agent systems, MACE eliminates non-grounded or hallucinated elements, ensuring robust, visually justified, and schema-compliant results (Zhang et al., 13 Aug 2025, Han et al., 9 Jan 2026).

1. Conceptual Foundations

MACE operationalizes “conformity” as the systematic enforcement of agreement between multimodal model outputs and either explicit visual evidence (in e-commerce applications) or group-level consensus (in multi-agent LLM systems). In the e-commerce context, it constitutes a filter removing non-visual or spurious information from raw image–text–aspect triplets—ensuring all retained tokens and key–value pairs are visually supported or inferable. In LLM multi-agent systems, MACE formalizes how agents shift their predictions towards the prevailing judgments in a networked population, balancing self-confidence and peer influence to maximize consensus accuracy while mitigating the risk of collective error cascades (Zhang et al., 13 Aug 2025, Han et al., 9 Jan 2026).

2. Task Formulation and Mathematical Structure

In e-commerce vision–language applications (Zhang et al., 13 Aug 2025), the formal MACE task is specified as follows:

  • Inputs:
    • $x$: product image (or frozen visual embedding)
    • $y^0$: noisy/original title string
    • $A^0 = \{(k_i, v_i)\}_i$: set of raw aspect key–value pairs
    • $S$: platform schema (allowed aspect keys)
  • Outputs:
    • $y^r$: title rewritten to include only visually inferable tokens
    • $A^r \subseteq A^0$: subset of aspects confirmed from $x$
  • Visual Grounding Predicate:
    • $I(x, t) \in \{0,1\}$: whether token $t$ is visually supported by $x$
    • $I(x, (k,v)) \in \{0,1\}$: whether value $v$ is visually confirmable for key $k$
  • Mathematical Formulation:
    • $y^r = \text{concat}[\, t \in \text{tokenize}(y^0) : I(x, t) = 1 \,]$
    • $A^r = \{ (k,v) \in A^0 : I(x, (k,v)) = 1 \land k \in S \}$
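The filtering formulation above can be sketched directly in code; the grounding predicate `is_grounded` below is a hypothetical stand-in for the VLM judgment $I(x, \cdot)$, not an API from either paper:

```python
def refine(image, title, aspects, schema, is_grounded):
    """Keep only title tokens and aspect pairs visually supported by `image`.

    `is_grounded(image, item)` plays the role of the predicate I(x, .).
    """
    # y^r: concatenate the tokens t of y^0 with I(x, t) = 1
    kept_tokens = [t for t in title.split() if is_grounded(image, t)]
    title_r = " ".join(kept_tokens)
    # A^r: aspects that are both visually confirmed and schema-compliant
    aspects_r = {k: v for k, v in aspects.items()
                 if k in schema and is_grounded(image, (k, v))}
    return title_r, aspects_r
```

With a toy predicate that rejects shipping boilerplate and an unconfirmable size aspect, `refine(img, "red mug SHIP FAST", {"color": "red", "size": "12"}, {"color", "material"}, pred)` returns `("red mug", {"color": "red"})`.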

In multi-agent systems (Han et al., 9 Jan 2026), MACE's conformity mechanism is instantiated by confidence-normalized pooling, recursively updating each agent’s support score:

$$s_i^{(t+1)} = \frac{\alpha\, p_i^{(t)}\, y_i^{(t)} + (1-\alpha) \sum_{j \in N(i)} p_j^{(t)}\, y_j^{(t)}}{\alpha\, p_i^{(t)} + (1-\alpha) \sum_{j \in N(i)} p_j^{(t)} + \varepsilon}$$

$$y_i^{(t+1)} = \mathbf{1}\big\{ s_i^{(t+1)} \ge \tau \big\}$$

where the self-weight $\alpha$ and the social weight $\beta = 1 - \alpha$ modulate the trade-off between self-reliance and conformity, $\tau$ is the decision threshold, and $\varepsilon$ is a small constant preventing division by zero.
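A worked single-agent update under this rule, with toy confidences and votes (the numbers are illustrative, not from the paper):

```python
# One confidence-normalized pooling step for a single agent.
alpha, tau, eps = 0.75, 0.5, 1e-6
p_self, y_self = 0.9, 1                      # agent's own confidence and vote
neighbors = [(0.6, 0), (0.8, 1), (0.5, 0)]   # (p_j, y_j) for j in N(i)

beta = 1.0 - alpha                            # social weight
num = alpha * p_self * y_self + beta * sum(p * y for p, y in neighbors)
den = alpha * p_self + beta * sum(p for p, _ in neighbors) + eps
s_new = num / den                             # pooled support score, ~0.761 here
y_new = int(s_new >= tau)                     # agent keeps its vote of 1
```

The single confident dissenting neighbor pulls the score down from 0.9 but not below the threshold, so the agent's decision is unchanged.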

3. MACE Algorithmic Workflows

In the e-commerce setting (Zhang et al., 13 Aug 2025), MACE is implemented as a deterministic, zero-shot prompt applied to each raw image–title–aspect example:

  • For each $(x, y^0, A^0)$:
    • Prompt a large VLM (e.g., InternVL2.5-78B): “Given image $x$ with title $y^0$ and aspects $A^0$, 1) rewrite the title to remove tokens not grounded in the image, and 2) remove aspects not visually confirmed.”
    • Parse the response $(y^r, A^r)$ from JSON.
    • Validate $A^r$ against schema $S$.
    • Add $(x, y^r, A^r)$ to the refined dataset.
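The loop above might look like the following sketch; `query_vlm` is a hypothetical wrapper around the VLM call, and the prompt wording is illustrative rather than the paper's exact prompt:

```python
import json

PROMPT = ("Given image {img} with title {title!r} and aspects {aspects}, "
          "1) rewrite the title to remove tokens not grounded in the image, "
          "2) remove aspects not visually confirmed. Answer as JSON "
          '{{"title": ..., "aspects": {{...}}}}.')

def refine_dataset(samples, schema, query_vlm):
    """Apply the zero-shot MACE prompt to each (image, title, aspects) triple."""
    refined = []
    for image, title, aspects in samples:
        raw = query_vlm(PROMPT.format(img=image, title=title, aspects=aspects))
        try:
            out = json.loads(raw)              # parse (y^r, A^r)
        except json.JSONDecodeError:
            continue                           # drop unparseable responses
        # validate A^r against the platform schema S
        aspects_r = {k: v for k, v in out.get("aspects", {}).items()
                     if k in schema}
        refined.append((image, out.get("title", ""), aspects_r))
    return refined
```

Responses that fail JSON parsing are simply dropped here; a production pipeline might instead retry with a repair prompt.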

In the multi-agent setting (Han et al., 9 Jan 2026), agents are nodes in a network with a specific topology (e.g., star, ring, complete graph):

  • At each timestep:
    • Each agent updates its internal state by integrating self-score and neighbor inputs using the confidence-normalized pooling rule.
    • Update iterations continue until unanimity or a preset maximum number of rounds.
  • Two canonical protocols:
    • Centralized Aggregation: one-shot; a hub agent computes the final decision from leaf submissions. Used where a trusted “expert” agent is available.
    • Distributed Consensus: iterative peer-to-peer pooling that lets the group converge, enhancing robustness but potentially introducing latency and synchronization costs.
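The topologies named above can be represented as plain adjacency lists; this is an illustrative sketch, not code from either paper:

```python
def star(n):
    """Star topology: node 0 is the hub; leaves connect only to the hub."""
    adj = {i: [0] for i in range(1, n)}
    adj[0] = list(range(1, n))
    return adj

def ring(n, m=4):
    """Ring topology: each node links to m/2 neighbors on each side."""
    half = m // 2
    return {i: sorted((i + d) % n for d in range(-half, half + 1) if d != 0)
            for i in range(n)}

def complete(n):
    """Complete graph: every node is adjacent to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}
```

These adjacency dictionaries plug directly into an update loop that, for each agent $i$, pools over `adj[i]`.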

4. Empirical Performance and Metrics

Empirical evaluation demonstrates that MACE significantly enhances groundedness and compliance:

| Model | Rouge-L | Aspect F1 | Schema Recall |
|---|---|---|---|
| LLaVA-NeXT-7B (baseline) | 0.36 | 0.33 | 0.49 |
| LLaVA-NeXT-7B + MACE | 0.43 | 0.35 | 0.53 |
  • MACE yields relative improvements of +19% in Rouge-L, +6% in aspect F1, and +8% in schema recall over the baseline.
  • Qualitative analysis shows removal of non-visual tokens (e.g., “Size 12”, “SHIP FAST”) and non-confirmable aspects.

In the multi-agent setting, key performance measures include:

  • Final Accuracy (FA)
  • Time-to-Consensus (TTC)
  • Conformity Index (CI), Average CI (ACI)
  • Center–Periphery Consistency (CPC)
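The first two metrics admit straightforward definitions; the sketch below is an assumption about their form, and the papers' exact definitions may differ:

```python
def final_accuracy(decisions, truth):
    """FA: fraction of agents whose final decision matches the ground truth."""
    return sum(d == truth for d in decisions) / len(decisions)

def time_to_consensus(history, max_rounds):
    """TTC: first round at which all agents agree, or max_rounds if never.

    `history` is a list of per-round decision vectors.
    """
    for t, decisions in enumerate(history):
        if len(set(decisions)) == 1:
            return t
    return max_rounds
```

The conformity and center–periphery measures (CI, ACI, CPC) require per-round attribution of opinion shifts and are omitted here.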

Topological and parameter dependencies:

  • Distributed settings with $m = 4$ neighbors and $\alpha = 0.75$ yield a robust balance: $\text{TTC} \approx 4$, $\text{FA} \approx 0.78$–$0.83$.
  • Centralized aggregation achieves its highest consensus accuracy at $\alpha \approx 0.75$; over-conformity (low $\alpha$) increases cascade risk.

5. Design Principles and Topological Trade-offs

Self–Social Weighting

  • Moderate $\alpha$ ($\approx 0.5$): balanced speed and robustness.
  • High $\alpha$ ($0.75$–$1.0$): maximizes reliability but slows convergence.
  • Low $\alpha$ ($\approx 0.25$): prioritizes speed but increases susceptibility to information cascades.

Topology Choices

| Topology | Speed | Robustness | Failure Mode |
|---|---|---|---|
| Centralized (star/hierarchy) | Fastest | Hub-dependent | Single point of failure, alignment bias |
| Moderately connected (ring, $m=4$) | Balanced | High | Manageable cascade risk |
| Fully connected (complete) | Fastest consensus | High conformity | Highest risk of wrong cascades |

Cascade Mitigation

  • Limit connectivity ($m \le 4$).
  • Maintain moderate-to-high $\alpha$ ($\ge 0.5$).
  • Implement dissent tokens for abrupt surges in agent confidence.
  • Periodically inject trusted-oracle signals.
  • Employ dynamic decision thresholds ($\tau > 0.5$) in late rounds to require a super-majority.
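The last mitigation can be realized as a threshold schedule; the linear ramp below is an illustrative assumption, since the source does not specify a schedule shape:

```python
def tau_schedule(t, t_max, tau_start=0.5, tau_end=0.67):
    """Decision threshold at round t, ramping from a simple majority
    (tau_start) toward a super-majority (tau_end) by round t_max."""
    frac = min(t / max(t_max, 1), 1.0)
    return tau_start + frac * (tau_end - tau_start)
```

Passing `tau_schedule(t, t_max)` instead of a fixed `tau` makes late-round decision flips progressively harder, damping wrong cascades near convergence.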

6. Implementation Details and Example Code

  • Data preparation: ∼1M raw samples yield ∼890k MACE-refined pairs.
  • Prompting: InternVL2.5-78B, structured JSON output, single turn per example.
  • Downstream tuning: LLM parameters fully fine-tuned on MACE output; vision encoder frozen; LLaVA-NeXT-7B, Qwen2-VL-7B, InternVL2.5-8B evaluated.

Pseudocode for distributed conformity dynamics:

from dataclasses import dataclass

@dataclass
class Agent:
    y: int     # current binary decision
    p: float   # current confidence/support score

def pooled_score(agent, neighbors, alpha, beta, eps=1e-6):
    # s = (α p_i y_i + β Σ_j p_j y_j) / (α p_i + β Σ_j p_j + ε),
    # with β = 1 − α recovering the confidence-normalized pooling rule above.
    num = alpha * agent.p * agent.y + beta * sum(n.p * n.y for n in neighbors)
    den = alpha * agent.p + beta * sum(n.p for n in neighbors) + eps
    return num / den

def update_agents(agents, adjacency, alpha, beta, tau=0.5):
    # Synchronous update: compute all new states first, then commit,
    # so every agent pools over its neighbors' round-t states.
    new_states = []
    for i, agent in enumerate(agents):
        nbrs = [agents[j] for j in adjacency[i]]
        s_new = pooled_score(agent, nbrs, alpha, beta)
        y_new = 1 if s_new >= tau else 0
        new_states.append((y_new, float(s_new)))
    for agent, (y_new, p_new) in zip(agents, new_states):
        agent.y, agent.p = y_new, p_new

7. Significance, Limitations, and Generalizations

MACE offers a universal, prompt-based framework for curating and confidence-weighting agent or model outputs under structured conformity constraints. In vision–language alignment, it eliminates hallucinations and enforces schema adherence, narrowing the gap between text and image understanding in downstream model training (Zhang et al., 13 Aug 2025). In LLM-based MAS, MACE enables precise control of convergence, accuracy, and robustness through principled trade-offs among topology, self-social weighting, and consensus thresholds; however, cascade risks and alignment biases must be carefully managed through architectural and procedural safeguards (Han et al., 9 Jan 2026). A plausible implication is that future applications of MACE may generalize to additional modalities or multi-level consensus protocols, embedded in more heterogeneous and adversarial environments.
