MLLM-Assisted Conformity Enhancement (MACE)
- MACE is a prompt-based framework that uses large multimodal language models to enforce strict alignment with visual evidence or group consensus.
- It filters out non-grounded or hallucinated elements by retaining only tokens and aspects that are visually supported by, or inferable from, the input data.
- In multi-agent settings, MACE balances self-confidence and peer influence via confidence-normalized pooling to achieve robust consensus accuracy.
MLLM-Assisted Conformity Enhancement (MACE) is a prompt-based method that leverages large multimodal LLMs (MLLMs) or ensembles of LLM-based agents to enforce strict alignment of multimodal outputs with external schema, visual evidence, or prevailing group judgments. Originally applied for curation and grounding of noisy item listings in e-commerce, and later formalized in the context of collective LLM agent systems, MACE eliminates non-grounded or hallucinated elements, ensuring robust, visually justified, and schema-compliant results (Zhang et al., 13 Aug 2025, Han et al., 9 Jan 2026).
1. Conceptual Foundations
MACE operationalizes “conformity” as the systematic enforcement of agreement between multimodal model outputs and either explicit visual evidence (in e-commerce applications) or group-level consensus (in multi-agent LLM systems). In the e-commerce context, it constitutes a filter removing non-visual or spurious information from raw image–text–aspect triplets—ensuring all retained tokens and key–value pairs are visually supported or inferable. In LLM multi-agent systems, MACE formalizes how agents shift their predictions towards the prevailing judgments in a networked population, balancing self-confidence and peer influence to maximize consensus accuracy while mitigating the risk of collective error cascades (Zhang et al., 13 Aug 2025, Han et al., 9 Jan 2026).
2. Task Formulation and Mathematical Structure
In e-commerce vision–language applications (Zhang et al., 13 Aug 2025), the formal MACE task is specified as follows:
- Inputs:
  - $I$: product image (or frozen visual embedding)
  - $T$: noisy/original title string
  - $A$: set of raw aspect key–value pairs
  - $\mathcal{S}$: platform schema (allowed aspect keys)
- Outputs:
  - $\hat{T}$: title rewritten to include only visually inferable tokens
  - $\hat{A}$: subset of aspects confirmed from $A$
- Visual Grounding Predicates:
  - $g(t, I)$: whether token $t$ is visually supported by $I$
  - $h(k, v, I)$: whether value $v$ is visually confirmable for key $k$
- Mathematical Formulation:

$$\hat{T} = \{\, t \in T : g(t, I) = 1 \,\}, \qquad \hat{A} = \{\, (k, v) \in A : k \in \mathcal{S},\ h(k, v, I) = 1 \,\}$$
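Under these definitions, the MACE filter amounts to a pair of set comprehensions. A minimal sketch (the predicates `grounded_token` and `grounded_aspect` are stand-ins for the VLM's visual-grounding judgments, not part of the original formulation):

```python
def mace_filter(title_tokens, aspects, schema, grounded_token, grounded_aspect):
    """Keep only visually grounded title tokens and schema-compliant,
    visually confirmed aspect key-value pairs."""
    new_title = [t for t in title_tokens if grounded_token(t)]
    new_aspects = {k: v for k, v in aspects.items()
                   if k in schema and grounded_aspect(k, v)}
    return " ".join(new_title), new_aspects
```

For example, a promotional token like "SHIP FAST" fails the grounding predicate and is dropped, while an out-of-schema aspect key is removed regardless of grounding.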
In multi-agent systems (Han et al., 9 Jan 2026), MACE's conformity mechanism is instantiated by confidence-normalized pooling, recursively updating each agent's support score:

$$s_i^{(t+1)} = \frac{\alpha\, s_i^{(t)} + \beta \sum_{j \in N(i)} s_j^{(t)}}{\alpha + \beta\, |N(i)|}$$

where $\alpha$ (self-weight) and $\beta$ (social-weight) modulate the trade-off between self-reliance and conformity, and $N(i)$ denotes the network neighbors of agent $i$.
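Plugging illustrative numbers into the pooling rule (a quick sanity check, not values from the paper): with equal weights $\alpha = \beta = 1$, a single confident dissenter is pulled below the decision threshold by two confident neighbors:

```python
# One pooling step with alpha = beta = 1: a confident "yes"
# (confidence-weighted score 1.0) faces two confident "no" neighbors
# (score 0.0 each).
alpha, beta = 1.0, 1.0
s_self, neighbor_scores = 1.0, [0.0, 0.0]
pooled = (alpha * s_self + beta * sum(neighbor_scores)) / (
    alpha + beta * len(neighbor_scores))
# pooled = 1/3, below a tau = 0.5 threshold, so the agent conforms.
```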
3. MACE Algorithmic Workflows
E-Commerce Conformity Enhancement (Zhang et al., 13 Aug 2025)
MACE is implemented as a deterministic, zero-shot prompt applied to each raw image–title–aspect example:
- For each example $(I, T, A)$:
  - Prompt a large VLM (e.g., InternVL2.5-78B):
    - “Given image $I$ and its title $T$ and aspects $A$,
    - 1) rewrite the title to remove tokens not grounded in the image,
    - 2) remove aspects not visually confirmed.”
  - Parse the response as JSON.
  - Validate against schema $\mathcal{S}$.
  - Add $(I, \hat{T}, \hat{A})$ to the refined dataset.
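The loop above can be sketched as follows; `call_vlm` is a placeholder for the actual InternVL2.5-78B API call, and the prompt wording is paraphrased from the steps above:

```python
import json

PROMPT = (
    "Given the image and its title '{title}' and aspects {aspects}: "
    "1) rewrite the title to remove tokens not grounded in the image; "
    "2) remove aspects not visually confirmed. "
    "Reply as JSON with keys 'title' and 'aspects'."
)

def refine_example(image, title, aspects, schema, call_vlm):
    # call_vlm(image, prompt) -> raw JSON string from the VLM (placeholder).
    raw = call_vlm(image, PROMPT.format(title=title, aspects=aspects))
    parsed = json.loads(raw)
    # Schema validation: drop any aspect key the platform does not allow.
    valid_aspects = {k: v for k, v in parsed["aspects"].items() if k in schema}
    return parsed["title"], valid_aspects
```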
Multi-Agent Conformity Dynamics (Han et al., 9 Jan 2026)
Agents are nodes in a network with specific topology (e.g., star, ring, complete graph):
- At each timestep:
- Each agent updates its internal state by integrating self-score and neighbor inputs using the confidence-normalized pooling rule.
- Update iterations continue until unanimity or a preset maximum number of rounds.
- Two canonical protocols:
- Centralized Aggregation: One-shot; a hub agent computes the final decision from leaf submissions. Used where a trusted “expert” agent is available.
- Distributed Consensus: Iterative peer-to-peer pooling allows group convergence, enhancing robustness but potentially introducing latency and synchronization cost.
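For concreteness, the three canonical topologies can be encoded as adjacency lists (a minimal sketch; treating node 0 as the hub of the star is an assumption):

```python
def star(n):
    # Hub-and-leaf: node 0 is the hub, all leaves attach only to it.
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        adj[i] = [0]
    return adj

def ring(n):
    # Each node has exactly two neighbors.
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete(n):
    # Fully connected: every node sees every other node.
    return {i: [j for j in range(n) if j != i] for i in range(n)}
```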
4. Empirical Performance and Metrics
E-Commerce Experiments (Zhang et al., 13 Aug 2025)
Empirical evaluation demonstrates that MACE significantly enhances groundedness and compliance:
| Model | Rouge-L | Aspect F1 | Schema Recall |
|---|---|---|---|
| LLaVA-NeXT-7B Baseline | 0.36 | 0.33 | 0.49 |
| LLaVA-NeXT-7B + MACE | 0.43 | 0.35 | 0.53 |
- MACE yields a +19% improvement in Rouge-L, +6% in aspect F1, +8% in schema recall over baselines.
- Qualitative analysis shows removal of non-visual tokens (e.g., “Size 12”, “SHIP FAST”) and non-confirmable aspects.
Multi-Agent MAS Metrics (Han et al., 9 Jan 2026)
Key performance measures examined include:
- Final Accuracy (FA)
- Time-to-Consensus (TTC)
- Conformity Index (CI), Average CI (ACI)
- Center–Periphery Consistency (CPC)
Topological and parameter dependencies:
- Distributed settings with a moderate number of neighbors strike a robust balance, with final accuracy reaching up to $0.83$.
- Centralized aggregation achieves the highest consensus accuracy; over-conformity (low $\alpha$) increases cascade risk.
5. Design Principles and Topological Trade-offs
Self–Social Weighting
- Moderate $\alpha$: balanced speed and robustness.
- High $\alpha$ ($0.75$–$1.0$): maximizes reliability, slows convergence.
- Low $\alpha$ ($0.25$): prioritizes speed, increases susceptibility to information cascades.
Topology Choices
| Topology | Speed | Robustness | Failure Mode |
|---|---|---|---|
| Centralized (Star/Hierarchy) | Fastest | Hub-dependent | Single-point-of-failure, alignment bias |
| Moderately Connected (Ring) | Balanced | High | Manageable cascade risk |
| Fully Connected (Complete) | Fastest consensus | High conformity | Highest risk of wrong cascades |
Cascade Mitigation
- Limit connectivity (keep each agent's neighborhood small).
- Maintain moderate-to-high self-weight $\alpha$.
- Implement dissent tokens for abrupt surges in agent confidence.
- Periodically inject trusted-oracle signals.
- Employ dynamic decision thresholds (raising $\tau$ in late rounds to require a super-majority).
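The last safeguard can be sketched as a round-dependent threshold schedule (the linear ramp and the $0.5 \to 2/3$ endpoints are illustrative assumptions, not values from the paper):

```python
def decision_threshold(round_idx, max_rounds, tau0=0.5, tau_final=2 / 3):
    # Raise the decision threshold linearly from a simple majority
    # toward a super-majority as rounds progress, so late-stage vote
    # flips require stronger pooled support.
    frac = round_idx / max(max_rounds - 1, 1)
    return tau0 + (tau_final - tau0) * frac
```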
6. Implementation Details and Example Code
E-Commerce Prompting (Zhang et al., 13 Aug 2025)
- Data preparation: ∼1M raw samples yield ∼890k MACE-refined pairs.
- Prompting: InternVL2.5-78B, structured JSON output, single turn per example.
- Downstream tuning: LLM parameters fully fine-tuned on MACE output; vision encoder frozen; LLaVA-NeXT-7B, Qwen2-VL-7B, InternVL2.5-8B evaluated.
Multi-Agent System Code (Han et al., 9 Jan 2026)
Pseudocode for distributed conformity dynamics:
```python
def pooled_score(agent, neighbors, alpha, beta, eps=1e-6):
    # Confidence-weighted self vote and neighbor votes.
    s_self = agent.p * agent.y
    c_neighbors = [nbr.p * nbr.y for nbr in neighbors]
    num = alpha * s_self + beta * sum(c_neighbors)
    den = alpha + beta * len(neighbors) + eps
    return num / den

def update_agents(agents, adjacency, alpha, beta, tau=0.5):
    # Synchronous update: compute all new states before committing any.
    new_states = []
    for i, agent in enumerate(agents):
        nbrs = [agents[j] for j in adjacency[i]]
        s_new = pooled_score(agent, nbrs, alpha, beta)
        y_new = 1 if s_new >= tau else 0
        new_states.append((y_new, float(s_new)))
    for agent, (y_new, p_new) in zip(agents, new_states):
        agent.y, agent.p = y_new, p_new
```
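A self-contained driver for these dynamics (restating the pooling update so the sketch runs standalone; the `Agent` record, the ring wiring, and the concrete confidences are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    y: int    # current binary vote
    p: float  # confidence in that vote

def step(agents, adjacency, alpha, beta, tau=0.5, eps=1e-6):
    # One synchronous round of confidence-normalized pooling.
    new = []
    for i, a in enumerate(agents):
        nbrs = [agents[j] for j in adjacency[i]]
        num = alpha * a.p * a.y + beta * sum(n.p * n.y for n in nbrs)
        s = num / (alpha + beta * len(nbrs) + eps)
        new.append((1 if s >= tau else 0, float(s)))
    for a, (y, p) in zip(agents, new):
        a.y, a.p = y, p

def run_to_consensus(agents, adjacency, alpha, beta, max_rounds=20):
    # Returns (consensus vote, time-to-consensus) or (None, max_rounds).
    for t in range(1, max_rounds + 1):
        step(agents, adjacency, alpha, beta)
        votes = {a.y for a in agents}
        if len(votes) == 1:
            return votes.pop(), t
    return None, max_rounds
```

On a 4-agent ring with three confident supporters and one confident dissenter, equal self- and social-weights pull the dissenter over in a single round.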
7. Significance, Limitations, and Generalizations
MACE offers a universal, prompt-based framework for curating and confidence-weighting agent or model outputs under structured conformity constraints. In vision–language alignment, it eliminates hallucinations and enforces schema adherence, narrowing the gap between text and image understanding in downstream model training (Zhang et al., 13 Aug 2025). In LLM-based MAS, MACE enables precise control of convergence, accuracy, and robustness through principled trade-offs among topology, self-social weighting, and consensus thresholds; however, cascade risks and alignment biases must be carefully managed through architectural and procedural safeguards (Han et al., 9 Jan 2026). A plausible implication is that future applications of MACE may generalize to additional modalities or multi-level consensus protocols, embedded in more heterogeneous and adversarial environments.