Mirage Framework: Multi-domain Algorithms
- Mirage Framework is a family of modular, scalable systems designed for diverse tasks including distributed graph mining, GPU scheduling, and LLM evaluation.
- Each variant applies unique methodologies—such as iterative MapReduce for subgraph mining, reinforcement learning for scheduling, and schema-driven approaches for compositional generalization.
- Its open-source implementations and benchmark studies demonstrate superior performance, resource efficiency, and adaptability across computational domains.
The term “Mirage Framework” refers to a family of distinct frameworks and algorithms, each developed for markedly different application domains—ranging from distributed frequent subgraph mining and batch scheduling for GPU clusters, to model optimization for LLM serving, systematic compositional generalization, multimodal reasoning, retrieval-augmented evaluation, agentic misinformation detection, and several others. The following sections detail seminal Mirage frameworks, their technical underpinnings, and domain-specific contributions, as established by peer-reviewed research and preprint literature.
1. Distributed Frequent Subgraph Mining: Iterative MapReduce Mirage (Bhuiyan et al., 2013)
The original MIRAGE framework (“Iterative MapReduce based subGraph Extraction”) provides a scalable solution to frequent subgraph mining (FSM), designed for large-scale graph datasets that cannot fit in main memory. MIRAGE partitions the input dataset into disjoint subsets and performs support counting in parallel across compute nodes using the iterative MapReduce paradigm.
At iteration $k$, frequent patterns of size $k$ generate candidate subgraphs of size $k+1$:
- Map phase: Each mapper receives a pattern (with min-dfs-code), generates candidates by rightmost path extension, filters duplicates via canonical labeling, and emits key-value pairs if the candidate's occurrence list is non-empty.
- Reduce phase: Reducers aggregate local supports by summing occurrence list lengths, output frequent subgraphs meeting the minimum support threshold.
Support is formally defined over the graph database $\mathcal{G} = \{G_1, \dots, G_n\}$ as
$$\operatorname{sup}(g) \;=\; \bigl|\{\, G_i \in \mathcal{G} \;:\; g \sqsubseteq G_i \,\}\bigr|,$$
where $\sqsubseteq$ denotes subgraph isomorphism.
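For illustration, the support-counting map/reduce step can be sketched in a single process as below; the partition contents and the pattern keys (stand-ins for min-dfs-codes) are hypothetical, and the real MIRAGE mappers additionally perform rightmost-path extension and duplicate filtering before emitting pairs.

```python
from collections import defaultdict

def map_phase(partition_patterns):
    """Map: emit (canonical code, local support) pairs for one partition,
    skipping candidates whose occurrence list in that partition is empty."""
    for pattern, occurrences in partition_patterns.items():
        if occurrences:                      # only non-empty occurrence lists are emitted
            yield pattern, len(occurrences)  # local support = length of the occurrence list

def reduce_phase(mapped_pairs, min_support):
    """Reduce: sum local supports per pattern and keep globally frequent subgraphs."""
    totals = defaultdict(int)
    for pattern, local_support in mapped_pairs:
        totals[pattern] += local_support
    return {p: s for p, s in totals.items() if s >= min_support}

# Toy run over two disjoint partitions; patterns are keyed by hypothetical
# canonical codes, and occurrence lists name the graphs containing the pattern.
partition_1 = {"(0,1,A,B)": ["g1", "g2"], "(0,1,B,C)": ["g1"]}
partition_2 = {"(0,1,A,B)": ["g5"], "(0,1,B,C)": []}

mapped = list(map_phase(partition_1)) + list(map_phase(partition_2))
print(reduce_phase(mapped, min_support=2))   # {'(0,1,A,B)': 3}
```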
MIRAGE incorporates optimizations from FSM literature (min-dfs-code, RMV/RMP extension, partition-based edge filtering), enabling near-linear scalability with cluster size. It outperforms prior MapReduce FSM schemes, especially in heterogeneous, edge-balanced partitions, and is validated on large biological and synthetic graph datasets. Source code is available at www.cs.iupui.edu/~alhasan/software/.
2. Reinforcement Learning for Low-interruption GPU Cluster Scheduling (Ding et al., 2023)
The Mirage framework for batch GPU clusters introduces a reinforcement learning (RL)–driven resource provisioner atop Slurm. The agent observes system state vectors encoding queue, server, and job attributes, maintaining a history window of past states. The RL models—Deep Q-Networks (DQN), policy gradients, and Mixture-of-Experts (MoE) augmentations—learn to minimize interruption by proactively submitting successor jobs before previous segments conclude.
The standard Q-learning update is:
$$Q(s_t, a_t) \;\leftarrow\; Q(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr].$$
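A tabular sketch of this update rule is given below; the state encoding (queue/GPU buckets), the two-action set, and the reward shaping are illustrative assumptions, since Mirage's actual agents are deep (DQN, policy-gradient, and MoE variants) rather than tabular.

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ("submit_successor", "wait")          # illustrative action set
Q = defaultdict(float)                          # Q[(state, action)] -> value

def choose_action(state):
    """Epsilon-greedy action selection over the toy action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy transition: states summarize (queue-length bucket, free-GPU bucket);
# the negative reward penalizes an interruption between job segments.
s, a = ("queue_high", "gpus_low"), "submit_successor"
q_update(s, a, reward=-1.0, next_state=("queue_high", "gpus_med"))
print(Q[(s, a)])   # -0.1
```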
Experiments over three production GPU clusters demonstrated 17–100% reduction in job interruption and 23–76% increase in uninterrupted jobs (depending on cluster and load). Mirage models outperform both reactive (avg-historical) and classical statistical baselines under medium-to-high congestion.
3. LLM Social Role-play Evaluation: Multiverse Interactive MIRAGE (Cai et al., 3 Jan 2025)
MIRAGE (Multiverse Interactive Role-play Ability General Evaluation) evaluates LLMs in complex, interactive, human-mimetic environments via murder mystery simulations. It employs eight scripted scenarios spanning diverse social contexts, with metrics targeting nuanced behavioral competencies:
- Trust Inclination Index (TII): Quantifies how an agent balances trust against suspicion toward other characters over the course of an interaction.
- Clue Investigation Capability (CIC): Measures how thoroughly an agent uncovers and exploits the clues embedded in each script.
- Interactivity Capability Index (ICI): Assesses reasoning, communication, cooperation, and creativity.
- Script Compliance Index (SCI): Combines direct scoring with Rouge-L metrics against script reconstruction.
Experiments confirmed that even advanced LLM agents (such as GPT-4) struggle to maintain dynamic trust/suspicion balances, to sustain clue investigation over long interaction sequences, and to remain faithful to complex role instructions under context window constraints. Code and datasets are public at https://github.com/lime728/MIRAGE.
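As an illustration of the Rouge-L component of the Script Compliance Index described above, the sketch below computes an LCS-based Rouge-L F-score between a reconstructed script and the reference; the 50/50 combination with a judge-assigned direct score is a placeholder assumption, not the paper's weighting.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    """Rouge-L F1 over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def script_compliance(direct_score, candidate, reference, w=0.5):
    """Placeholder combination of a direct score in [0, 1] with Rouge-L."""
    return w * direct_score + (1 - w) * rouge_l(candidate, reference)

print(script_compliance(0.8,
                        "the butler was in the library at nine",
                        "the butler claims he was in the library at nine"))
```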
4. RAG Benchmark: Metric-Intensive MIRAGE for Component Evaluation (Park et al., 23 Apr 2025)
MIRAGE is a focused benchmark for Retrieval-Augmented Generation (RAG) systems, comprising 7,560 QA instances and a 37,800-item retrieval pool. Each query is associated with pseudo-relevant and distractor document chunks, supporting three evaluation modes: base, oracle, and mixed context. Four principal metrics dissect RAG adaptability:
| Metric | Definition | Interpretation |
|---|---|---|
| Noise Vulnerability | Failures in mixed context but success in oracle context | Sensitivity to noise |
| Context Acceptability | Success in noisy context given correct chunk | Robust utilization |
| Context Insensitivity | Persistent failure regardless of context | LLM limitation |
| Context Misinterpretation | Correct w/o context, incorrect with correct context | Harm by context |
Notably, these metrics sum to unity across the test set, supporting granular attribution analysis. MIRAGE benchmarks retriever-LLM pairings (e.g., GPT-4o/nv-embed-v2), showing substantial variance depending on pairing and context noise, with LLAMA2-7B exhibiting greater context fragility. All data/code at https://github.com/nlpai-lab/MIRAGE.
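The sketch below shows how each query can be assigned to exactly one of the four diagnostic labels from three boolean outcomes (base = no retrieved context, oracle = gold chunk only, mixed = gold chunk plus distractors); the decision rules are a simplified paraphrase of the table above, so the official benchmark's edge-case handling may differ.

```python
from collections import Counter

def classify(base_ok: bool, oracle_ok: bool, mixed_ok: bool) -> str:
    """Assign a query to one diagnostic label (simplified paraphrase of the table)."""
    if oracle_ok and mixed_ok:
        return "context_acceptability"      # robust use of the correct chunk under noise
    if oracle_ok and not mixed_ok:
        return "noise_vulnerability"        # distractors break an otherwise solvable query
    if base_ok and not oracle_ok:
        return "context_misinterpretation"  # correct alone, harmed by the provided context
    return "context_insensitivity"          # fails with or without context (LLM limitation)

# Toy results for five queries: (base, oracle, mixed) correctness.
outcomes = [(False, True, True), (False, True, False), (True, False, False),
            (False, False, False), (False, True, True)]
counts = Counter(classify(*o) for o in outcomes)
rates = {label: n / len(outcomes) for label, n in counts.items()}
print(rates)   # the four rates sum to 1.0 across the set, as noted above
```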
5. LLM Serving Optimization: Parameter Remapping Mirage (Li et al., 15 Jul 2025)
For multi-tenant LLM serving, MIRAGE introduces a new paradigm for KV cache management by dynamic parameter remapping. Instead of bidirectional cache swapping (with blocking CPU-GPU memory exchange), the approach reclaims memory assigned to static model parameters (which do not update during inference) and repurposes it for growing KV cache demands. This is especially efficient for inactive models in a multi-tenant environment, leveraging high CPU-GPU bandwidth architectures such as NVIDIA’s GH200 Superchip.
Key scheduling constraint: the remapping traffic must stay off the inference critical path, i.e.,
$$n \cdot t_{\mathrm{trans}} \;\leq\; t_{\mathrm{comp}},$$
where $t_{\mathrm{trans}}$ is the transfer time per layer, $n$ is the number of remapped layers, and $t_{\mathrm{comp}}$ is the per-layer GPU computation time.

Evaluation on ShareGPT and Alpaca workloads confirmed substantial reductions in latency (44.8–82.5% in tail time-between-token, 20.7–99.3% in time-to-first-token) and throughput improvements (6.6–86.7%) over vLLM.
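A toy feasibility check built on the constraint above is sketched below; the uniform per-layer timing model and the constraint form itself are simplifying assumptions for illustration, not MIRAGE's actual scheduler logic.

```python
def can_hide_remapping(n_layers: int, t_trans_ms: float, t_comp_ms: float) -> bool:
    """True if remapping n layers' parameter memory can be overlapped with
    computation, i.e. n * t_trans <= t_comp (simplified form of the constraint)."""
    return n_layers * t_trans_ms <= t_comp_ms

def max_remappable_layers(t_trans_ms: float, t_comp_ms: float) -> int:
    """Largest n satisfying the same simplified constraint."""
    return int(t_comp_ms // t_trans_ms)

# Illustrative numbers only: high CPU-GPU bandwidth (e.g. GH200-class links)
# keeps per-layer transfer time small relative to per-layer computation time.
print(can_hide_remapping(n_layers=4, t_trans_ms=0.25, t_comp_ms=1.5))  # True
print(max_remappable_layers(t_trans_ms=0.25, t_comp_ms=1.5))           # 6
```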
6. Compositional Generalization: Dual-Process Mirage (Noviello et al., 25 Jul 2025)
MIRAGE (“Meta-Inference with Rules and Abstractions from Generalized Experience”) operationalizes a dual-process model for systematic compositional generalization. Its architecture, inspired by hippocampus–prefrontal cortex interaction, combines:
- A meta-trained Transformer Neural Decomposer (System 1), which implements prioritized pattern matching and single-step compositional decomposition.
- A Schema Engine (System 2), which performs dynamic extraction/ranking/application of reusable schemas, maintains variable bindings in episodic memory, and iteratively expands compositional sequences.
A schema is formalized as a reusable rule template: an abstract pattern over typed variable slots together with the composition procedure that expands bound values into an output sequence. Candidate schemas are ranked via Copeland scores, i.e., pairwise-comparison wins minus losses:
$$\mathrm{Cop}(\sigma) \;=\; \bigl|\{\sigma' : \sigma \succ \sigma'\}\bigr| \;-\; \bigl|\{\sigma' : \sigma' \succ \sigma\}\bigr|.$$
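A minimal sketch of Copeland-style ranking follows; the pairwise preference here compares hypothetical quality scores as a placeholder, whereas MIRAGE's actual comparison criteria come from its schema-quality regulation.

```python
from itertools import combinations

def copeland_rank(candidates, beats):
    """Rank candidates by Copeland score: pairwise wins minus pairwise losses."""
    scores = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        if beats(a, b):
            scores[a] += 1
            scores[b] -= 1
        elif beats(b, a):
            scores[b] += 1
            scores[a] -= 1
        # ties contribute nothing to either side
    return sorted(candidates, key=lambda c: scores[c], reverse=True), scores

# Placeholder preference: schemas with higher (hypothetical) quality scores win.
quality = {"schema_A": 0.9, "schema_B": 0.6, "schema_C": 0.6}
ranking, scores = copeland_rank(list(quality), lambda a, b: quality[a] > quality[b])
print(ranking, scores)   # schema_A ranks first; B and C tie at -1
```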
MIRAGE achieves >99% accuracy on the SCAN benchmark with only 1.19M transformer parameters, outperforming ablations that omit iterative refinement or schema quality regulation.
7. Other Mirage Frameworks: Multimodal Reasoning, Misinformation Detection, and Parallel RAG
Recent MIRAGE developments include:
- (Dongre et al., 25 Jun 2025) Multimodal expert-guided reasoning in agricultural consultative settings, featuring open-world taxonomy, multi-turn clarification strategy, and a reasoning-scored LLM-as-a-judge pipeline.
- (Wei et al., 25 Aug 2025) Parallel graph-retrieval-augmented reasoning chains for medical QA, decomposing queries into entity-grounded subquestions, executing multiple evidence chains, and cross-verifying claims via knowledge graph exploration and majority-based synthesis (a minimal sketch of the synthesis step follows this list).
- (Shopnil et al., 20 Oct 2025) Agentic misinformation detection that orchestrates four modules—visual authenticity, cross-modal semantic alignment, retrieval-augmented fact-checking, and calibrated judgment—to yield structured, citation-linked output. Demonstrated F1 score improvements and reduced false positive rates over judge-only or monolithic VLM baselines.
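As an illustration of the majority-based synthesis step referenced above, the sketch below aggregates answers from parallel evidence chains by majority vote with a simple agreement score; the chain inputs and the first-occurrence tie-breaking are illustrative assumptions.

```python
from collections import Counter

def synthesize(chain_answers):
    """Majority vote across parallel evidence chains, returning the winning
    answer and an agreement score (fraction of chains supporting it)."""
    if not chain_answers:
        raise ValueError("no evidence chains to synthesize")
    counts = Counter(chain_answers)
    answer, votes = counts.most_common(1)[0]   # ties broken by first occurrence
    return answer, votes / len(chain_answers)

# Toy example: three entity-grounded sub-question chains converge on a diagnosis.
chains = ["iron-deficiency anemia", "iron-deficiency anemia", "thalassemia trait"]
print(synthesize(chains))   # ('iron-deficiency anemia', 0.666...)
```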
8. Technical Features and Comparative Analysis
While the “Mirage Framework” as a term encompasses diverse algorithms across fields, commonalities include modular, decomposed architectures; an emphasis on scalability (MapReduce, RL-driven job provisioning, parallel multi-chain inference); formal evaluation protocols; and public availability of code and datasets. The frameworks repeatedly achieve superior resource utilization, interpretability, or adaptability over prior state-of-the-art, especially in domains requiring distributed reasoning, multimodal assessment, and robust retrieval-augmented inference.
9. Availability and Future Directions
Source code for major Mirage frameworks is publicly released (e.g., graph mining at www.cs.iupui.edu/~alhasan/software/, LLM role-play and RAG at respective GitHub repositories). Future research directions encompass improved pruning and partitioning strategies (Bhuiyan et al., 2013), adaptation to emerging hardware architectures (Li et al., 15 Jul 2025), extensible multi-chain reasoning protocols (Wei et al., 25 Aug 2025), and integrable agentic modules for misinformation mitigation (Shopnil et al., 20 Oct 2025). The modularization ethos underlying Mirage frameworks across research domains is likely to inform continued advances in scalable reasoning and robust model evaluation.