Adaptive Diversity Cache (ADC)
- Adaptive Diversity Cache (ADC) is a caching approach that integrates content importance with semantic diversity to adapt to non-uniform data distributions and resource constraints.
- It employs strategies such as frequency-aware scaling and joint selection functions to optimize cache allocations for diverse applications like computer vision and edge streaming.
- Empirical evaluations demonstrate that ADC improves detection mAP and inference speed while reducing network stall times across multiple domains.
The Adaptive Diversity Cache (ADC) denotes a class of mechanisms and algorithms that use diversity-driven caching strategies to improve performance in a variety of resource-constrained inference and transmission systems. ADC frameworks integrate selection and allocation policies that optimize not solely for content importance but also for representational or semantic diversity, potentially adapting in real time to non-uniform data distributions, task imbalance, head redundancy, or networking context. This entry surveys the core principles, algorithmic designs, operational workflows, and experimental validations of ADCs across modern computer vision, streaming, diffusion modeling, and networking applications.
1. Fundamental Principles of Adaptive Diversity Caching
ADC methods are unified by a core principle: caching resources (memory, key-value pairs, features, file segments, etc.) should be adaptively assigned and carefully curated to balance the retention of high-utility or high-confidence representations with explicit coverage of underrepresented, rare, or semantically distinct entities. Unlike naive or greedy cache strategies that emphasize only the most frequently seen or "important" elements, ADCs introduce mechanisms—diversity-aware selection, frequency-aware resizing, or dynamic trade-offs—to mitigate information loss due to redundancy, long-tail distributions, or semantic collapse.
This principle manifests in multiple forms:
- Per-class or per-type cache allocation that grows for rare events (Jiang et al., 24 Nov 2025)
- Joint scoring functions that combine confidence/importance with spread or uniqueness (Jiang et al., 24 Nov 2025, Liu et al., 23 Oct 2025)
- Scheduling strategies that adapt cache update frequency or selection to runtime data statistics (Bu et al., 24 Aug 2025, Liu et al., 2017)
ADCs contrast with static caching by evolving allocation, replacement, or compression rules in response to data-driven signals—including request distributions in networking, attention head redundancy in transformer caches, or feature-space distances in vision models.
2. Algorithmic Instantiations across Domains
2.1. Human-Object Interaction (HOI) Detection
In HOI detection, ADC is implemented as a training-free plug-and-play inference module that builds and adaptively manages per-class caches of features and logits (Jiang et al., 24 Nov 2025). Its architecture comprises:
- Class-Specific Joint Confidence-Diversity Selection (CJCS): Each interaction class $c$'s cache retains up to $K_c$ feature–logit pairs, scored by a linear mixture $s_i = \alpha\, d_i + (1-\alpha)\, h_i$, where $d_i$ penalizes cosine similarity to already-cached features and $h_i$ reflects the logit entropy of candidate $i$.
- Frequency-Aware Cache Adaptation (FACA): The cache capacity for class $c$ is $K_c = \mathrm{clip}\big(K_{\min},\, g(f_c),\, K_{\max}\big)$, with $g$ decreasing in the observed class frequency $f_c$, favoring higher capacities for rare classes.
- Calibration at Inference: Detector scores are recalibrated using affinity-weighted aggregations over the cached features.
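A minimal NumPy sketch of the two mechanisms above follows. The cosine-based diversity term, the entropy sign convention, and the functional form of `faca_capacity` are illustrative assumptions consistent with the description, not the paper's exact formulation.

```python
import numpy as np

def cjcs_score(feat, logits, cache_feats, alpha=0.5):
    """Joint confidence-diversity score for one candidate (sketch).

    Diversity d_i: 1 minus the max cosine similarity to cached features.
    Confidence h_i: normalized entropy of the candidate's logits
    (the sign convention is an assumption). alpha is the user-set mixture.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    h = -(probs * np.log(probs + 1e-12)).sum() / np.log(len(probs))
    if len(cache_feats) == 0:
        d = 1.0  # an empty cache makes any candidate maximally diverse
    else:
        f = feat / (np.linalg.norm(feat) + 1e-12)
        C = cache_feats / (np.linalg.norm(cache_feats, axis=1, keepdims=True) + 1e-12)
        d = 1.0 - float(np.max(C @ f))
    return alpha * d + (1.0 - alpha) * h

def faca_capacity(class_freq, k_min=4, k_max=64, gamma=1.0):
    """Frequency-aware capacity K_c: decreasing in class frequency and
    clipped to [k_min, k_max] (hypothetical functional form for g)."""
    k = k_min + (k_max - k_min) * (1.0 - class_freq) ** gamma
    return int(round(k))
```

At inference, each detected interaction would be scored against its class cache, and the lowest-scoring entries evicted whenever the cache exceeds `faca_capacity` for that class.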
2.2. Multi-Modal Key-Value (KV) Cache Compression
Modern LVLMs and LLMs accumulate KV caches that are both memory-intensive and semantically redundant. MixKV (Liu et al., 23 Oct 2025) realizes ADC compression by:
- Per-head Joint Scoring: For each attention head, MixKV computes an importance score $I_i$ (attention plus value-norm) and a diversity score $D_i$ (negative cosine similarity to the head mean). The two are mixed with a weight $\beta$, giving $s_i = (1-\beta)\, I_i + \beta\, D_i$, where $\beta$ is determined from the head redundancy.
- Budgeted Selection: Only the top-$k$ KV pairs by $s_i$ are retained per head, where $k$ is the per-head budget.
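Below is a minimal NumPy sketch of the per-head mechanics. The importance proxy (accumulated attention mass times value norm), the redundancy proxy (mean cosine similarity of keys to the head mean), and the identity map from redundancy to the mixing weight $\beta$ are illustrative assumptions rather than MixKV's exact definitions.

```python
import numpy as np

def mixkv_select(keys, values, attn_mass, budget):
    """Budgeted KV selection for one attention head (sketch).

    keys, values: (T, d) cached key/value vectors
    attn_mass:    (T,) accumulated attention received per position
    budget:       number of KV pairs to retain
    """
    # Importance: attention mass weighted by value norm, rescaled to [0, 1].
    imp = attn_mass * np.linalg.norm(values, axis=1)
    imp = imp / (imp.max() + 1e-12)

    # Diversity: negative cosine similarity of each key to the head mean key.
    K = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-12)
    mean_key = K.mean(axis=0)
    mean_key = mean_key / (np.linalg.norm(mean_key) + 1e-12)
    cos_to_mean = K @ mean_key
    div = -cos_to_mean
    div = (div - div.min()) / (np.ptp(div) + 1e-12)

    # Redundant heads (keys clustered around their mean) shift weight
    # toward diversity; the identity map is an illustrative choice.
    beta = float(np.clip(cos_to_mean.mean(), 0.0, 1.0))

    score = (1.0 - beta) * imp + beta * div
    keep = np.sort(np.argsort(score)[-budget:])  # keep positional order
    return keys[keep], values[keep], keep
```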
2.3. Adaptive Video Streaming and Edge Caching
In wireless edge networks, ADC (as formulated in CaVe-CoP (Sasikumar et al., 2019)) targets not only content popularity but also version and placement diversity:
- Distributed Dual Decomposition: ADC decomposes the optimization over user cache-version selection and node content placement, with dual variables pricing resource contention and storage.
- Utility-Driven Placement: By distributing file versions and adapting placement according to network load and device preference, ADC increases system-wide utility and cache hit rates.
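The pricing mechanics can be pictured with the single-node toy below, which keeps one storage constraint to show how dual prices steer version selection; the full CaVe-CoP decomposition spans many users and nodes, and the function name and utility matrix here are hypothetical.

```python
import numpy as np

def priced_placement(utility, size, capacity, steps=200, lr=0.05):
    """Dual-decomposition sketch for cache-version selection.

    utility[f, v]: utility of caching version v of file f
    size[v]:       storage cost of version v
    capacity:      storage budget of the edge node
    Returns (version per file, or -1 for "do not cache"; final price).
    """
    n_files = utility.shape[0]
    price = 0.0  # dual variable pricing storage contention
    choice = np.full(n_files, -1)
    for _ in range(steps):
        # Primal step: each file independently picks its best priced version.
        net = utility - price * size[None, :]
        best_v = net.argmax(axis=1)
        best_net = net[np.arange(n_files), best_v]
        choice = np.where(best_net > 0.0, best_v, -1)
        # Dual (subgradient) step: raise the price when over-subscribed.
        used = size[best_v][best_net > 0.0].sum()
        price = max(0.0, price + lr * (used - capacity))
    return choice, price

# Example: 20 files, 3 quality versions of increasing size and utility.
rng = np.random.default_rng(0)
choice, price = priced_placement(
    utility=rng.uniform(0.0, 1.0, (20, 3)) * np.array([1.0, 1.5, 2.0]),
    size=np.array([1.0, 2.0, 4.0]),
    capacity=25.0,
)
```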
2.4. Acceleration of Diffusion Models
DiCache (Bu et al., 24 Aug 2025) generalizes the ADC principle to runtime acceleration by:
- When to Cache: Online probe profiling monitors shallow-layer changes as a proxy for output difference. If the cumulative shallow-layer difference exceeds a threshold $\tau$, a cache update is triggered.
- How to Use Cache: Cached model states (residuals) are dynamically fused via trajectory alignment in feature space, leveraging probe-derived interpolation weights.
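A minimal sketch of the "when to cache" rule, assuming the probe distance is a mean absolute difference of a shallow-layer activation and that the drift accumulator resets on each refresh; DiCache's exact probe statistic and trajectory-alignment fusion are not reproduced here.

```python
import numpy as np

class ProbeScheduler:
    """Trigger cache refreshes from shallow-layer drift (sketch)."""

    def __init__(self, tau=0.1):
        self.tau = tau    # cumulative-drift threshold
        self.prev = None  # probe activation at the last step
        self.accum = 0.0  # drift accumulated since the last refresh

    def should_refresh(self, probe):
        """probe: shallow-layer activation at the current denoising step."""
        if self.prev is None:
            self.prev = probe
            return True  # always run (and cache) the first step fully
        # Relative shallow-layer change as a proxy for output drift.
        self.accum += np.abs(probe - self.prev).mean() / (
            np.abs(self.prev).mean() + 1e-12)
        self.prev = probe
        if self.accum >= self.tau:
            self.accum = 0.0
            return True   # drift exceeded tau: recompute, refresh cache
        return False      # reuse cached residuals for this step
```

When `should_refresh` returns False, the cached residuals would be fused via the probe-derived interpolation weights described above rather than reused verbatim.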
2.5. Content Diversity vs. Transmission Diversity in Edge Networks
ADC mechanisms in multi-cell networks navigate the tension between content diversity (cache hit probability) and transmission diversity (multi-node coherence) (Liu et al., 2017):
- Partition-based Caching: Files are split and partially redundantly cached across edge nodes, with the per-file split size adapted to the SNR regime and file popularity so as to minimize the overall outage probability.
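A toy split rule illustrating the regime dependence is given below; the linear interpolation between full replication (low SNR, transmission diversity) and full partitioning (high SNR, content diversity) is an assumption standing in for the paper's outage-optimal solution.

```python
def replicated_fraction(popularity, snr_db, snr_lo=0.0, snr_hi=20.0):
    """Fraction of a file to replicate at every edge node (sketch).

    popularity: request probability of the file, in [0, 1]
    snr_db:     operating SNR of the cell
    The remainder (1 - fraction) is partitioned disjointly across nodes,
    trading transmission diversity for content diversity as SNR rises.
    """
    low_snr_weight = min(max((snr_hi - snr_db) / (snr_hi - snr_lo), 0.0), 1.0)
    return low_snr_weight * popularity
```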
3. Joint Selection and Diversity-Driven Scoring Approaches
Nearly all ADC instantiations introduce a scoring function that integrates two (or more) criteria for cache entry selection or retention:
| Domain | Importance/Confidence | Diversity | Mixing Parameter |
|---|---|---|---|
| HOI Detection (Jiang et al., 24 Nov 2025) | Logit entropy | Cosine and Euclidean diversity | $\alpha$ (user-set) |
| MixKV (Liu et al., 23 Oct 2025) | Attention + V-norm | Negative cosine to mean key | $\beta$ (from head redundancy) |
| DiCache (Bu et al., 24 Aug 2025) | N/A (usage trigger only) | Probe-relative position | Probe-based interpolation ratio |
The diversity term frequently quantifies the orthogonality, uniqueness, or low similarity of entries within some feature or latent space. The importance or confidence component often derives from task-specific metrics (e.g., high-entropy logits in detection, attention relevance in transformers).
The dynamic or head-adaptive weighting (e.g., $\beta$ in MixKV) allows the trade-off between importance and diversity to adjust in response to redundancy or imbalance, which empirically increases cache efficiency in the presence of long-tail or multi-modal data.
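Abstracting across the table, the instantiations above share a common selection template (generic notation, not drawn from any single paper):

$$s(e) = \lambda\,\mathrm{Imp}(e) + (1-\lambda)\,\mathrm{Div}(e),$$

with the cache $\mathcal{C}$ retaining the top-$k$ entries by $s(e)$ under budget $k$; $\lambda$ is user-set in HOI detection, derived from head redundancy in MixKV, and implicit in DiCache's runtime trigger.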
4. Frequency and Resource-Aware Adaptation
ADCs universally acknowledge the need for non-uniform allocation of caching resources. Representative strategies include:
- Frequency-Aware Scaling: ADC in HOI detection assigns larger caches to rare classes, with a smooth functional dependence on observed frequencies and bounded by minimum and maximum constraints (Jiang et al., 24 Nov 2025).
- Head-Wise Semantic Redundancy Assessment: MixKV quantifies attention head redundancy to determine the optimal weight between pure importance and pure diversity (Liu et al., 23 Oct 2025).
- Regime-Specific Partitioning: In wireless and edge caching, the relative benefits of content vs transmission diversity shift with system SNR; ADC dynamically interpolates between regime-optimal allocation schemes (Liu et al., 2017).
A plausible implication is that ADC approaches are most effective when system or data regime is non-stationary (e.g., time-varying channel conditions, changing class prevalence) or when cacheable entities exhibit wide diversity in popularity or representational spread.
5. Empirical Evaluation and Impact
ADC implementations consistently report substantial gains over non-diversity-aware or static caching baselines:
- HOI Detection (Jiang et al., 24 Nov 2025): On HICO-DET, ADC yields +3.96 mAP (rare), +1.41 mAP (full) over ADA-CM, with up to +8.57% on rare classes in ablations. For V-COCO, a +4.4 gain in AP_role.
- KV Compression (Liu et al., 23 Oct 2025): MixKV yields a 5.1% average improvement under extreme compression budgets, with gains of up to 9.0 points on specific GUI grounding tasks.
- Edge Video Streaming (Sasikumar et al., 2019): Distributed ADC achieves >95% of centralized optimum utility, a 20% utility increase, and 50% stall time reduction relative to cache-all-versions baselines.
- Diffusion Model Acceleration (Bu et al., 24 Aug 2025): DiCache achieves up to 3.2× inference speedup and minimal LPIPS degradation (as low as 0.17) on WAN 2.1.
- Edge Caching (Transmission/Content Diversity) (Liu et al., 2017): ADC achieves up to 2–3 dB SNR gain at target outage by adaptively trading content vs transmission diversity.
In all cases, the computational overhead of ADC mechanisms is reported as negligible compared to the underlying model or network forwarding costs. No training or parameter updates are required for inference-time ADC modules; all operations are simple vector or matrix computations amenable to batch processing.
6. Practical Considerations and Extensions
ADC modules are typically designed to be training-free and architecture-agnostic, requiring only access to intermediate model representations, network statistics, or cached entities. Practical deployment benefits include:
- Plug-and-Play Operability: ADC can be attached to existing inference pipelines, edge caches, or model KV stores with no architectural modification.
- Memory-Efficiency: By adaptively compressing or resizing caches, ADC matches resource allocation to task difficulty, regime, or redundancy profile.
- Scalability: Most ADC algorithms have per-unit computational complexity linear in cache size or candidate set, and scale efficiently via batch operations or distributed dual decompositions.
- Generalization: The core ADC paradigm—mixing confidence/importance with diversity under resource constraints—applies whenever redundancy and imbalance limit standard cache efficacy. As such, extensions to new domains (e.g., U-Net-based diffusion generations) are plausible, especially when shallow layers/heads provide meaningful guidance for cache decisions (Bu et al., 24 Aug 2025).
Evidence from ablation studies further demonstrates that omitting the diversity term or failing to adapt the mixing weights sharply reduces the realized gains, underscoring the necessity of dual-criterion scoring and frequency- and resource-aware allocation at the heart of ADC.
7. Theoretical Insights and Design Guidelines
ADC methods are grounded in the formal trade-off between maximizing predictable (often frequent or high-confidence) coverage and ensuring sufficient spread to mitigate rare, high-impact loss events (e.g., cache misses, rare class misclassifications):
- Optimization frameworks (e.g., joint selection in edge caching (Sasikumar et al., 2019), partition-based redundancy (Liu et al., 2017)) show that regime-specific optimality emerges by allocating resources to maximize either diversity (high SNR/content-rich environments) or importance (low SNR/predictable regimes), with smooth interpolation required for general robustness.
- Lookup and real-time adaptation: Practical ADC deployments cache offline optimization results indexed by key context metrics (e.g., SNR, user distribution) for fast runtime adaptation (Liu et al., 2017); a minimal lookup sketch follows this list.
- Rule-of-thumb parameterization: Key hyperparameters (e.g., $\alpha$ in HOI-ADC, $\tau$ in DiCache) can be tuned to balance rare/non-rare performance, with optimal values empirically determined in the reported studies (Jiang et al., 24 Nov 2025, Bu et al., 24 Aug 2025).
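A minimal sketch of such a lookup, assuming a one-dimensional SNR index; real deployments would key on richer context such as user distribution and load, and all names here are hypothetical.

```python
import bisect

class RegimeLookup:
    """Serve precomputed allocations by nearest SNR operating point."""

    def __init__(self, snr_grid, allocations):
        pairs = sorted(zip(snr_grid, allocations))
        self.grid = [s for s, _ in pairs]
        self.alloc = [a for _, a in pairs]

    def query(self, snr_db):
        """Return the allocation optimized nearest to snr_db."""
        i = bisect.bisect_left(self.grid, snr_db)
        if i == 0:
            return self.alloc[0]
        if i == len(self.grid):
            return self.alloc[-1]
        lo, hi = self.grid[i - 1], self.grid[i]
        return self.alloc[i - 1] if snr_db - lo <= hi - snr_db else self.alloc[i]

# Offline: optimize per regime; online: O(log n) retrieval.
table = RegimeLookup([0.0, 10.0, 20.0], ["replicate", "hybrid", "partition"])
assert table.query(4.0) == "replicate"
```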
These theoretical constructs reveal that ADC designs are a principled solution to the bias-variance-redundancy trade-offs endemic to modern machine learning, networking, and distributed inference.
References
- "Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache" (Jiang et al., 24 Nov 2025)
- "Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-LLMs" (Liu et al., 23 Oct 2025)
- "DiCache: Let Diffusion Model Determine Its Own Cache" (Bu et al., 24 Aug 2025)
- "Cache-Version Selection and Content Placement for Adaptive Video Streaming in Wireless Edge Networks" (Sasikumar et al., 2019)
- "Exploiting Tradeoff Between Transmission Diversity and Content Diversity in Multi-Cell Edge Caching" (Liu et al., 2017)