
STC-Cacher: Stochastic Caching Systems

Updated 7 December 2025
  • STC-Cacher is a family of stochastic, state-aware, and streaming-based caching algorithms designed to optimize cache allocation and performance across wireless, encrypted, and video systems.
  • It employs techniques such as water-filling allocation, dynamic cache partitioning, and selective token recomputation to reduce subpacketization overhead and achieve low latency while preserving privacy.
  • The framework has broad applications from coded caching in broadcast networks to token-level caching in Vision Transformers, demonstrating significant improvements in hit rate and computational efficiency.

STC-Cacher refers to a family of algorithms and systems in the caching and compression literature, unified by the use of stochastic, state-aware, or streaming-based mechanisms for cache allocation, management, and token compression. Major domains of STC-Cacher development include optimized coded caching in broadcast wireless networks, dynamic cache partitioning for privacy-preserving content delivery, state transition models for time-varying popularity, and token-level compression in vision transformer architectures. These systems share a foundation in probabilistic modeling, cache sizing, and selective recomputation or replacement strategies, with the goal of improving delivery latency, reducing subpacketization, maximizing hit rate, and preserving privacy or computational efficiency.

1. Optimized Shared-Cache Coded Caching in Stochastic Networks

STC-Cacher in the context of stochastic coded caching addresses broadcast scenarios in which $K$ users are randomly associated with $\Lambda$ shared caches, subject to a total cache memory budget $tN$, where $N$ is the number of files and $t \in [0, \Lambda]$ is the normalized cache size parameter. The key insight is to allocate cache capacities proportionally to each cache's expected user load, thereby minimizing the expected worst-case delivery time across all random user–cache association realizations (Malik et al., 2021).

Key Algorithmic Phases:

  1. Storage Allocation: The normalized cache sizes $\gamma_\lambda$ are chosen to minimize the expected worst-case delivery time, leading to a water-filling allocation over cache load intensities. For statistically identical users, the optimal allocation is given by:

$$\gamma_\lambda = \frac{\sum_{\tau \subseteq [\Lambda],\, |\tau|=t,\, \lambda \in \tau}\ \prod_{j \in \tau} \hat v_j}{\sum_{\tau \subseteq [\Lambda],\, |\tau|=t}\ \prod_{j \in \tau} \hat v_j}$$

where $\hat v_\lambda = \bar v_\lambda / \alpha$, $\alpha = \gcd(\bar v_1, \dots, \bar v_\Lambda)$, and $\bar v_\lambda$ is the expected user load of cache $\lambda$.

  2. Content Placement: Cache $\lambda$ is subdivided into $\hat v_\lambda$ "virtual caches". Files are split into $S$ subpackets, with $S = \sum_{\tau \subseteq [\Lambda],\, |\tau|=t}\prod_{j \in \tau} \hat v_j$; this construction yields exponentially reduced subpacketization compared to classical schemes, especially when $K \gg \Lambda$.
  3. Delivery Phase: Once user–cache associations realize, the system partitions user groups accordingly and multicasts coded XOR combinations to serve each user population. The achieved expected delivery time $\overline{T}(t)$ is within a small multiplicative factor of existing near-optimal coded caching bounds, but with dramatically lower subpacketization overhead.

Context and Significance: STC-Cacher achieves two objectives: it (i) mitigates cache-load imbalance by adapting cache sizes to empirical association probabilities, and (ii) offers exponentially smaller subpacketization ($S = S_{MN}/\alpha^t$ versus the Maddah-Ali–Niesen $S_{MN}$), making the solution more practical for real networks with large user populations but limited caches (Malik et al., 2021).
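The allocation rule and subpacketization count above can be computed directly from the cache load profile. The following is a minimal sketch (not the authors' implementation), assuming integer-scaled expected loads and illustrative names such as `stc_allocation` and `loads`:

```python
from itertools import combinations
from functools import reduce
from math import gcd, prod

def stc_allocation(loads, t):
    """Return (gamma, S): normalized cache sizes and the subpacketization level."""
    alpha = reduce(gcd, loads)                      # alpha = gcd(v_bar_1, ..., v_bar_Lambda)
    v_hat = [v // alpha for v in loads]             # v_hat_lambda = v_bar_lambda / alpha
    idx = range(len(loads))
    # Product of v_hat_j over each size-t subset tau of the cache index set.
    subset_products = {tau: prod(v_hat[j] for j in tau)
                       for tau in combinations(idx, t)}
    S = sum(subset_products.values())               # subpacketization: sum over all subsets
    # gamma_lambda: share of S contributed by subsets tau containing cache lambda.
    gamma = [sum(p for tau, p in subset_products.items() if lam in tau) / S
             for lam in idx]
    return gamma, S

print(stc_allocation([4, 2, 2], t=2))               # -> ([0.8, 0.6, 0.6], 5)
```

For loads $(4, 2, 2)$ and $t = 2$ this gives $\alpha = 2$, $\hat v = (2, 1, 1)$, $S = 5$, and $\gamma = (0.8, 0.6, 0.6)$: the most heavily loaded cache receives the largest normalized share, and $\sum_\lambda \gamma_\lambda = t$.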

2. Stochastic Dynamic Cache Partitioning for Encrypted Content Delivery

STC-Cacher also denotes a cache partitioning algorithm for encrypted content delivery, where an Internet Service Provider (ISP) manages a shared, content-oblivious cache for multiple content providers (CPs) using only aggregate miss-rate statistics—never observing individual object requests (Araldo et al., 2016).

Core Workflow and Algorithm:

  • System Architecture: Each CP is served by a proxy process managing a separate cache partition. Encrypted objects are stored and delivered transparently; the ISP logs only per-CP miss statistics.
  • Formalization: The task is to partition cache slots among $P$ CPs to minimize the total expected miss rate $L(\theta) = \sum_{p=1}^P L_p(\theta_p)$.
  • SDCP Algorithm: The key technique is a perturbed stochastic subgradient update, in which allocations are randomly perturbed, per-CP miss counts are recorded, and a gradient estimator is computed for an adaptive update:

$$\hat g_p(k) = \delta\, y_p(k)\, D_p(k) - \frac{1}{P}\sum_j \delta\, y_j(k)\, D_j(k)$$

The continuous allocations are updated via a diminishing step size and projected back onto the simplex.

  • Convergence Guarantees: Under mild convexity and bounded-arrival assumptions, the method converges almost surely to the global optimum up to a small integer rounding gap. For typical instances, the achieved miss rates are within $10\%$ of optimal, even under non-stationary content popularity.

Privacy and Practical Aspects: No object-level information is needed; the algorithm dynamically tracks non-stationary request distributions, preserves CP privacy, and recovers most of the theoretical hit-rate advantage of jointly managed caches (Araldo et al., 2016).
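A minimal sketch of one SDCP slot is given below, assuming NumPy, a placeholder `measure_misses` callback standing in for the ISP's per-CP miss counters, and illustrative choices of perturbation magnitude, step-size schedule, and projection routine (none of these constants come from the paper):

```python
import numpy as np

def project_to_simplex(theta, total=1.0):
    """Euclidean projection onto {x : x >= 0, sum(x) = total}."""
    u = np.sort(theta)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - total) / j > 0)[0][-1]
    return np.maximum(theta - (css[rho] - total) / (rho + 1), 0.0)

def sdcp_step(theta, k, measure_misses, delta=0.01):
    """One slot of the perturbed stochastic subgradient update."""
    P = len(theta)
    y = np.random.choice([-1.0, 1.0], size=P)                  # random perturbation directions
    D = measure_misses(project_to_simplex(theta + delta * y))  # per-CP miss counts under the perturbed allocation
    g_hat = delta * y * D - np.mean(delta * y * D)             # gradient estimator from the update rule above
    step = 1.0 / (k + 1)                                       # diminishing step size
    return project_to_simplex(theta - step * g_hat)            # update and project back onto the simplex
```

Iterating `theta = sdcp_step(theta, k, measure_misses)` over slots implements the perturb–measure–estimate–project cycle described above.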

3. Streaming Token Compression for VideoLLMs

In streaming multimodal applications, STC-Cacher refers to a token-level caching strategy for Vision Transformer (ViT) based video encoders deployed with LLMs (VideoLLMs). The primary objective is to accelerate per-frame ViT encoding by reusing computations for spatial tokens that remain nearly unchanged across temporally adjacent video frames (Wang et al., 30 Nov 2025).

Design and Operation:

  1. Periodic Reference Caching: Every $N$-th frame (reference frame), all intermediate states of each ViT block are cached.
  2. Dynamic Token Identification: For subsequent frames, tokenwise cosine similarity is computed between current and cached reference Key projections. The $k = \lfloor T \cdot (1 - R_C) \rfloor$ most dissimilar tokens (those with lowest similarity) are considered dynamic and recomputed, where $T$ is the number of tokens per frame and $R_C$ is the fraction of tokens reused from the cache; the remainder are treated as static and reuse cached computations.
  3. Selective Recompute: Only the dynamic tokens undergo full attention/MLP updates; static tokens use the cached outputs.
  4. Cache Management: The reference cache is refreshed every $N$ frames. No eviction policy is needed beyond periodic refresh.
  5. Latency and Fidelity: With $R_C = 75\%$ and $N = 4$, the measured encoding latency per frame is reduced by $25$–$35\%$, with task accuracy preserved within $1\%$ of baseline.

Implementation: No model architecture changes are needed. The method involves wrapping ViT blocks with routines for token similarity computation, dynamic index selection, and scatter/gather updates (Wang et al., 30 Nov 2025).
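A minimal sketch of such a wrapper is given below, assuming PyTorch and a generic `block` that maps a `(T, D)` token matrix to `(T, D)`. Two simplifications relative to the description above are made for brevity: similarity is computed on block inputs rather than cached Key projections, and recomputed tokens are processed only over the dynamic subset; the names `CachedBlock`, `cache_ratio`, and `refresh_every` are illustrative.

```python
import torch

class CachedBlock(torch.nn.Module):
    """Wraps a ViT block with periodic reference caching and selective recomputation."""

    def __init__(self, block, cache_ratio=0.75, refresh_every=4):
        super().__init__()
        self.block = block                  # any module mapping (T, D) -> (T, D)
        self.cache_ratio = cache_ratio      # R_C: fraction of tokens reused from the cache
        self.refresh_every = refresh_every  # N: reference-frame period
        self.frame_idx = 0
        self.ref_in = None                  # cached reference inputs
        self.ref_out = None                 # cached reference outputs

    @torch.no_grad()
    def forward(self, x):                   # x: (T, D) tokens of the current frame
        if self.frame_idx % self.refresh_every == 0 or self.ref_in is None:
            out = self.block(x)             # reference frame: full compute, refresh the cache
            self.ref_in, self.ref_out = x.clone(), out.clone()
        else:
            T = x.shape[0]
            k = int(T * (1.0 - self.cache_ratio))                          # number of dynamic tokens
            sim = torch.nn.functional.cosine_similarity(x, self.ref_in, dim=-1)
            dyn_idx = torch.topk(-sim, k).indices                          # k least-similar (dynamic) tokens
            out = self.ref_out.clone()                                     # static tokens: reuse cached outputs
            out[dyn_idx] = self.block(x[dyn_idx])                          # dynamic tokens: recompute
        self.frame_idx += 1
        return out
```

Wrapping each ViT block with one such module reproduces the periodic-reference and selective-recompute behaviour sketched in steps 1–4.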

4. Dynamic Caching with State Transition Fields

The STC-Cacher framework is also a lens for analyzing cache replacement under time-varying content popularity through the construction of a state transition field (STF) (Gao et al., 2019). Each caching policy is viewed as inducing a vector field over the probability simplex of cache states.

Key Concepts:

  • State Caching Probability (SCP): The vector $\boldsymbol{\eta}^{(n)}$ tracks the probability of each cache state after $n$ requests.
  • Instantaneous STF: At time $n$, the transition vector

$$\mathbf{u}^{(n)}(\boldsymbol{\eta}^{(n-1)}) = \Theta^{(n)} \boldsymbol{\eta}^{(n-1)} - \boldsymbol{\eta}^{(n-1)}$$

describes how the cache state distribution evolves under the policy-induced transition matrix $\Theta^{(n)}$ when a content request arrives.

  • Hit Probability Decomposition: The instantaneous and average cache hit rates decompose as sums over these STFs and the sequence of popularity vectors $\{\boldsymbol{\upsilon}^{(n)}\}$.
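As a concrete illustration, the sketch below constructs $\Theta^{(n)}$ for a toy one-slot cache under a simple replace-on-miss policy (an illustrative choice, not one of the policies analyzed in the paper) and evaluates the instantaneous STF and hit probability:

```python
import numpy as np

FILES = 3                                    # toy catalogue of 3 files; the cache holds 1 file

def transition_matrix(popularity):
    """Theta[i, j] = Pr(next cached file = i | current cached file = j) after one request."""
    theta = np.zeros((FILES, FILES))
    for j in range(FILES):                   # current cache state: file j is cached
        for f in range(FILES):               # requested file f, with probability popularity[f]
            nxt = j if f == j else f         # hit keeps the cache; miss replaces it with f
            theta[nxt, j] += popularity[f]
    return theta

upsilon = np.array([0.6, 0.3, 0.1])          # popularity vector at time n
eta = np.full(FILES, 1.0 / FILES)            # eta^(n-1): state caching probabilities

theta = transition_matrix(upsilon)           # Theta^(n) induced by the policy and request statistics
u = theta @ eta - eta                        # instantaneous STF: Theta^(n) eta^(n-1) - eta^(n-1)
hit_prob = float(upsilon @ eta)              # instantaneous hit probability of this toy cache
eta_next = eta + u                           # eta^(n) = eta^(n-1) + u^(n)
```

Summing such per-request hit probabilities along the popularity sequence recovers the decomposition of the average hit rate mentioned above.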

Policy Insights:

  • Policies exploiting "full knowledge" (TLP) are superior if popularity evolves slowly/smoothly; "partial knowledge" (LP) is less effective; "no knowledge" (LRU, RR) can outperform when popularity shifts abruptly.
  • Simulation models (e.g., shot noise, short life-span, and Gaussian-peak request rates) confirm that the efficacy of any replacement policy is determined by the alignment of its update timescale and the underlying popularity dynamics (Gao et al., 2019).

5. Subpacketization, Latency, and Performance Analyses

Across network and streaming contexts, STC-Cacher is characterized by its ability to achieve performance close to theoretical optima with substantially reduced overheads.

  • Subpacketization in Coded Caching: The exponential reduction in subpacketization (an $\alpha^t$-factor decrease, where $\alpha = \gcd(\bar v_1, \dots, \bar v_\Lambda)$) makes coded shared-cache strategies feasible for large $K$ and limited $\Lambda$ (Malik et al., 2021).
  • Latency Savings in VideoLLMs: A measured $24.5\%$ reduction in ViT prefill latency and similar retrieval speedups are obtainable with negligible accuracy loss, consistent across several baselines (Wang et al., 30 Nov 2025).
  • Dynamic Partitioning: The stochastic subgradient dynamic partitioning keeps the cache miss rate within $10\%$ of the global optimum and adapts to non-stationary regimes without object-level information (Araldo et al., 2016).

6. Limitations and Future Directions

  • Coded Caching: In the presence of highly dynamic user–cache association patterns or extremely skewed access, performance gains depend on accurate load estimation and may degrade if real association distributions violate modeling assumptions (Malik et al., 2021).
  • Caching for Encrypted Delivery: Noisy gradient estimation requires careful step-size selection; convergence takes on the order of $10^2$–$10^3$ slots, and ON–OFF popularity regimes require re-initialization (Araldo et al., 2016).
  • Streaming Compression: For high-motion videos where most tokens are dynamic, cache hit rates and speedup diminish; adaptive thresholds and motion estimators may ameliorate this (Wang et al., 30 Nov 2025).
  • STF Caching: Overfitting to instantaneous popularity risks overshooting in bursty regimes; recency-based schemes may be more robust in such cases (Gao et al., 2019).

A plausible implication is that future research will investigate finely tuned or hybrid variants—such as adaptive cache intervals, learnable dynamic thresholds, and combined popularity–recency predictors—to further optimize the tradeoffs explored by current STC-Cacher systems across modalities and deployment scenarios.
