CompACT: Pruning, Compression, and Catalogue Methods
- CompACT is a name shared by several independent methods: LLM pruning, multimodal tuning, activation compression, retrieved-document compression, latent tokenization for planning, and galaxy cluster detection.
- In LLM pruning, the approach uses a common-token-aware metric to jointly remove rare tokens and the least important FFN channels, reducing latency and memory without retraining.
- Other methods sharing the name target multimodal vision-language tuning and astrophysics, improving compositional capability and galaxy cluster catalogue completeness, respectively.
COMPACT is an acronym used to refer to several distinct methods and datasets in contemporary machine learning and astrophysics. The following article systematically documents major COMPACT-related systems, focusing particularly on state-of-the-art model pruning in LLMs, compositional tuning for multimodal models, activation compression, active document compression, compact latent tokenization for planning, and a galaxy cluster catalogue. Each entry is treated separately, with priority given to fidelity and factual exactness.
1. COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
COMPACT (“Common-token Optimized Model Pruning Across Channels and Tokens”) is a training-free structured pruning framework for large decoder-only Transformer architectures. It aims to reduce memory usage, end-to-end latency, and serving costs, and to raise throughput, for deployed LLMs without architectural changes or retraining. Prior structured pruning methods, depth pruning (removing layers) and width pruning (removing channels), each trade off model simplicity against inference-engine compatibility. COMPACT prunes the vocabulary and FFN channels jointly, remaining deployment-friendly while delivering strong empirical performance (Kwek et al., 8 Sep 2025).
Joint Pruning Objective
COMPACT performs simultaneous rare-token vocabulary pruning and feedforward (FFN) channel pruning under a unified, common-token-aware importance measure. For a model with original vocabulary size $V$ and target size $V'$, the embedding and unembedding rows of the $V - V'$ rarest tokens are removed. Each FFN with intermediate width $I$ is pruned to width $I'$ by removing its $I - I'$ least important channels.
The key innovation is that FFN channel importance is measured only with respect to non-pruned (“common”) tokens. Given calibration examples $\{x_i\}_{i=1}^N$, the activation $a_{i,j}^{(\ell)}$ of channel $j$ in layer $\ell$ is aggregated over the calibration tokens, with each token contributing through a weight that is nonzero only if it is a common token; rare tokens receive weight 0.
Algorithmic Pipeline
- Compute the rare-token set (the $V - V'$ least frequent tokens) and prune the associated embedding and unembedding rows.
- Run forward passes on the calibration examples and aggregate per-channel activation statistics, giving rare tokens zero weight.
- Prune FFN channels in each layer using the importance measure.
- The output model preserves the standard Transformer skeleton: all layers, attention heads, and dataflow remain unmodified; only the vocabulary and FFN widths are reduced.
No retraining or auxiliary optimization is performed. The method is applicable even to 70B-parameter LLMs with only minutes of calibration.
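The pipeline above can be sketched in NumPy for one gated (SwiGLU-style) FFN. This is a minimal illustration under stated assumptions, not the authors' code: the score (a weighted sum of absolute intermediate activations), the 0/1 token weights, the gate/up/down weight layout, and every function name are hypothetical.

```python
import numpy as np

def common_token_mask(token_counts: np.ndarray, keep_vocab: int) -> np.ndarray:
    """True for the `keep_vocab` most frequent ("common") token ids."""
    keep_ids = np.argsort(-token_counts, kind="stable")[:keep_vocab]
    mask = np.zeros(token_counts.shape[0], dtype=bool)
    mask[keep_ids] = True
    return mask

def channel_scores(acts: np.ndarray, token_ids: np.ndarray, common: np.ndarray) -> np.ndarray:
    """acts: (num_tokens, I) FFN intermediate activations on calibration data.
    Rare tokens get weight 0, so only common tokens shape the score."""
    weights = common[token_ids].astype(acts.dtype)
    return weights @ np.abs(acts)  # (I,) assumed score: weighted sum of |activation|

def prune_ffn(w_gate, w_up, w_down, scores, keep_channels):
    """Drop the lowest-scoring intermediate channels of a gated FFN.
    w_gate, w_up: (d, I); w_down: (I, d)."""
    keep = np.sort(np.argsort(-scores, kind="stable")[:keep_channels])
    return w_gate[:, keep], w_up[:, keep], w_down[keep, :]

# Toy example with random weights and hypothetical sizes.
rng = np.random.default_rng(0)
V, d, I, n_tok = 100, 16, 64, 500
counts = rng.integers(0, 1000, size=V)        # token frequencies on some corpus
common = common_token_mask(counts, keep_vocab=60)
embed = rng.standard_normal((V, d))
new_embed = embed[common]                      # tokenizer ids must be remapped too
token_ids = rng.integers(0, V, size=n_tok)
acts = rng.standard_normal((n_tok, I))
scores = channel_scores(acts, token_ids, common)
g, u, dn = prune_ffn(rng.standard_normal((d, I)), rng.standard_normal((d, I)),
                     rng.standard_normal((I, d)), scores, keep_channels=48)
```

Because rare tokens carry zero weight, changing their activations leaves the channel scores unchanged, which is the property a common-token-aware measure relies on.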
Pruning Control and Trade-offs
Two axes are exposed: the vocabulary size $V'$ and the FFN intermediate width $I'$. For a target parameter reduction, any $(V', I')$ pair whose combined embedding and FFN parameter removal meets the target is valid, and the choice can be tuned to maximize metric retention.
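The budget constraint can be made concrete with a rough parameter count. The sketch assumes untied embedding and unembedding matrices, a gated FFN with three $d \times I$ matrices per layer, and unchanged attention; the grid values, the 20% target, and the assumed dense total of 8.0e9 parameters are illustrative only.

```python
def params_removed(V, V_new, I, I_new, d, n_layers, tied=False, ffn_mats=3):
    """Parameters removed by shrinking the vocabulary V -> V_new and every
    FFN's intermediate width I -> I_new (hidden size d, n_layers layers).
    Assumes untied embedding/unembedding unless tied=True, and a gated FFN
    with three d x I matrices (gate, up, down)."""
    emb = (1 if tied else 2) * (V - V_new) * d
    ffn = ffn_mats * n_layers * (I - I_new) * d
    return emb + ffn

def feasible_pairs(V, I, d, n_layers, total, ratio, vocab_grid, ffn_grid, tied=False):
    """All grid pairs (V', I') that remove at least `ratio` of `total` parameters."""
    need = ratio * total
    return [(v, i) for v in vocab_grid for i in ffn_grid
            if params_removed(V, v, I, i, d, n_layers, tied) >= need]

# Hypothetical LLaMA-style shapes; `total` is an assumed dense parameter count.
V, I, d, L, total = 128_256, 14_336, 4_096, 32, 8.0e9
pairs = feasible_pairs(V, I, d, L, total, ratio=0.2,
                       vocab_grid=[32_000, 64_000, 96_000, V],
                       ffn_grid=[8_192, 10_240, 12_288, I])
```

Choosing among the feasible pairs would then require evaluating each pruned model, which is omitted here.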
Deployment and Engine Compatibility
By retaining the original Transformer depth, attention configuration, and computation graph, COMPACT requires no custom inference kernels or engine modifications. Pruned checkpoints work directly with standard software stacks (Hugging Face, vLLM, FlashAttention).
Scale-Adaptivity
Parameter-count analysis shows that smaller LLMs are embedding-dominated, while larger LLMs are FFN-dominated. COMPACT allows selective targeting:
- For small models: prune the vocabulary aggressively (small $V'$, $I'$ close to $I$).
- For large models: prune FFN channels ($V'$ close to $V$, smaller $I'$).
- For intermediate models: use a hybrid schedule.
This adaptivity yields robustness across the Qwen, LLaMA, and Gemma families at a range of model sizes.
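A quick count illustrates the scale trend. Assuming the published Hugging Face configurations of Qwen2.5-0.5B (tied embeddings) and Llama-3.1-8B (untied), and ignoring attention and normalization weights, the embedding matrices hold a much larger share of the parameters in the small model:

```python
def embedding_share(vocab, hidden, inter, layers, tied, ffn_mats=3):
    """Fraction of (embedding + FFN) parameters held by the embedding matrices.
    Attention and normalization parameters are ignored in this rough comparison."""
    emb = (1 if tied else 2) * vocab * hidden
    ffn = ffn_mats * layers * hidden * inter
    return emb / (emb + ffn)

# Assumed configs: Qwen2.5-0.5B and Llama-3.1-8B.
small = embedding_share(vocab=151_936, hidden=896, inter=4_864, layers=24, tied=True)
large = embedding_share(vocab=128_256, hidden=4_096, inter=14_336, layers=32, tied=False)
print(f"{small:.2f} {large:.2f}")  # prints 0.30 0.16
```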
Empirical Performance
- At comparable prune ratios, COMPACT outperforms ShortGPT, LaCo, SliceGPT, and 2SSP baselines.
- Example: on Qwen 0.5B, at a … prune ratio COMPACT retains … of dense accuracy (mean across MMLU, GSM8K, and HellaSwag), whereas baselines approach random accuracy.
- LLaMA 8B memory: … MB → … MB; throughput: … qps → … qps.
- Pruning LLaMA 8B takes … versus … for SliceGPT (min:sec).
Training-Free Operation
All pruning is post hoc and training-free, requiring only a small number of forward passes for activation calibration. Pruning 8B to 70B models takes only minutes on a single A100.
2. COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
COMPACT (“COMPositional Atomic-to-Complex Visual Capability Tuning”) is a dataset generation and training recipe for multimodal LLMs (MLLMs), specifically targeting compositional generalization in vision-language tasks. Standard Visual Instruction Tuning (VIT), as in LLaVA-665k, underrepresents complex (high-$k$) capability combinations, impairing performance on tasks that require the coordinated use of many basic vision skills (Wu et al., 30 Apr 2025).
Atomic Capability Taxonomy
A fixed taxonomy of 10 atomic visual capabilities is used:
- Attribute: color, shape.
- Recognition: object, action, text, spatial, counting.
- Relation: spatial relationship, object interaction, scene understanding.
The compositional complexity $k$ of a sample $s$ is $k(s) = |\mathcal{C}(s)|$, where $\mathcal{C}(s)$ is the set of atomic capabilities the sample requires.
Dataset Construction
Images from LLaVA-665k are used as the pool. For each image and each target complexity $k$:
- Sample a combination $\mathcal{C}$ of $k$ atomic capabilities.
- Use Gemini-2.0-Flash to generate corresponding Q-A pairs, with confidence filtering and diversity checks.
- Use a verification model to ensure correspondence between Q, A, and $\mathcal{C}$.
- Curate the dataset with balanced coverage over all values of $k$.
Only images with at least two verified multi-capability Q-A pairs are retained in the final training set.
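The sampling and retention steps can be sketched in Python. The complexity set $k \in \{1, 2, 3\}$, the reading of "multi-capability" as $k \ge 2$, the spec format, and all function names are assumptions; the Gemini-2.0-Flash generation and verification stages are not modeled, and the example treats every spec as verified.

```python
import random
from collections import Counter

# The ten atomic capabilities of the taxonomy above.
CAPABILITIES = [
    "color", "shape",                                                     # attribute
    "object", "action", "text", "spatial", "counting",                    # recognition
    "spatial relationship", "object interaction", "scene understanding",  # relation
]

def complexity(required):
    """Compositional complexity k: number of distinct atomic capabilities."""
    return len(set(required))

def sample_specs(image_ids, ks=(1, 2, 3), seed=0):
    """One capability combination per (image, k); equal counts per k keep the
    generation requests balanced across complexities."""
    rng = random.Random(seed)
    return [{"image": img, "k": k, "caps": tuple(sorted(rng.sample(CAPABILITIES, k)))}
            for img in image_ids for k in ks]

def retained_images(verified_pairs, min_pairs=2):
    """Images with at least `min_pairs` verified multi-capability (k >= 2) pairs."""
    counts = Counter(p["image"] for p in verified_pairs if p["k"] >= 2)
    return {img for img, c in counts.items() if c >= min_pairs}

specs = sample_specs(["img0", "img1"])
# In the real pipeline an MLLM would turn each spec into Q-A pairs and a verifier
# would filter them; here every spec is treated as verified.
kept = retained_images(specs)
```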
Model Integration and Objective
Starting from a LLaVA-7B checkpoint prior to visual instruction tuning, LoRA adapters are trained with the standard token-level cross-entropy loss over answer tokens, with the vision encoder (CLIP-ViT) and most language-model parameters frozen.
Empirical Results
COMPACT achieves:
- Comparable or superior performance to LLaVA-665k VIT on all standard benchmarks while using roughly 10% of its data budget.
- Especially strong improvements on tasks requiring many atomic capabilities (high $k$), for example on MMStar and MM-Vet relative to full-scale VIT.
- Table 2 from the paper summarizes overall gains; see also Figure 1 showing compositional generalization.
Ablation Analyses
- Balanced-$k$ training is critical; reverting to the LLaVA-665k $k$-distribution sharply reduces performance.
- All ten atomic capabilities are necessary for optimal performance; some (scene understanding, spatial) are more critical.
- Increasing the instruction-tuning data beyond about 5% offers little additional benefit.
3. CompAct: Compressed Activations for Memory-Efficient LLM Training
CompAct is a technique for reducing memory utilization during LLM training via low-rank random projections of activations between forward and backward passes. This directly targets activation buffers, which are the dominant contributor to peak GPU memory in LLM pretraining and fine-tuning (Shamshoum et al., 2024).
Methodology
For each linear layer with input activation $X \in \mathbb{R}^{b \times n}$, a Gaussian random matrix $P \in \mathbb{R}^{n \times r}$ ($r \ll n$) compresses $X$ to $Z = XP$. Only $Z$ is stored during the forward pass. Weight gradients are computed in this compressed domain, and the full gradient is reconstructed at the update step. The random matrices are regenerated from their seeds rather than stored.
Pseudocode (Forward, Backward, Optimizer)
The paper provides full pseudocode for the forward pass, backward pass, and optimizer step. Notably, the backward pass computes only the gradients required for the optimizer step from the compressed activations, minimizing reconstruction overhead.
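The per-layer mechanics can be sketched in NumPy, assuming the orientation $Z = XP$ with $P \in \mathbb{R}^{n \times r}$. This is an illustration rather than the paper's pseudocode: the $1/\sqrt{r}$ scaling (chosen so that $\mathbb{E}[PP^\top] = I$), the seeding scheme, and the function names are assumptions.

```python
import numpy as np

def projection(seed: int, n: int, r: int) -> np.ndarray:
    """Regenerate the Gaussian projection P (n x r) from its seed instead of
    storing it. Entries ~ N(0, 1/r), so E[P @ P.T] = I."""
    return np.random.default_rng(seed).standard_normal((n, r)) / np.sqrt(r)

def forward(x, w, seed, r):
    """y = x @ w; keep only the compressed activation z = x @ P for backward."""
    return x @ w, x @ projection(seed, x.shape[1], r)

def compressed_grad(z, grad_y):
    """Weight gradient in the compressed domain: shape (r, m) instead of (n, m)."""
    return z.T @ grad_y

def reconstruct(g_c, seed, n, r):
    """Unbiased estimate of the full weight gradient x.T @ grad_y at update time."""
    return projection(seed, n, r) @ g_c

def rel_err(a, ref):
    return np.linalg.norm(a - ref) / np.linalg.norm(ref)

b, n, m, r = 32, 64, 16, 8
rng = np.random.default_rng(123)
x, w = rng.standard_normal((b, n)), rng.standard_normal((n, m))
grad_y = rng.standard_normal((b, m))   # upstream gradient dL/dy
exact = x.T @ grad_y                   # dL/dw without compression

y, z = forward(x, w, seed=0, r=r)
single = reconstruct(compressed_grad(z, grad_y), seed=0, n=n, r=r)
# Averaging over many seeds illustrates that the estimator is unbiased.
avg = np.mean([reconstruct(compressed_grad(forward(x, w, s, r)[1], grad_y), s, n, r)
               for s in range(500)], axis=0)
```

The stored activation shrinks by a factor of $r/n$; the price is gradient noise, which is one plausible reason an overly small $r$ hurts accuracy.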
Memory and Throughput
CompAct reduces peak GPU memory by …–… in pretraining and by … in fine-tuning. Throughput increases as the rank $r$ decreases, because the GEMMs become smaller. CompAct can be combined with activation checkpointing, outperforms prior gradient-sketching methods, and itself incurs no recomputation or communication penalties.
Limitations
Only the inputs to linear layers are compressed; attention, nonlinearities, and normalization activations remain full-rank. An excessively small $r$ can degrade fine-tuning accuracy on small tasks.
4. CompAct: Compressing Retrieved Documents Actively for Question Answering
CompAct is an active compression framework for context conditioning in retrieval-augmented question answering (QA). It addresses the issue that large readers degrade when provided with extensive, noisy or multi-hop contexts, and single-pass filtering often misses critical factual connections (Yoon et al., 2024).
Active, Iterative Compression Strategy
CompAct sits between the retriever and the generative reader. Documents are split into segments (e.g., 5 documents per step), and an LLM-based compressor iteratively summarizes each segment together with the current compressed context, updating that context and emitting an action token, [COMPLETE] or [INCOMPLETE], that enables early stopping. The compressor is trained with supervised labels generated by GPT-4o.
Mathematical Formulation
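$$(C_t, e_t) = \pi_\theta(q, S_t, C_{t-1}), \qquad e_t \in \{\text{[COMPLETE]}, \text{[INCOMPLETE]}\}$$
where $q$ is the question, $S_t$ the $t$-th document segment, $C_t$ the compressed context (the state), and $e_t$ the action; compression stops at the first $t$ with $e_t = \text{[COMPLETE]}$.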
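The control flow of active compression can be sketched as a loop. The compressor interface and the keyword-matching stand-in below are hypothetical placeholders for the trained LLM compressor; only the segment-by-segment update and the early stop on [COMPLETE] mirror the description above.

```python
from typing import Callable, List, Tuple

# Hypothetical compressor interface: (question, segment, previous context)
# -> (updated context, action token).
Compressor = Callable[[str, List[str], str], Tuple[str, str]]

def active_compress(question: str, docs: List[str], compressor: Compressor,
                    segment_size: int = 5) -> Tuple[str, int]:
    """Fold document segments into a compressed context one step at a time,
    stopping early once the compressor emits [COMPLETE]."""
    context, steps = "", 0
    for start in range(0, len(docs), segment_size):
        context, action = compressor(question, docs[start:start + segment_size], context)
        steps += 1
        if action == "[COMPLETE]":
            break
    return context, steps

def make_keyword_compressor(keywords):
    """Toy stand-in for the trained compressor: keeps documents that share a
    keyword and declares [COMPLETE] once every keyword is covered."""
    keywords = {k.lower() for k in keywords}
    def compress(question, segment, context):
        kept = [s for s in segment if keywords & set(s.lower().split())]
        new_context = " ".join([context, *kept]).strip()
        done = keywords <= set(new_context.lower().split())
        return new_context, "[COMPLETE]" if done else "[INCOMPLETE]"
    return compress

docs = [f"unrelated passage {i}" for i in range(15)]
docs[2] = "marie curie was a physicist"
docs[7] = "curie was born in warsaw"
ctx, steps = active_compress("Where was Marie Curie born?", docs,
                             make_keyword_compressor({"curie", "warsaw"}))
```

Here the loop stops after the second of three segments, so the third segment is never processed.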
Empirical Performance
Exemplified on HotpotQA:
- CompAct attains a compression rate of … and an F1 score of … (raw documents: …).
- Cost efficiency: on GPT-3.5-Turbo, CompAct reduces the per-sample API cost from \$… to \$0.04.
- It maintains performance as the number of retrieved documents increases, unlike raw concatenation or static pruning.
Deployment
CompAct is a plug-and-play compressor between arbitrary retrievers and readers and accommodates the short-context LMs prevalent on Hugging Face. Early stopping reduces average computation.
5. CompACT: A Discrete Tokenizer for Planning in Latent World Models
CompACT, as introduced in action-conditioned world modeling, is a discrete observation tokenizer that achieves extreme compression, representing each image frame with as few as 8 discrete tokens, without substantial accuracy loss on planning and simulation benchmarks (Kim et al., 5 Mar 2026).
Architecture
- Frozen semantic encoder (DINOv3-B) extracts patch embeddings.
- Latent resampler (transformer) distills these into 8 latent queries.
- Finite Scalar Quantization (FSQ) discretizes each latent query into one of a fixed number of possible codes.
- A MaskGIT-style generative decoder reconstructs high-fidelity targets in VQGAN latent space.
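A minimal FSQ sketch in NumPy: each latent dimension is bounded with tanh and rounded to a small number of levels, and the per-dimension digits are combined into one integer code. The four dimensions and the odd level counts (7, 5, 5, 5) are hypothetical choices for illustration; odd counts avoid the half-step offset that even counts need, and the straight-through gradient used during training is omitted.

```python
import numpy as np

LEVELS = np.array([7, 5, 5, 5])  # hypothetical odd level counts per dimension

def fsq_quantize(z, levels=LEVELS):
    """Bound each dimension with tanh, round to `levels` values, rescale to [-1, 1]."""
    half = (levels - 1) / 2
    return np.round(np.tanh(z) * half) / half

def fsq_indices(z, levels=LEVELS):
    """Mixed-radix integer code per query; codebook size is prod(levels)."""
    half = (levels - 1) / 2
    digits = (np.round(np.tanh(z) * half) + half).astype(np.int64)  # 0..L-1
    radix = np.concatenate(([1], np.cumprod(levels[:-1])))
    return digits @ radix

rng = np.random.default_rng(0)
z = 2.0 * rng.standard_normal((8, 4))  # 8 latent queries, 4 dims each
q = fsq_quantize(z)
idx = fsq_indices(z)                   # one code per query, in [0, 875)
```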
Training Objective
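A masked generative cross-entropy loss on target MaskGIT-VQGAN tokens $v$, conditioned on the compact tokens $z$, replaces a pixel-space loss:
$$\mathcal{L} = -\mathbb{E}\Big[\sum_{i \in \mathcal{M}} \log p_\theta\big(v_i \mid v_{\setminus \mathcal{M}}, z\big)\Big]$$
where $\mathcal{M}$ is the set of masked token positions.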
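The masked cross-entropy in this objective can be computed as below. The sketch assumes the decoder, conditioned on the compact tokens, has already produced logits over the VQGAN codebook; the shapes and the function name are hypothetical.

```python
import numpy as np

def masked_token_ce(logits, targets, mask):
    """Mean negative log-likelihood over masked positions only.
    logits: (N, K) scores over a K-entry codebook for N token positions,
    targets: (N,) ground-truth token ids, mask: (N,) True where the token was masked."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll[mask].mean())

logits = np.zeros((4, 10))                     # uniform predictions over 10 codes
targets = np.array([3, 1, 4, 1])
mask = np.array([True, False, True, False])
loss = masked_token_ce(logits, targets, mask)  # equals log(10) for uniform logits
```

Unmasked positions do not contribute, so the decoder is only trained to fill in what it cannot see.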
World Model and Planning
- Tokenizer output is used as state in a masked generative discrete world model.
- Model-predictive control with the cross-entropy method (CEM) exploits the extreme compression for a 40–80× reduction in rollout latency, enabling real-time planning.
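The planning loop can be illustrated with a generic CEM planner. The additive toy dynamics stand in for the learned world model, which in CompACT rolls out in the compact token space; the horizon, population size, elite count, and iteration count are hypothetical.

```python
import numpy as np

def cem_plan(cost_fn, horizon, action_dim, iters=20, pop=64, n_elite=8, seed=0):
    """Cross-entropy method: sample action sequences from a diagonal Gaussian,
    keep the lowest-cost elites, refit the Gaussian, and repeat."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon, action_dim))
        costs = np.array([cost_fn(seq) for seq in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy stand-in for the learned world model: additive dynamics s_{t+1} = s_t + a_t.
start, goal = np.zeros(2), np.array([1.0, -1.0])

def rollout_cost(actions):
    final_state = start + actions.sum(axis=0)
    return float(np.sum((final_state - goal) ** 2))

plan = cem_plan(rollout_cost, horizon=5, action_dim=2)
```

Each CEM iteration evaluates `pop` rollouts, so the per-rollout cost of the world model dominates planning time; this is where fewer tokens per frame pays off.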
Results
- CompACT-8 achieves an rFID of … and an IS of … on ImageNet reconstruction.
- Navigation planning: ATE …, RPE …, latency … s, competitive with or faster than baselines that use more tokens.
- Ablations confirm the necessity of frozen semantic encoders, generative decoders, and masked autoencoding in preserving planning-essential information.
6. ComPACT: Galaxy Cluster Catalogue (ACT + Planck)
ComPACT (“Convolutional neural network–based Planck + ACT Cluster‐finding Technique”) is a large, machine-learning–curated catalogue of galaxy clusters selected through the thermal Sunyaev–Zel'dovich (tSZ) effect in combined ACT and Planck microwave data (Voskresenskaia et al., 19 May 2026).
Construction and Detection Pipeline
- A U-Net-style CNN scans 32 × 32 ACT+Planck intensity patches, producing per-pixel tSZ-cluster probabilities.
- Detection thresholds: …, with a contiguous region of … pixels above ….
- Positive labels are generated by injecting simulated clusters into real maps; negatives come from random or known non-cluster sky regions.
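Turning a per-pixel probability map into candidates can be sketched with thresholding and connected-component grouping. The threshold, minimum region size, 4-connectivity, and probability-weighted centroid are assumptions, since the catalogue's exact values are not given here.

```python
import numpy as np
from collections import deque

def detect_candidates(prob, threshold=0.5, min_pixels=4):
    """Group above-threshold pixels into 4-connected regions and return
    (row, col, n_pixels) for regions with at least `min_pixels` pixels."""
    above = prob > threshold
    seen = np.zeros(prob.shape, dtype=bool)
    h, w = prob.shape
    detections = []
    for i in range(h):
        for j in range(w):
            if not above[i, j] or seen[i, j]:
                continue
            queue, pixels = deque([(i, j)]), []
            seen[i, j] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and above[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                ys, xs = np.array(pixels).T
                wts = prob[ys, xs]  # probability-weighted centroid
                detections.append((float(np.average(ys, weights=wts)),
                                   float(np.average(xs, weights=wts)), len(pixels)))
    return detections

patch = np.zeros((32, 32))
patch[10:13, 20:23] = 0.9  # a 3 x 3 blob: kept
patch[0, 0] = 0.95         # an isolated hot pixel: rejected by min_pixels
found = detect_candidates(patch)
```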
Catalogue Properties
| Statistic | Value/Range |
|---|---|
| Footprint | … |
| Total candidates | 2,962 |
| Confirmed clusters | 1,668 + 116 new redshifts (≈60%) |
| Redshift ($z$) | … |
| Mass | … (56%) |
| New high-mass, high-$z$ clusters | 5 (…) |
| Completeness (…) | … |
Cluster physical properties are derived from the tSZ Compton-$y$ signal …
Scientific Impact
ComPACT recovers higher completeness (especially for …) …
7. Terminological Caveat
COMPACT/CompACT/CompAct refer to several independent methodologies: