CompACT: Pruning, Compression, and Catalogue Methods

Updated 3 July 2026

CompACT is a name shared by several unrelated methods covering LLM pruning, multimodal tuning, activation compression, and galaxy cluster detection.
In LLM pruning, the approach uses a common-token-aware metric to remove rare tokens and the least important FFN channels together, reducing latency and memory without retraining.
The name is also used for multimodal vision-language tuning and for an astrophysical catalogue, which report improved compositional capability and higher completeness in galaxy cluster cataloguing, respectively.

COMPACT is an acronym used to refer to several distinct methods and datasets in contemporary machine learning and astrophysics. The following article systematically documents major COMPACT-related systems, focusing particularly on state-of-the-art model pruning in LLMs, compositional tuning for multimodal models, activation compression, active document compression, compact latent tokenization for planning, and a galaxy cluster catalogue. Each entry is treated separately, with priority given to fidelity and factual exactness.

1. COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens

COMPACT (“COMMON-token Optimized Model Pruning Across Channels and Tokens”) is a training-free structured pruning framework designed for large decoder-only Transformer architectures. It is motivated by the need to improve memory usage, end-to-end latency, throughput, and serving costs for LLMs in deployment scenarios, without architectural deviations or retraining. Prior pruning methods—depth (removing layers) and width (removing channels)—trade off between model simplicity and inference engine compatibility. COMPACT achieves simultaneous vocabulary and channel pruning with deployment-friendliness and strong empirical performance (Kwek et al., 8 Sep 2025).

Joint Pruning Objective

COMPACT performs simultaneous rare-token vocabulary pruning and feedforward (FFN) channel pruning under a unified, common-token-aware importance measure. For a model with original vocabulary size $V$ and target $V'<V$, the embeddings and unembeddings of the set $S$ of the $V-V'$ rarest tokens are removed. Each FFN of dimension $D\times I$ (input $\times$ intermediate) is pruned to $D\times I'$ by removing the $I-I'$ least important channels.

The key innovation is that FFN channel importance is measured only with respect to non-pruned (“common”) tokens. Given calibration examples $\{x_i\}_{i=1}^N$, the activations $a_i^{(\ell)}$ of each channel in each layer $\ell$ are aggregated into a per-channel importance score, in which a token contributes with weight 1 if it is a common token and with weight 0 otherwise.

Algorithmic Pipeline

Compute the rare-token set $S$ and prune the associated embedding/unembedding rows.
Forward-pass calibration examples, aggregate per-channel activation statistics, zeroing rare tokens.
Prune FFN channels per layer using the importance measure.
Output model preserves the standard Transformer skeleton: all layers, attention heads, and dataflow remain unmodified; only embedding and FFN widths are reduced.

No retraining or auxiliary optimization is performed. The method is applicable even to 70B-parameter LLMs with only minutes of calibration.
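The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, token rarity is taken from raw frequency counts, and the per-channel score is assumed to be a sum of squared activations over common tokens (the source only specifies that rare tokens are zeroed out of the statistic).

```python
import numpy as np

def rare_token_set(token_counts, v_keep):
    """Return the ids of the V - V' least frequent tokens (the set S)."""
    order = np.argsort(token_counts, kind="stable")  # ascending frequency
    n_remove = len(token_counts) - v_keep
    return np.sort(order[:n_remove])

def channel_importance(acts, token_ids, rare):
    """Score FFN channels using common tokens only.

    acts: (num_tokens, I) intermediate activations from calibration text.
    Assumption: the score is the sum of squared activations; rare tokens
    get weight 0 so they cannot influence which channels survive.
    """
    weights = (~np.isin(token_ids, rare)).astype(acts.dtype)
    return (weights[:, None] * acts ** 2).sum(axis=0)

def prune_ffn(w_up, w_down, scores, i_keep):
    """Keep the i_keep highest-scoring channels of a D x I / I x D FFN."""
    keep = np.sort(np.argsort(-scores, kind="stable")[:i_keep])
    return w_up[:, keep], w_down[keep, :], keep

def prune_embedding(emb, rare):
    """Drop the embedding rows of rare tokens; also return the kept ids."""
    keep = np.setdiff1d(np.arange(emb.shape[0]), rare)
    return emb[keep], keep
```

In this toy setting a channel that fires strongly only on rare tokens receives a low score and is pruned first, which is the intended effect of the common-token weighting. A real checkpoint would also need its tokenizer remapped to the kept token ids, which the sketch leaves out.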

Pruning Control and Trade-offs

Two axes are exposed: $V'$ (vocabulary size) and $I'$ (hidden/FFN width). For a target parameter-reduction ratio, any $(V', I')$ pair whose combined embedding and FFN reduction meets that target is valid, and the choice may be tuned to improve metric retention.
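To make the trade-off between the two axes concrete, the sketch below solves for the FFN width that meets a parameter budget once a vocabulary size has been chosen. The parameter-count formula is an assumption for illustration only (untied input/output embeddings, four square attention projections per layer, a gated FFN with three matrices); the function names and the `target_ratio` parameter are hypothetical and not taken from the paper.

```python
def param_count(vocab, hidden, inter, layers, tied=False, ffn_mats=3):
    """Approximate parameter count of a decoder-only Transformer.

    Assumed layout: embedding (+ unembedding if untied), 4 * hidden^2
    attention weights per layer, ffn_mats * hidden * inter FFN weights
    per layer. Norms and biases are ignored.
    """
    emb = vocab * hidden * (1 if tied else 2)
    attn = 4 * hidden * hidden * layers
    ffn = ffn_mats * hidden * inter * layers
    return emb + attn + ffn

def inter_for_target(vocab_new, target_ratio, vocab, hidden, inter, layers,
                     tied=False, ffn_mats=3):
    """Largest I' such that (vocab_new, I') removes at least target_ratio
    of the parameters. Returns None if no I' >= 0 can reach the target."""
    total = param_count(vocab, hidden, inter, layers, tied, ffn_mats)
    budget = (1.0 - target_ratio) * total
    fixed = param_count(vocab_new, hidden, 0, layers, tied, ffn_mats)
    if fixed > budget:
        return None  # vocabulary must shrink further first
    per_channel = ffn_mats * hidden * layers
    return min(inter, int((budget - fixed) // per_channel))
```

For example, with `vocab=1000, hidden=10, inter=40, layers=2` the toy model is embedding-dominated, so a 50% reduction is unreachable by FFN pruning alone and `inter_for_target(1000, 0.5, 1000, 10, 40, 2)` returns `None`, whereas shrinking the vocabulary to 500 leaves room for 13 FFN channels.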

Deployment and Engine Compatibility

By retaining the original Transformer depth, attention configuration, and computation graph, COMPACT does not require custom inference kernels or engine modifications. Pruned checkpoints function directly with standard software stacks (Huggingface, vLLM, FlashAttention).

Scale-Adaptivity

Parametric analysis reveals that smaller LLMs are embedding-dominated, while larger LLMs are FFN-dominated. COMPACT allows selective targeting:

For small models: aggressively prune vocabulary ($V'$ small, $I'$ close to $I$).
For large models: prune FFN channels ($I'$ small, $V'$ close to $V$).
For intermediate: hybrid schedule.

This adaptivity yields robustness across the Qwen, LLaMA, and Gemma families over a range of model sizes.

Empirical Performance

At comparable prune ratios, COMPACT outperforms ShortGPT, LaCo, SliceGPT, and 2SSP baselines.
Example: on Qwen 0.5B at a matched prune ratio, COMPACT retains a substantial fraction of dense accuracy (mean across MMLU, GSM8K, and HellaSwag), whereas baselines approach random performance.
On LLaMA 8B, pruning lowers the memory footprint (MB) and raises throughput (queries per second).
Pruning LLaMA 8B also takes less wall-clock time than SliceGPT.

Training-Free Operation

All pruning is post hoc and training-free, requiring only a small number of forward passes for activation calibration. Pruning 8B to 70B models takes minutes on a single A100.

2. COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning

COMPACT (“COMPositional Atomic-to-complex visual Capability Tuning”) is a dataset generation and training recipe for multimodal LLMs (MLLMs), specifically targeting compositional generalization in vision-language tasks. Standard Visual Instruction Tuning (VIT), as in LLaVA-665k, underrepresents complex (high-$k$) capability combinations, impairing model performance on tasks that require coordinated use of many basic vision skills (Wu et al., 30 Apr 2025).

Atomic Capability Taxonomy

A fixed taxonomy of 10 atomic visual capabilities is used:

Attribute: color, shape.
Recognition: object, action, text, spatial, counting.
Relation: spatial relationship, object interaction, scene understanding.

The compositional complexity $k$ of a sample is the number of atomic capabilities required to answer it.

Dataset Construction

Images from LLaVA-665k are used as the pool. For each image and each complexity level $k$:

Sample a combination of $k$ atomic capabilities.
Use Gemini-2.0-Flash to generate corresponding Q-A pairs, subject to confidence and diversity checks.
Use a verification model to ensure correspondence between the question, the answer, and the sampled capabilities.
Curate the dataset with balanced coverage over all $k$.

Only images with at least two verified multi-capability Q-A pairs are retained, and the retained pairs go into the final training set.
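The sampling step of this construction can be sketched as follows. This is an illustrative outline only: the capability names paraphrase the taxonomy above, the complexity levels `(1, 2, 3)` and the function name are assumptions, and the Gemini-based generation and verification stages are deliberately left out because their prompts and thresholds are not given here.

```python
import random

# The ten atomic capabilities, named after the taxonomy in this section.
ATOMIC_CAPABILITIES = [
    "color", "shape",
    "object recognition", "action recognition", "text recognition",
    "spatial recognition", "counting",
    "spatial relationship", "object interaction", "scene understanding",
]

def sample_capability_sets(image_ids, k_values=(1, 2, 3), seed=0):
    """For every image and every complexity level k, draw k distinct
    atomic capabilities. Each (image, k) pair appears exactly once, so
    the resulting plan is balanced over k by construction."""
    rng = random.Random(seed)
    plan = []
    for image_id in image_ids:
        for k in k_values:
            combo = tuple(sorted(rng.sample(ATOMIC_CAPABILITIES, k)))
            plan.append({"image": image_id, "k": k, "capabilities": combo})
    return plan
```

Each entry of the plan would then be turned into a question-answer prompt for the generator model, and entries whose generated pair fails verification would be discarded before the per-image filtering step.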

Model Integration and Objective

Starting from LLaVA-7B-VIT pre-checkpoint, LoRA adapters are tuned using standard token-level cross-entropy loss over answer tokens, with vision encoder (CLIP-ViT) and most language parameters frozen.

Empirical Results

COMPACT achieves:

Comparable or superior performance to LLaVA-665k VIT on all standard benchmarks using roughly 10% of the data budget.
Especially strong improvements on tasks requiring several atomic capabilities (e.g., on MMStar and MM-Vet, relative to full-scale VIT).
Table 2 from the paper summarizes overall gains; see also Figure 1 showing compositional generalization.

Ablation Analyses

Balanced-$k$ training is critical; reverting to the LLaVA-665k $k$-distribution sharply reduces performance.
All ten atomic capabilities are necessary for optimal performance; some (scene understanding, spatial) are more critical.
Increasing instruction-tuning data beyond roughly 5% offers little additional benefit.

3. CompAct: Compressed Activations for Memory-Efficient LLM Training

CompAct is a technique for reducing memory utilization during LLM training via low-rank random projections of activations between forward and backward passes. This directly targets activation buffers, which are the dominant contributor to peak GPU memory in LLM pretraining and fine-tuning (Shamshoum et al., 2024).

Methodology

For each linear layer with input activation $x$, a Gaussian random matrix $P$ with $r$ columns ($r$ much smaller than the input dimension) compresses $x$ to $z = xP$. Only $z$ is stored during the forward pass. Weight gradients are computed in this compressed domain, with the full gradient reconstructed at the update step. Random matrices are regenerated from a seed rather than stored.

Pseudocode (Forward, Backward, Optimizer)

The paper provides full pseudocode for all three steps. Notably, the backward pass computes only the gradients required for the optimizer step from the compressed activations, minimizing reconstruction overhead.
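The idea of storing a randomly projected activation and regenerating the projection from a seed can be illustrated with a single linear layer in NumPy. This is a simplified sketch under assumptions, not the paper's pseudocode: the names `projection`, `forward`, and `backward` are hypothetical, the projection entries are drawn from a normal distribution with variance `1/rank` so that the reconstruction is unbiased in expectation, and the optimizer-state handling described in the paper is omitted.

```python
import numpy as np

def projection(seed, n_in, rank):
    """Regenerate the Gaussian projection from its seed instead of storing it.
    Entries have variance 1/rank, so E[P @ P.T] is the identity."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(rank), size=(n_in, rank))

def forward(x, w, seed, rank):
    """Linear layer y = x @ w that keeps only the compressed input z."""
    y = x @ w
    z = x @ projection(seed, x.shape[1], rank)  # (batch, rank), saved for backward
    return y, z

def backward(z, grad_y, w, seed):
    """Input gradient is exact; the weight gradient is formed in the
    compressed domain and lifted back with the same seeded projection."""
    n_in, rank = w.shape[0], z.shape[1]
    grad_x = grad_y @ w.T
    grad_w_compressed = z.T @ grad_y            # (rank, n_out)
    grad_w = projection(seed, n_in, rank) @ grad_w_compressed
    return grad_x, grad_w
```

The saved tensor shrinks from `batch x n_in` to `batch x rank`, which is where the memory saving comes from; the price is a noisy weight gradient whose variance grows as the rank decreases.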

Memory and Throughput

CompAct reduces peak GPU memory in both pretraining and fine-tuning. Throughput increases with decreasing rank $r$ due to smaller GEMMs. Combined with activation checkpointing, CompAct outperforms prior gradient-sketching methods and does not incur recomputation or communication penalties.

Limitations

Only linear activations are compressed; attention, nonlinearities, and normalization remain full-rank. An excessively small rank $r$ can degrade fine-tuning accuracy on small tasks.

4. CompAct: Compressing Retrieved Documents Actively for Question Answering

CompAct is an active compression framework for context conditioning in retrieval-augmented question answering (QA). It addresses the issue that large readers degrade when provided with extensive, noisy or multi-hop contexts, and single-pass filtering often misses critical factual connections (Yoon et al., 2024).

Active, Iterative Compression Strategy

CompAct sits between the retriever and the generative reader. Documents are split into segments (e.g., 5 per step), and an LLM-based compressor iteratively summarizes them, progressively updating a compressed context and emitting an action token [COMPLETE|INCOMPLETE] that allows early stopping. The compressor is trained on supervised labels generated by GPT-4o.

Mathematical Formulation

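$(C_t, a_t) = \pi(q, S_t, C_{t-1})$

where $q$ is the question, $S_t$ the $t$-th document segment, $C_t$ the state (the compressed context after step $t$), and $a_t \in \{\text{COMPLETE}, \text{INCOMPLETE}\}$ the action.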
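The iterative loop with early stopping can be sketched as below. This is an illustration of the control flow only: `active_compress` and `toy_compressor` are hypothetical names, and the toy compressor is a keyword-overlap stand-in for the trained LLM compressor, which is not reproduced here.

```python
from typing import Callable, List, Tuple

# A compressor maps (question, segment, previous context) to
# (new compressed context, action), where action is "COMPLETE" or "INCOMPLETE".
Compressor = Callable[[str, List[str], str], Tuple[str, str]]

def active_compress(question: str, documents: List[str],
                    compressor: Compressor, segment_size: int = 5) -> Tuple[str, int]:
    """Feed documents to the compressor segment by segment, carrying the
    compressed context forward and stopping once it signals COMPLETE."""
    context, steps = "", 0
    for start in range(0, len(documents), segment_size):
        segment = documents[start:start + segment_size]
        context, action = compressor(question, segment, context)
        steps += 1
        if action == "COMPLETE":
            break  # early stopping: remaining segments are never read
    return context, steps

def _terms(text: str) -> set:
    return {t.strip("?.,").lower() for t in text.split()}

def toy_compressor(question: str, segment: List[str], context: str) -> Tuple[str, str]:
    """Hypothetical stand-in: keep documents sharing a longer word with the
    question, and declare COMPLETE as soon as one is found."""
    query_terms = {t for t in _terms(question) if len(t) > 3}
    kept = [doc for doc in segment if query_terms & _terms(doc)]
    new_context = " ".join([context] + kept).strip()
    return new_context, ("COMPLETE" if kept else "INCOMPLETE")
```

With six documents and `segment_size=2`, a relevant document in the second segment ends the loop after two steps, so the third segment is never processed. The compressed context, rather than the raw documents, is what the reader model would receive.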

Empirical Performance

Exemplified on HotpotQA:

CompAct attains a high compression rate, with F1 compared against the raw-document baseline.
Cost-efficiency: with GPT-3.5-Turbo as the reader, CompAct reduces per-sample API cost to \$0.04.
Performance is maintained as the number of retrieved documents increases, unlike raw concatenation or static pruning.

Deployment

CompAct is a plug-and-play compressor between arbitrary retrievers and readers and accommodates short-context LMs prevalent on HuggingFace. Early stopping reduces mean computation.

5. CompACT: A Discrete Tokenizer for Planning in Latent World Models

CompACT, as introduced in action-conditioned world modeling, is a discrete observation tokenizer that compresses each image frame to as few as 8 discrete tokens without substantial accuracy loss in planning and simulation benchmarks (Kim et al., 5 Mar 2026).

Architecture

Frozen semantic encoder (DINOv3-B) extracts patch embeddings.
Latent resampler (transformer) distills these into 8 latent queries.
Finite Scalar Quantization (FSQ) discretizes each query into one code from a fixed, finite set of possible codes.
A MaskGIT-style generative decoder reconstructs high-fidelity targets in VQGAN latent space.
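The quantization step in this architecture can be illustrated with a minimal finite scalar quantizer. This sketch covers only the general FSQ idea (bound each latent dimension, round it to a small number of levels, and read the rounded vector as one code); the level configuration `[5, 5, 3]` is a hypothetical choice rather than the paper's, only odd level counts are handled, and the straight-through gradient used in training is omitted.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each latent dimension to one of levels[d] integer values.

    tanh bounds dimension d to (-half, half) with half = (levels[d] - 1) / 2,
    so rounding yields integers in [-half, half]. Odd level counts only.
    """
    levels = np.asarray(levels)
    assert np.all(levels % 2 == 1), "this sketch assumes odd level counts"
    half = (levels - 1) / 2
    return np.round(np.tanh(z) * half).astype(int)

def fsq_code_index(q, levels):
    """Map a quantized vector to a single integer code (mixed-radix index)."""
    levels = np.asarray(levels)
    digits = q + (levels - 1) // 2                       # shift to 0 .. L-1
    bases = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (digits * bases).sum(axis=-1)
```

The number of possible codes per query is the product of the level counts (75 for the hypothetical `[5, 5, 3]`), so the codebook is implicit and needs no learned embedding table.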

Training Objective

A masked generative cross-entropy loss on the target MaskGIT-VQGAN tokens, conditioned on the compact tokens, replaces the pixel-space loss.
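A masked generative cross-entropy objective of this kind is commonly written in the following form. This is offered as a sketch in assumed notation, not as the paper's exact equation: $y$ denotes the target VQGAN tokens, $M$ the randomly sampled set of masked positions, $y_{\setminus M}$ the unmasked tokens, $z$ the compact tokens, and $\theta$ the decoder parameters.

```latex
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{y,\,M}\left[\,\sum_{i \in M}
      \log p_\theta\!\left(y_i \mid y_{\setminus M},\, z\right)\right]
```

Because the loss is computed over discrete target tokens, the tokenizer is rewarded for preserving what the decoder needs to predict them rather than for matching pixels exactly.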

World Model and Planning

Tokenizer output is used as state in a masked generative discrete world model.
Model-predictive control with CEM leverages extreme compression for a 40–80$\times$ rollout latency reduction, facilitating real-time planning.

Results

CompACT-8 is evaluated on ImageNet reconstruction using rFID and IS.
Navigation planning: ATE, RPE, and latency are competitive with, or faster than, larger-token baselines.
Ablations confirm the necessity of frozen semantic encoders, generative decoders, and masked autoencoding in preserving planning-essential information.

6. ComPACT: Galaxy Cluster Catalogue (ACT + Planck)

ComPACT (“Convolutional neural network–based Planck + ACT Cluster‐finding Technique”) is a large, machine-learning–curated catalogue of thermal Sunyaev–Zel'dovich (tSZ)–selected galaxy clusters identified in combined ACT and Planck microwave data (Voskresenskaia et al., 19 May 2026).

Construction and Detection Pipeline

A U-Net–style CNN scans 32$\times$32 ACT+Planck intensity patches, producing per-pixel tSZ-cluster probabilities.
Detection applies a threshold to the per-pixel probability and requires a contiguous region containing a minimum number of pixels above that threshold.
Positive labels are generated by injecting simulated clusters into real maps; negative labels come from random or known non-cluster sky.
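The thresholding step that turns a per-pixel probability map into candidate detections can be sketched as follows. This is a generic connected-region detector under assumptions, not the catalogue's pipeline: the function name, the parameters `p_min` and `min_pixels`, the 4-connectivity, and the probability-weighted centroid are all illustrative choices.

```python
from collections import deque
import numpy as np

def detect_candidates(prob, p_min, min_pixels):
    """Find 4-connected regions where prob >= p_min and keep those with at
    least min_pixels pixels. Returns one dict per surviving region."""
    mask = prob >= p_min
    seen = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    candidates = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill of one connected region.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < height and 0 <= cc < width
                            and mask[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(pixels) >= min_pixels:
                rows, cols = (list(v) for v in zip(*pixels))
                weights = prob[rows, cols]
                candidates.append({
                    "n_pixels": len(pixels),
                    "row": float(np.average(rows, weights=weights)),
                    "col": float(np.average(cols, weights=weights)),
                    "peak": float(weights.max()),
                })
    return candidates
```

The minimum-size requirement suppresses isolated high-probability pixels, which are more likely to be noise than an extended cluster signal.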

Catalogue Properties

Statistic	Value/Range
Total candidates	2,962
Confirmed clusters	1,668 + 116 new redshifts (about 60%)
New high-mass, high-redshift clusters	5

The catalogue additionally characterizes its footprint, redshift and mass ranges, and completeness.

Cluster physical properties are derived from the tSZ Compton-$y$ signal.

Scientific Impact

ComPACT recovers higher completeness in galaxy cluster cataloguing.

7. Terminological Caveat

COMPACT/CompACT/CompAct refer to several independent methodologies:

LLM pruning (Kwek et al., 8 Sep 2025), multimodal compositional tuning (Wu et al., 30 Apr 2025), activation compression (Shamshoum et al., 2024), active document compression (Yoon et al., 2024), discrete latent tokenization (Kim et al., 5 Mar 2026), and astronomical cluster cataloguing (Voskresenskaia et al., 19 May 2026) are unrelated except in nomenclature. Each should be precisely disambiguated in context.


References (6)

COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens (2025)

COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning (2025)

CompAct: Compressed Activations for Memory-Efficient LLM Training (2024)

CompAct: Compressing Retrieved Documents Actively for Question Answering (2024)

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model (2026)

ComPACT: Mass-Redshift Properties of the galaxy cluster catalogue (2026)
