Image-Centric Compression: Methods & Trends

Updated 3 August 2025

Image-centric compression is a systematic approach that tailors coding methods for image data by balancing bit rate, visual fidelity, and computational efficiency.
Modern techniques integrate lossless, near-lossless, and lossy methods with classical transforms and learning-based strategies to optimize performance.
Emerging trends focus on semantic-aware, generative, and observer-dependent paradigms to enhance visual quality and machine analytic tasks.

Image-centric compression refers to the systematic design, analysis, and implementation of coding methods specifically tailored for compressing image data, with a focus on the trade-off between rate (bit budget), visual and semantic fidelity, and computational requirements. While the early objective in image compression was to maximize storage or transmission efficiency for human consumption, contemporary approaches increasingly consider downstream perceptual and analytic tasks—addressing both human visual system (HVS) metrics and machine perception. The field comprises a progression of lossless, near-lossless, and lossy methods, alongside innovations in transform, adaptive, learning-based, and generative paradigms. Contemporary research pursues context-awareness, hybrid and modular designs, observer-dependent optimization, and resource-efficient deployment.

1. Classical and Modern Principles of Image Compression

Image-centric compression rests on two foundational classes of methods:

Lossless Compression: Compresses images so that the original data can be recovered exactly. Dominant techniques include dictionary-based methods (LZ77, LZ78, LZW), prediction-based algorithms (such as the Gradient Adaptive Predictor, GAP; Adaptive Linear Prediction and Classification, ALPC; and neural network predictors), and wavelet-based coders that pair the Discrete Wavelet Transform (DWT) with coefficient-ordering algorithms such as EZW or SPIHT. JPEG2000, for example, builds on the DWT in both its lossless and lossy modes (Prantl, 2014).
Near-Lossless and Lossy Compression: Near-lossless approaches allow bounded error in pixel values (e.g., ±1), extending lossless predictors and RLE/graph-based variants. Lossy schemes, which admit greater irrecoverable changes for reduced rate, rely predominantly on block or transform coding (DCT, KLT/PCA, DWT), neural network bottleneck approaches, and contour or SVD-based methods (Prantl, 2014).
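The bounded-error idea behind near-lossless coding can be illustrated with a minimal sketch. It assumes a previous-pixel predictor, a fixed seed value of 128 for the first prediction, and 8-bit samples; these choices and the function names are illustrative assumptions, not the method of any specific codec discussed above. Residuals are quantized with step $2\delta + 1$, which keeps every reconstructed sample within $\pm\delta$ of the original, and $\delta = 0$ reduces to lossless prediction.

```python
def _reconstruct(prev, q, step):
    # Shared by encoder and decoder so both track the same reconstruction.
    return min(255, max(0, prev + q * step))


def near_lossless_encode(row, delta):
    """Quantize prediction residuals so that |x - x_hat| <= delta per sample."""
    step = 2 * delta + 1
    symbols, recon = [], []
    prev = 128  # assumed predictor seed for the first sample
    for x in row:
        residual = x - prev
        # Round the residual to the nearest multiple of `step`.
        if residual >= 0:
            q = (residual + delta) // step
        else:
            q = -((-residual + delta) // step)
        x_hat = _reconstruct(prev, q, step)
        symbols.append(q)
        recon.append(x_hat)
        prev = x_hat  # closed loop: predict from the reconstructed value
    return symbols, recon


def near_lossless_decode(symbols, delta):
    """Rebuild the samples from the quantized residual symbols."""
    step = 2 * delta + 1
    recon, prev = [], 128
    for q in symbols:
        prev = _reconstruct(prev, q, step)
        recon.append(prev)
    return recon
```

Predicting from the reconstructed value rather than the original keeps the encoder and decoder in step, so the error bound holds for every sample instead of accumulating along the row. In a full coder the symbols would then be entropy coded.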

The trade-off is formalized as a rate-distortion problem, with distortion commonly quantified by the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR):

$MSE = \frac{1}{M \cdot N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} (I(i,j)-K(i,j))^2$

$PSNR = 10 \cdot \log_{10}\left(\frac{255^2}{MSE}\right)$

where $I$ and $K$ are the original and reconstructed images of size $M \times N$, and 255 is the peak value for 8-bit samples. As the compression ratio increases, distortion usually increases as well, so lossy coders exploit perceptual tolerances to keep the added distortion hard to see (Prantl, 2014).
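The two formulas translate directly into code. The following is a minimal sketch, assuming images are given as equally sized nested lists of 8-bit grayscale values; the function names and the `peak` parameter are illustrative.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two M x N images (nested lists)."""
    m, n = len(original), len(original[0])
    total = sum(
        (original[i][j] - reconstructed[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return total / (m * n)


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)
```

Returning infinity for identical images avoids a division by zero and matches the lossless case, where there is no distortion to measure.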

Recent classical innovations introduced adaptive edge-aware predictors, advanced coefficient ordering, and hybrid scanning strategies to improve upon established transforms. Neural and contour-based techniques have also been augmented by metaheuristics and advanced training algorithms.

2. Learning-Based, Observer-Dependent, and Machine-Centric Paradigms

The field has shifted toward learned approaches with several trends:

Observer-Dependent Losses: Compression methods increasingly optimize for a specific observer. A family of observer-dependent loss functions interpolates between human visual quality (e.g., MS-SSIM) and machine-classification accuracy (deep feature loss). The two distortions are combined convexly as $d_{\alpha,I}(x, \hat{x}) = (1 - \alpha) \lambda_H d_H(x, \hat{x}) + \alpha d_{C,I}(x, \hat{x})$, where $\alpha \in [0, 1]$ tunes the objective toward the human observer ($\alpha = 0$) or the machine observer ($\alpha = 1$) (Weber et al., 2019).
Perceptual and Discernibility Constraints: Models like Discernible Image Compression (DIC) incorporate pre-trained networks as fixed perceptual regularizers. The training loss not only minimizes pixel-level differences but also penalizes discrepancies in high-level features (e.g., as extracted by ResNet-18), sometimes further aligned by maximum mean discrepancy (MMD) to ensure feature distribution consistency between the compressed and reference domains (Yang et al., 2020).
Rate-Distortion-Utility Optimization: Some frameworks, such as the one in “Learned Image Compression for Machine Perception,” formalize the joint optimization of rate, distortion, and task utility: $\mathcal{L} = \mathcal{R}(Q(f(x))) + \lambda_d \mathcal{D}(g(Q(f(x))), x) + \lambda_u \mathcal{U}(Q(f(x)))$, where the utility term $\mathcal{U}$ captures downstream classification, detection, or segmentation performance, so that human and machine fidelity goals can be pursued simultaneously (Codevilla et al., 2021).
Recognition-Aware Loss: Extensions explicitly integrate a recognition loss (e.g., cross-entropy from a recognition branch, such as EfficientNet), combined with bitrate and distortion, with hyperparameters controlling the trade-off (Kawawa-Beaudan et al., 2022).
Generative and Semantic-Aware Compression: Methods such as “Machine Perceptual Quality” (Jacobellis et al., 2024), “Machine Perception-Driven Image Compression” (Zhang et al., 2023), and “Rethinking Image Compression on the Web with Generative AI” (Hassan et al., 2024) demonstrate that generative models (e.g., GANs, diffusion models) and implicit or explicit semantic priors can preserve machine-relevant features at extremely low bitrates. In these settings, deep perceptual metrics (LPIPS, VGG16 cosine similarity) are shown to correlate with downstream recognition quality, and aggressive compression can sometimes even improve generalization when the pretraining and test distributions are matched.
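The observer-dependent distortion and the rate-distortion-utility objective from the list above are both weighted sums, which a short sketch makes concrete. It assumes the individual terms (human distortion, machine distortion, rate, task loss) have already been computed as scalars by whatever models are in use; the function and parameter names are illustrative, and the utility term is treated here as a task loss to be minimized, which is an assumption about its sign convention.

```python
def observer_dependent_distortion(d_human, d_machine, alpha, lambda_h=1.0):
    """Convex combination of human and machine distortions.

    alpha = 0 recovers the (scaled) human distortion,
    alpha = 1 recovers the machine distortion.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * lambda_h * d_human + alpha * d_machine


def rate_distortion_utility_loss(rate, distortion, task_loss, lambda_d, lambda_u):
    """Total loss R + lambda_d * D + lambda_u * U for scalar terms.

    task_loss stands in for the utility term U (assumed: lower is better).
    """
    return rate + lambda_d * distortion + lambda_u * task_loss
```

Sweeping `alpha` from 0 to 1 while holding the rate target fixed traces how the same codec trades human-oriented quality for machine-oriented fidelity; setting `lambda_u` to 0 in the second function falls back to a conventional rate-distortion objective.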

3. Algorithmic Innovations, Adaptive and Hybrid Approaches

Compression pipelines now encompass modular, hybrid, and adaptive elements:

Preprocessing and Modularity: Kuchen (Zhang et al., 2022) proposes a preprocessing stage that adaptively fuses multiple classical (denoising, high-pass filtering) and learning-based techniques, selected per image by a hybrid scoring system that balances VQscore (perceptual) against bitrate. The backbone network is a U-Net with global residuals, enabling codecs such as JPEG, HEVC, and WebP to achieve a 22–34% better compression ratio with preserved or even enhanced visual quality.
Edge and Region-of-Interest Awareness: Scene text-preserving compression (Uchigasaki et al., 2023) uses a deep scene text image quality assessment (STIQA) model (CRNN + Transformer fusion) to allocate bits dynamically via pixel-wise quality maps, iteratively tuned to maximize readability in critical text regions. This selective approach yields subjective and objective gains over holistic schemes in preserving the salient information.
Frequency-Oriented and Color-Adaptive Coding: Recent models exploit frequency decomposition (via Laplacian or Haar pyramids) and dedicated color structure processing. For example, a model may include separate branches for luminance and chrominance (with CIEDE2000 color difference in the loss), optimizing human-perceptual tolerances and aligning encoding structure with HVS (Zhang et al., 2024, Prativadibhayankaram et al., 2023, Wei et al., 19 Feb 2025).
Transformers and Graph-Based Attention: Attention-based architectures, including QPressFormer (Luka et al., 2023) and GABIC (Spadaro et al., 2024), either replace convolutional priors with pure transformer blocks or introduce graph-based local attention by aggregating only the most informative, non-redundant patch features. GABIC, for instance, uses per-window k-nearest-neighbors (k-NN) to select features for local attention, outperforming conventional window attention in rate-distortion, particularly at high fidelity.
Codebook and Generative-Based Coding: Extreme compression via VQGAN (Mao et al., 2023) quantizes images in latent space using discrete codewords; the indices are losslessly compressed and optionally repaired by a transformer that predicts missing codes, enabling robust recovery at low bitrates and under transmission loss.
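The codebook step described in the last item can be sketched as nearest-codeword vector quantization. This is a minimal illustration that assumes latent vectors and codewords are plain lists of floats and uses squared Euclidean distance; it does not reproduce the VQGAN encoder, decoder, or the transformer-based repair stage, and all names are hypothetical.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def vq_encode(latents, codebook):
    """Map each latent vector to the index of its nearest codeword."""
    return [nearest_codeword(z, codebook) for z in latents]


def vq_decode(indices, codebook):
    """Replace each index with its codeword (the quantized latent)."""
    return [list(codebook[i]) for i in indices]


def bits_per_index(codebook):
    """Fixed-length cost of one index, before any entropy coding."""
    return (len(codebook) - 1).bit_length()
```

Only the integer indices need to be stored or transmitted, which is why the rate depends on the codebook size and the number of latent positions rather than on the image resolution directly. Lossless coding of the indices can reduce the cost below the fixed-length figure.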

4. Performance Metrics and Evaluation

Evaluation strategies in image-centric compression are multi-faceted:

Metric	Domain	Typical Usage
PSNR	Signal	Pixel-level error on a logarithmic scale (falls as MSE grows); easy to compare but poorly aligned with perception
SSIM/MS-SSIM	Perceptual	Structural similarity, widely used in HVS evaluation
LPIPS	Perceptual/Machine	Deep similarity metric, shown to track both human and machine task performance (Jacobellis et al., 2024)
VQscore	Perceptual	No-reference quality estimate for subjective visual appraisal (Zhang et al., 2022)
CIEDE2000	Perceptual	Color fidelity metric, aligns with HVS frequency/color sensitivity (Prativadibhayankaram et al., 2023)
Task Utility	Machine	Accuracy in downstream recognition/detection/segmentation tasks (Codevilla et al., 2021, Kawawa-Beaudan et al., 2022)
mAP, mIoU	Machine	Standard object detection and segmentation metrics

Objective and subjective studies consistently show that human-perceived quality and machine-perceived fidelity can diverge, especially under severe compression. Counterintuitive phenomena, such as classification accuracy increasing with lossy compression when pretraining and deployment domains are aligned, have been observed (Jacobellis et al., 2024). Thus, comprehensive evaluation typically reports distortion, perceptual fidelity, and application-specific performance as functions of the bitrate.
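Reporting quality as a function of bitrate amounts to sweeping a codec parameter and recording one (rate, quality) pair per setting. The sketch below is a toy illustration under stated assumptions: a deterministic synthetic 64x64 image, a uniform scalar quantizer as the stand-in "codec", and `zlib` as the entropy coder. None of these correspond to a method evaluated in the cited works.

```python
import math
import zlib


def synthetic_image(height=64, width=64):
    """Deterministic 8-bit test pattern (assumed stand-in for real data)."""
    return [[(3 * i + 2 * j + (i * j) % 7) % 256 for j in range(width)]
            for i in range(height)]


def rd_point(image, step):
    """Return (bits per pixel, PSNR in dB) for uniform quantization at `step`."""
    pixels = [p for row in image for p in row]
    # Uniform scalar quantizer: index of the nearest multiple of `step`.
    indices = [(p + step // 2) // step for p in pixels]
    recon = [min(255, q * step) for q in indices]
    # Rate: size of the losslessly compressed index stream.
    payload = zlib.compress(bytes(indices), 9)
    bpp = 8 * len(payload) / len(pixels)
    # Distortion: PSNR against the original pixels.
    err = sum((a - b) ** 2 for a, b in zip(pixels, recon)) / len(pixels)
    quality = math.inf if err == 0 else 10 * math.log10(255 ** 2 / err)
    return bpp, quality


def rd_curve(image, steps=(1, 4, 16, 64)):
    """One (step, bpp, PSNR) triple per quantizer setting."""
    return [(step,) + rd_point(image, step) for step in steps]
```

In a real evaluation the PSNR column would be accompanied by perceptual scores and task metrics computed on the same reconstructions, so that all curves share the bits-per-pixel axis.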

5. Applications, Deployability, and Practical Considerations

Image-centric compression methods are adopted and deployed in a range of platforms:

Web and Mobile: Adaptive triangulation (Marwood et al., 2018) and content-aware preprocessing (Zhang et al., 2022) permit the transmission of very small thumbnails (as small as 200 bytes) with better PSNR and SSIM than JPEG/WebP, which is highly relevant to bandwidth-constrained mobile clients.
Edge, Storage, and Transmission: Layered and semantic-aware models allow “scalable coding”—the transmission or storage of select latent or frequency components, with commensurate trade-offs in quality and semantic retention (Zhang et al., 2024, Zhang et al., 2023). Efficient implementation (e.g., TinyLIC (Ma et al., 2024) and ICISP (Wei et al., 19 Feb 2025)) is a critical area for edge deployment and real-time inference, leveraging lightweight parameterizations and modular blocks.
Human and Machine Consumption: Unified frameworks such as “Learned Image Compression for Machine Perception” (Codevilla et al., 2021) allow joint optimization for both human viewing and direct use in machine pipelines, reducing the need for decompression and improving inference speed (20% speedup observed). Generative AI frameworks (Hassan et al., 2024) reconstruct images client-side from highly compressed conditioning signals, reducing bandwidth (up to 99.8% savings) with only minimal loss in high-level content, verified by perceptual similarity and user studies.

6. Open Challenges and Future Directions

Image-centric compression continues to advance along several axes:

Semantic-Aware, Lightweight Models: Maintaining or exceeding state-of-the-art perceptual or analytic fidelity with fewer parameters and lower FLOPs (as in implicit prior-based ICISP (Wei et al., 19 Feb 2025)) is essential for resource-constrained applications.
Task-Optimized Losses and Mixed-Modal Priors: Expanding beyond explicit priors (segmentation/text maps) to implicit, learned features (e.g., DINOv2 embeddings in the discriminator) and dynamic perceptual losses provides new pathways for efficient coding that generalizes across domains.
Granular Control and Scalable Coding: High-level frameworks support fine-grained adjustment on which frequency, semantic, or structural elements are preserved/transmitted, supporting incremental quality improvements and flexible deployment.
Generative and Diffusion-Based Paradigms: Powerful generative models can reconstruct high-fidelity, semantically faithful images from extremely compressed representations, using client-side generation or edge AI. This suggests that future internet infrastructure may shift much of the transmission cost onto pre-trained generative models, especially as client hardware becomes more capable.
Evaluation and Theoretical Analysis: The increased focus on deep perceptual and utility metrics demonstrates a need for improved theoretical understanding and practical benchmarks that reflect the end-use case, whether human inspection, automatic analysis, or embedded sensor processing.

By integrating techniques from transforms, prediction, learning, generative modeling, and semantic representation, image-centric compression continues to evolve to meet the demands of modern digital media, machine vision, and constrained computation environments. The field is marked by a persistent balance of algorithmic sophistication, observer alignment, and practical efficiency (Prantl, 2014, Shaham et al., 2018, Weber et al., 2019, Codevilla et al., 2021, Kawawa-Beaudan et al., 2022, Zhang et al., 2023, Zhang et al., 2024, Spadaro et al., 2024, Wei et al., 19 Feb 2025).