Feature Coding for Machines (FCM)
- Feature Coding for Machines (FCM) is a framework that compresses and transmits intermediate neural network features instead of raw pixels, optimized for machine vision tasks rather than human viewing.
- FCM pipelines involve feature extraction, reduction, quantization, encoding, and decoding specifically designed for split inference scenarios.
- FCM achieves significant bitrate reductions while maintaining near-native accuracy, enabling efficient, secure, and scalable edge-cloud deployments.
Feature Coding for Machines (FCM) refers to a class of methods, pipelines, and international standards for compressing and transmitting intermediate neural network features (“activation tensors”) rather than raw visual signals (pixels) in computer vision systems, especially in edge–cloud or machine-to-machine inference scenarios. FCM is motivated by the inefficiency of pixel-oriented codecs (e.g., H.264/AVC, HEVC, VVC) for machine vision, where the primary consumer of transmitted content is an algorithmic agent, not a human observer. FCM offers superior bandwidth efficiency, accuracy, privacy, and scalability by exploiting the task-relevance and statistical structure of intermediate features and by providing a standardized or learnable codec interface for split inference deployments.
1. Motivation and Conceptual Foundations
Feature Coding for Machines arises from the convergence of three trends: the proliferation of deep neural network (DNN)–based vision, constraints on edge-device compute and bandwidth, and the non-essentiality of pixel reconstructions for machine tasks (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025). The traditional remote inference paradigm—sending raw or compressed video to a remote server—incurs excessive bitrate, exposes sensitive content, and frequently squanders resources preserving perceptual details that are irrelevant for DNN inference.
Split inference (collaborative intelligence) decomposes a vision network at an intermediate layer: the front-end (on-device) computes the early layers, producing intermediate feature tensors, which are then feature-encoded and transmitted to the back-end (cloud/server), where inference is completed by the remaining network tail (Eimon et al., 11 Dec 2025, Eimon et al., 11 Dec 2025). This shift necessitates coding techniques that treat feature tensors as the atomic “signal” to be compressed and reconstructed, subject to minimal degradation in downstream accuracy.
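The split-inference decomposition can be sketched as follows. This is a minimal stand-in, not an FCM implementation: both network halves are hypothetical linear layers with a ReLU, chosen only to show where the transmitted feature tensor sits between device and server.

```python
import numpy as np

# Hypothetical split of a vision network into an on-device head and a
# server-side tail.  Real deployments split a trained DNN (e.g. at an
# FPN stage); here both halves are illustrative random linear layers.
rng = np.random.default_rng(0)
W_head = rng.standard_normal((16, 8))   # device-side weights (assumed)
W_tail = rng.standard_normal((8, 4))    # server-side weights (assumed)

def device_head(x):
    """Early layers run on the edge device; the output feature tensor
    is what gets encoded and transmitted instead of pixels."""
    return np.maximum(x @ W_head, 0.0)  # ReLU activation

def server_tail(features):
    """Remaining layers run on the server once features are received."""
    return features @ W_tail

x = rng.standard_normal((1, 16))        # raw input (stands in for pixels)
features = device_head(x)               # transmitted feature tensor
y = server_tail(features)               # inference completed remotely
```

In a real pipeline, the feature codec (reduction, packing, quantization, inner encoding) sits between `device_head` and `server_tail`.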
2. Standardized FCM Codec Architectures
Modern FCM frameworks, as specified in the MPEG FCM Test Model (FCTM) (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025), implement a multi-stage pipeline comprising:
- Feature Extraction Interface: The device computes intermediate feature maps (with arbitrary spatial/channel configurations, often FPN activations).
- Feature Reduction: Techniques such as temporal downsampling, multi-scale fusion (e.g., FENet), and channel selection reduce redundancy and spatial/semantic dimensionality.
- Packing and Quantization: Feature maps are tiled into monochrome frames, min–max normalized, and quantized to a fixed bit depth (typically 8–12 bits), possibly with Z-score normalization to preserve global statistics (Eimon et al., 10 Dec 2025).
- Inner Encoding: The resulting “feature frames” are coded by a standard codec (e.g., VVC), typically using only a subset of tools (e.g., no in-loop filters or block partitions) optimized for feature sparsity rather than human vision (Eimon et al., 9 Dec 2025).
- Transmission: The packed and quantized frames, including side information (feature statistics, channel activity, min/max, or Z-scores), are encapsulated into bitrate-efficient FCM bitstreams.
- Decoding and Restoration: The server decodes, unpacks, dequantizes, and applies learned transforms (e.g., DRNet) to reconstruct the original intermediate features for downstream inference.
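The packing and quantization stages above can be sketched in a few lines. This is a simplified illustration under assumed conventions (channels tiled row-major into a grid, a single global min/max signaled as side information), not the normative FCM packing layout.

```python
import numpy as np

def pack_and_quantize(feat, bit_depth=8, cols=4):
    """Tile C feature channels into one monochrome frame and quantize.

    feat: array of shape (C, H, W).  Returns the quantized frame plus
    the (min, max) side information needed for dequantization.
    A simplified sketch of the FCM packing/quantization stages.
    """
    c, h, w = feat.shape
    rows = -(-c // cols)                        # ceiling division
    frame = np.zeros((rows * h, cols * w), dtype=feat.dtype)
    for i in range(c):                          # tile channels row-major
        r, col = divmod(i, cols)
        frame[r*h:(r+1)*h, col*w:(col+1)*w] = feat[i]
    fmin, fmax = float(frame.min()), float(frame.max())
    levels = (1 << bit_depth) - 1
    quant = np.round((frame - fmin) / (fmax - fmin + 1e-12) * levels)
    return quant.astype(np.uint16), (fmin, fmax)

def dequantize(quant, side_info, bit_depth=8):
    """Invert the min–max quantization using the signaled side info."""
    fmin, fmax = side_info
    levels = (1 << bit_depth) - 1
    return quant.astype(np.float64) / levels * (fmax - fmin) + fmin

feat = np.random.default_rng(1).standard_normal((8, 4, 4))
q, side = pack_and_quantize(feat)               # frame fed to inner codec
rec = dequantize(q, side)                       # decoder-side restoration
```

The quantized frame `q` is what the inner codec (e.g., VVC) would compress; `side` corresponds to the statistics carried in the FCM bitstream.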
Key elements of the MPEG FCM bitstream include a Feature Coding Parameter Set (FCPS) containing statistics and structural metadata, and a series of Inner-Codec Data Units (ICDUs) carrying the quantized feature video (Eimon et al., 11 Dec 2025).
3. Rate–Distortion and Accuracy Trade-offs
The FCM design objective is to minimize the bitrate required to transmit features while maintaining the target accuracy for specific downstream tasks (e.g., detection, segmentation, tracking). Unlike human-oriented codecs, where distortion is measured by PSNR or MS-SSIM, FCM metrics are primarily rate–accuracy curves (e.g., bits vs. mAP, bits vs. MOTA). The fundamental objective is a Lagrangian cost J = D + λR, where D is the task-accuracy loss, R is the bitrate, and λ controls the trade-off (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025).
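In practice, the Lagrangian cost J = D + λR is used to pick an operating point (e.g., a codec quantization parameter) from a set of measured rate–accuracy points. The sketch below uses made-up numbers purely to show the selection mechanics; the field names and QP values are illustrative, not from the cited works.

```python
# Illustrative rate-accuracy operating points: bitrate in kbps, "dist"
# as the drop in mAP relative to uncompressed inference.  All values
# are invented for this sketch.
points = [
    {"qp": 22, "rate": 900.0, "dist": 0.2},
    {"qp": 27, "rate": 450.0, "dist": 0.6},
    {"qp": 32, "rate": 200.0, "dist": 1.5},
    {"qp": 37, "rate": 90.0,  "dist": 4.0},
]

def select_operating_point(points, lam):
    """Pick the point minimizing the Lagrangian cost J = D + lambda * R."""
    return min(points, key=lambda p: p["dist"] + lam * p["rate"])

low_rate = select_operating_point(points, lam=0.05)    # bitrate-dominated
high_acc = select_operating_point(points, lam=0.0001)  # accuracy-dominated
```

A large λ penalizes bitrate heavily and selects a coarse operating point; a small λ prioritizes accuracy and selects a high-rate one.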
Key findings:
- Bitrate reductions up to 85–95% are reported relative to pixel-based remote inference pipelines for equivalent detection/segmentation/tracking accuracy (Eimon et al., 10 Dec 2025, Eimon et al., 11 Dec 2025, Eimon et al., 11 Dec 2025).
- Temporal downsampling, channel pruning, and global-statistics-preserving quantization (e.g., Z-score normalization) further improve efficiency, especially for slowly varying or static scenes (Eimon et al., 10 Dec 2025).
- Task fidelity is maintained within <1% of native (uncompressed) inference for all benchmarks in CTTC (Eimon et al., 11 Dec 2025).
4. Specialized Coding Tools and Profiles
Unlike pixel codecs, FCM-optimized codecs benefit from tool ablation and reconfiguration:
- In-loop filters and perceptual transforms (e.g., SAO, DBF, ALF) are removed, as they disrupt activation distributions and impair machine accuracy (Eimon et al., 9 Dec 2025).
- Motion estimation, partitioning, and advanced intra/inter coding tools are selectively disabled, yielding up to 23x encoder speedup at minimal BD-rate cost (≤1.7%) (Eimon et al., 9 Dec 2025).
- Lightweight profiles (“Fast”, “Faster”, “Fastest”) offer practitioners trade-offs between encoder complexity, bitrate, and accuracy, with Fast yielding a 2.96% BD-rate saving alongside a 21.8% encoding-time reduction, and Fastest achieving a 95.6% encoding-time reduction (≈23×) with only a 1.71% BD-rate penalty (Eimon et al., 9 Dec 2025).
Tool-level analysis shows negligible impact (<0.5% absolute drop) on mAP/MOTA for detection/tracking when using these custom profiles. For instance segmentation tasks, the benefit is even more pronounced (Eimon et al., 9 Dec 2025).
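The BD-rate figures used throughout this comparison follow the standard Bjøntegaard-delta procedure: fit log-bitrate as a cubic polynomial in the quality metric, integrate both fits over the overlapping metric range, and convert the average log-rate difference to a percentage. A compact sketch, with mAP standing in for the quality axis and synthetic curves:

```python
import numpy as np

def bd_rate(rate_anchor, metric_anchor, rate_test, metric_test):
    """Bjontegaard-delta bitrate (%) between two rate-accuracy curves.

    Fits log-rate as a cubic in the quality metric and integrates over
    the overlapping metric range.  Negative values mean the test codec
    needs less bitrate for the same accuracy.
    """
    pa = np.polyfit(metric_anchor, np.log(rate_anchor), 3)
    pt = np.polyfit(metric_test, np.log(rate_test), 3)
    lo = max(min(metric_anchor), min(metric_test))
    hi = min(max(metric_anchor), max(metric_test))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Synthetic check: a codec hitting the same accuracies at exactly half
# the bitrate should score about -50% BD-rate.
acc = np.array([30.0, 35.0, 38.0, 40.0])
r_anchor = np.array([100.0, 200.0, 400.0, 800.0])
r_test = r_anchor / 2.0
bd = bd_rate(r_anchor, acc, r_test, acc)
```

Four rate points per curve (as in common test conditions) make the cubic fit exact, so the synthetic half-rate example recovers −50% up to floating-point error.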
5. Feature Compression Strategies and Adaptations
Multiple adaptations are layered above the codec core to further optimize the FCM pipeline:
- Z-score normalization aligns quantized features to match pre-quantization statistics, improving task accuracy and reducing overhead bitrates by 17% on average (up to 66% for tracking) compared to framewise min/max scaling (Eimon et al., 10 Dec 2025).
- Channel pruning is performed using channel activity maps, which are signaled as a bitmask; pruned channels can be restored by mean-filling or similar strategies (Eimon et al., 11 Dec 2025).
- Temporal downsampling and linear interpolation enable further bandwidth reduction without compromising performance in low-motion applications (Eimon et al., 10 Dec 2025).
- Advanced bit allocation using per-feature importance enables targeted rate–accuracy optimization, particularly for multiscale features (FPN) or stereo/multi-view tasks (Liu et al., 25 Mar 2025, Jin et al., 20 Feb 2025).
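The Z-score adaptation in the list above can be sketched as follows: quantize features after normalizing by their global mean and standard deviation, and signal those two statistics as side information so the decoder restores the original distribution. The clipping range and bit depth here are assumed values for illustration, not the standardized parameters.

```python
import numpy as np

def zscore_quantize(feat, bit_depth=8, clip_sigma=4.0):
    """Quantize features after Z-score normalization.

    The global mean and standard deviation are signaled as side
    information; values beyond clip_sigma standard deviations are
    clipped.  A simplified stand-in for the Z-score adaptation.
    """
    mu, sigma = float(feat.mean()), float(feat.std()) + 1e-12
    z = np.clip((feat - mu) / sigma, -clip_sigma, clip_sigma)
    levels = (1 << bit_depth) - 1
    quant = np.round((z + clip_sigma) / (2 * clip_sigma) * levels)
    return quant.astype(np.uint16), (mu, sigma)

def zscore_dequantize(quant, side_info, bit_depth=8, clip_sigma=4.0):
    """Restore features using the signaled global statistics."""
    mu, sigma = side_info
    levels = (1 << bit_depth) - 1
    z = quant.astype(np.float64) / levels * (2 * clip_sigma) - clip_sigma
    return z * sigma + mu

feat = np.random.default_rng(2).standard_normal((16, 8, 8)) * 3.0 + 1.0
q, side = zscore_quantize(feat)
rec = zscore_dequantize(q, side)
```

Because the decoder rescales by the signaled global statistics, the reconstructed features match the pre-quantization mean and variance far more closely than framewise min/max scaling would.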
6. Experimental Validation and Task-Specific Performance
Standardized experiments on multi-task and multi-dataset benchmarks demonstrate the effectiveness of FCM:
- Instance segmentation (OpenImagesV6, Mask R-CNN): –94.24% BD-rate
- Object detection (OpenImagesV6, SFU, Faster R-CNN): –95.45% BD-rate (OpenImagesV6), –85.91% (SFU class D)
- Tracking (TVD, HiEve): –94.57% and –94.58% BD-rate (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025)
Accuracy for all tasks remains within 1% of edge-only inference, and the modular feature codec supports arbitrary split points and DNN backbones. Encoder complexity exceeds that of the on-device network front-end by a factor of 4–12×; decoder complexity is roughly 0.3× that of the on-device network front-end (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025).
7. Deployment, Privacy, and Future Evolution
FCM has significant practical benefits:
- Bandwidth and Compute Efficiency: Edge devices transmit dramatically reduced payloads, with bandwidth savings exceeding an order of magnitude.
- Privacy: Activation maps lack human-interpretable content, greatly raising the difficulty of model inversion attacks (Azizian et al., 2022).
- Interoperability and Scalability: The bitstream syntax and codec pipeline invite cross-vendor, cross-platform support.
- Deployment: FCM underpins emerging smart city, automotive, and consumer vision applications, enabling low-latency, distributed AI analytics (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025).
Ongoing directions include reducing encoder complexity, improving universality (task/architecture agnosticism), integrating dynamic and task-aware bit allocation, and adaptive signaling of statistics. Flexible profile selection and plug-in codec configurations facilitate deployment in resource-constrained and latency-sensitive environments (Eimon et al., 9 Dec 2025).
Selected Table: FCM Fast/Faster/Fastest Profiles—BD-rate and Speedup (Eimon et al., 9 Dec 2025)
| Profile | Avg. BD-Rate (%) | Encoder Speedup |
|---|---|---|
| Fast | –2.96 | 1.28× |
| Faster | –1.85 | 2.06× |
| Fastest | +1.71 | 23× |
These profiles illustrate the trade-offs available to FCM adopters, with options spanning from moderately faster encoding that also saves bitrate, to ultra-fast encoding with only a modest BD-rate penalty.
Feature Coding for Machines is now a mature, standardized, and experimentally validated technology, anchoring a new regime for scalable, privacy-enhanced, and task-adaptive visual analytics pipelines in the era of intelligent devices and pervasive machine-to-machine communication (Eimon et al., 11 Dec 2025, Eimon et al., 10 Dec 2025, Eimon et al., 9 Dec 2025, Eimon et al., 10 Dec 2025, Eimon et al., 11 Dec 2025).