Feature Coding Test Model (FCTM)
- FCTM is a formal framework for evaluating and standardizing neural feature coding, compression, and transmission in edge/cloud split inference systems.
- It defines an encoder pipeline of feature fusion, channel tiling, quantization, and VVC-based codec compression, trading bitrate against downstream task accuracy and encoder complexity.
- The model ensures interoperability and privacy while achieving significant BD-Rate savings and practical deployment benefits in machine learning pipelines.
The Feature Coding Test Model (FCTM) is a formal framework for evaluating, standardizing, and deploying the coding, compression, and transmission of neural intermediate features in machine learning systems—particularly in edge/cloud split inference scenarios. FCTM is foundational to the MPEG-AI Feature Coding for Machines (FCM) standard, enabling interoperable, efficient, and accurate exchange of abstract, high-dimensional features extracted by neural networks. The model comprises rigorous pipeline architectures, standardized datasets, encoder/decoder toolsets, and comprehensive metrics evaluating compression efficiency, downstream task accuracy, computational complexity, and practical deployment constraints (Eimon et al., 9 Dec 2025, Gao et al., 2024, Eimon et al., 11 Dec 2025, Chen et al., 26 Sep 2025).
1. Formal Definition and Architectural Frameworks
FCTM formalizes the problem of feature coding as the construction and evaluation of pipeline configurations, codec profiles, and benchmarks aimed at compressing intermediate neural activations. In split-inference systems, a source neural network is partitioned into NN Part-1 on the client/edge device and NN Part-2 in the cloud/server. NN Part-1 processes input frames to yield intermediate feature tensors $F_\ell \in \mathbb{R}^{C_\ell \times H_\ell \times W_\ell}$, where $\ell$ indexes the feature layers produced at the split point (Eimon et al., 9 Dec 2025, Eimon et al., 11 Dec 2025).
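As a concrete illustration of the split, the sketch below partitions a torchvision ResNet-50 into NN Part-1 (run on the edge) and NN Part-2 (run on the server). The backbone choice is an assumption for illustration only; the FCM test conditions use dedicated detection, segmentation, and tracking networks.

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()

# NN Part-1: stem through layer3 (runs on the edge device)
part1 = torch.nn.Sequential(
    model.conv1, model.bn1, model.relu, model.maxpool,
    model.layer1, model.layer2, model.layer3,
)

# NN Part-2: layer4 + classification head (runs on the server)
class Part2(torch.nn.Module):
    def __init__(self, m):
        super().__init__()
        self.layer4, self.avgpool, self.fc = m.layer4, m.avgpool, m.fc
    def forward(self, f):
        x = self.avgpool(self.layer4(f))
        return self.fc(torch.flatten(x, 1))

part2 = Part2(model).eval()

x = torch.randn(1, 3, 224, 224)   # one input frame
features = part1(x)               # intermediate tensor to be coded, shape (1, 1024, 14, 14)
logits = part2(features)          # inference continues server-side on the (reconstructed) features
```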
The FCTM encoder pipeline typically comprises:
- Feature Fusion: fusing the multi-layer feature tensors into a single tensor to reduce channel/spatial dimensionality.
- Channel Tiling and Quantization: mapping the fused tensor's channels onto 2D tiled frames and uniformly quantizing them to a fixed bit depth matched to the codec's input format.
- Codec Compression: Encoding quantized frames using a standardized video codec (e.g., VVC), with profiles tailored for feature statistics instead of perceptual fidelity.
The decoder performs the inverse mapping to reconstruct features for continued inference by NN Part-2. This design sits at the interface between feature extraction and signal compression, underpinning efficient feature data transport irrespective of human perceptual requirements (Eimon et al., 9 Dec 2025, Eimon et al., 11 Dec 2025, Gao et al., 2024).
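A minimal sketch of the tiling and uniform-quantization steps and their inverse is given below. It assumes a single fused feature tensor of shape (C, H, W) and a 10-bit target depth; the normative FCTM tools are more elaborate, so this is illustrative only.

```python
import numpy as np

def tile_and_quantize(feat: np.ndarray, bit_depth: int = 10):
    """Pack a (C, H, W) feature tensor into one 2D frame and quantize it uniformly."""
    C, H, W = feat.shape
    cols = int(np.ceil(np.sqrt(C)))          # near-square tiling grid
    rows = int(np.ceil(C / cols))
    frame = np.zeros((rows * H, cols * W), dtype=np.float32)
    for c in range(C):
        r, k = divmod(c, cols)
        frame[r * H:(r + 1) * H, k * W:(k + 1) * W] = feat[c]
    fmin, fmax = float(frame.min()), float(frame.max())
    scale = (2 ** bit_depth - 1) / max(fmax - fmin, 1e-12)
    q = np.round((frame - fmin) * scale).astype(np.uint16)   # codec-ready 2D frame
    return q, (fmin, scale, rows, cols, C, H, W)             # side info needed by the decoder

def dequantize_and_untile(q: np.ndarray, meta):
    """Inverse mapping: reconstruct the (C, H, W) tensor from a decoded frame."""
    fmin, scale, rows, cols, C, H, W = meta
    frame = q.astype(np.float32) / scale + fmin
    feat = np.empty((C, H, W), dtype=np.float32)
    for c in range(C):
        r, k = divmod(c, cols)
        feat[c] = frame[r * H:(r + 1) * H, k * W:(k + 1) * W]
    return feat
```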
2. Codec Profiles, Bitstream Syntax, and Tool-Level Tradeoffs
FCTM specifies encoder-side configurations designed for efficient feature coding using the Versatile Video Coding (VVC) reference software. Three key profiles are defined; a "+" entry indicates tools disabled in addition to those of the preceding profile (Eimon et al., 9 Dec 2025):
| Profile | Disabled Tool Groups | Encoding Time Reduction | BD-Rate Change |
|---|---|---|---|
| Fast | SAO, DBF, ALF (in-loop filters) | 21.8% | –2.96% |
| Faster | + Affine, SbTMVP, MRL, IMV, CIIP, MMVD, BCW, GEO, motion search ≤16 | 51.5% | –1.85% |
| Fastest | + Block partition depth >1, MRL, ISP | 95.6% | +1.71% |
All profiles are decoder-agnostic and leave bitstream syntax unmodified, using profile identifiers within the SPS/VUI for signaling. Tool-level ablation studies confirm that in-loop filters (SAO, DBF, ALF) degrade feature statistics, and disabling them can improve BD-Rate. Transform tools (MTS, SBT, dependent quantization), intra sub-partitioning (ISP), and extended directional intra modes remain crucial for coding abstract neural features; their removal results in significant BD-Rate losses (Eimon et al., 9 Dec 2025).
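The table can be read as a simple lookup from an encoding-time budget to a profile. The sketch below encodes the reported figures and picks the profile with the best BD-Rate among those meeting the budget; the helper name and selection policy are hypothetical, not part of the standard.

```python
# Encoding-time reduction and BD-Rate change per FCTM profile (values from the table above).
PROFILES = {
    "fast":    {"time_reduction": 0.218, "bd_rate_change": -2.96},
    "faster":  {"time_reduction": 0.515, "bd_rate_change": -1.85},
    "fastest": {"time_reduction": 0.956, "bd_rate_change": +1.71},
}

def pick_profile(required_time_reduction: float) -> str:
    """Pick the profile with the lowest BD-Rate change among those meeting the time budget."""
    eligible = [(name, p) for name, p in PROFILES.items()
                if p["time_reduction"] >= required_time_reduction]
    if not eligible:
        raise ValueError("No profile meets the requested encoding-time reduction")
    return min(eligible, key=lambda item: item[1]["bd_rate_change"])[0]

print(pick_profile(0.5))   # -> "faster"
```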
3. Unified Evaluation Pipelines and Metrics
FCTM provides standardized testbeds and evaluation pipelines for benchmarking feature coding methods (Gao et al., 2024, Chen et al., 26 Sep 2025). The canonical FCTM pipeline consists of:
- Truncation and quantization of feature maps.
- Packing features into 2D images compatible with codecs.
- Encoding and decoding via baselines (VVC-VTM and hyperprior VAEs).
- Reconstructing quantized features for downstream inference.
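These four steps can be sketched as a single evaluation loop, reusing the tiling/quantization helpers from the earlier sketch. Here `video_encode`/`video_decode` are hypothetical placeholders for whichever baseline codec (VTM or a hyperprior VAE) is plugged in, and `nn_part2`/`task_metric` stand in for the downstream model and its accuracy measure.

```python
def evaluate_feature_codec(features, nn_part2, task_metric,
                           video_encode, video_decode, bit_depth=10):
    """Run the canonical FCTM evaluation loop over one set of intermediate features.

    `features` is an iterable of (C, H, W) arrays produced by NN Part-1;
    `video_encode` returns a bytes object, `video_decode` returns the decoded frame.
    """
    total_bits, outputs = 0, []
    for feat in features:
        frame, meta = tile_and_quantize(feat, bit_depth)   # truncate/quantize + pack into a 2D image
        bitstream = video_encode(frame)                    # baseline codec (e.g. VVC-VTM or hyperprior VAE)
        total_bits += 8 * len(bitstream)
        decoded_frame = video_decode(bitstream)
        feat_hat = dequantize_and_untile(decoded_frame, meta)
        outputs.append(nn_part2(feat_hat))                 # downstream inference on reconstructed features
    bpfp = total_bits / sum(f.size for f in features)      # bits per feature point
    return task_metric(outputs), bpfp
```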
Quantitative metrics include:
- Compression Ratio (CR): the ratio of the original feature size in bits to the compressed bitstream size.
- Rate–Distortion: bitrate, typically expressed in bits per feature point (BPFP), plotted against feature distortion (e.g., MSE) or downstream task accuracy.
- BD-Rate: the Bjøntegaard delta rate between test and anchor rate–accuracy curves, i.e., the average bitrate change at equal task performance.
- Task-Specific Metrics: classification accuracy, mIoU, RMSE, CLIP score (Gao et al., 2024).
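For reference, a minimal Bjøntegaard delta-rate computation over two rate–accuracy curves is sketched below (the standard cubic-fit variant, with task accuracy in place of PSNR as the quality axis; it assumes at least four rate points per curve).

```python
import numpy as np

def bd_rate(rate_anchor, acc_anchor, rate_test, acc_test):
    """Bjøntegaard delta rate (%) of the test curve vs. the anchor curve.

    Fits log10(rate) as a cubic polynomial of the quality metric, integrates the
    gap over the overlapping quality range; negative values mean bitrate savings.
    """
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)
    p_a = np.polyfit(acc_anchor, lr_a, 3)
    p_t = np.polyfit(acc_test, lr_t, 3)
    lo = max(min(acc_anchor), min(acc_test))
    hi = min(max(acc_anchor), max(acc_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_rate_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_rate_diff - 1) * 100
```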
Practical results demonstrate that VVC-based coding achieves near-lossless performance until a critical Bits Per Feature Point (BPFP) threshold (~0.2), below which task accuracy collapses (Gao et al., 2024). Hyperprior baselines yield smoother curves but lower peak accuracy, particularly on non-image features.
4. Experimental Findings and Compression Efficiency
Across common test conditions and large model benchmarks, FCTM profiles deliver substantial bandwidth reduction while preserving downstream inference fidelity. Notable results include (Eimon et al., 11 Dec 2025, Eimon et al., 9 Dec 2025):
- Instance Segmentation: –94.24% BD-Rate savings
- Object Detection: –95.45% BD-Rate savings (OpenImagesV6)
- Object Tracking: –92.67% to –94.58% savings (TVD/HiEve datasets)
- Overall average: 85.14% rate reduction at <1% mAP/mIoU accuracy loss
Profile Fastest reduces encoding time by 95.6% at only a 1.71% BD-Rate loss, which benefits ultra-low-power edge devices. Encoder/decoder complexity ratios indicate that the FCTM encoder is cheaper to run than NN Part-2, while the decoder's cost is comparable to that of NN Part-1 (Eimon et al., 11 Dec 2025).
5. Interoperability, Scalability, and Privacy Considerations
FCTM’s strict bitstream and toolset standardization guarantees that any compliant decoder reconstructs features from any encoder configuration, promoting heterogeneous deployment across vendors and modalities. The modular architecture supports flexible split points for adaptation to device and network constraints (Eimon et al., 11 Dec 2025, Gao et al., 2024).
Privacy is enhanced since only quantized, shape-reduced features—not raw images—are transmitted. The inversion from feature tensors to original input is non-trivial, mitigating privacy risks. However, a plausible implication is that feature-level leakage remains possible, inviting future research on encryption and differential privacy methods for intermediate representations (Gao et al., 2024).
6. Extensions to Large Model Coding, Benchmarks, and Code Agent Evaluation
FCTM has been extended to benchmark large model feature coding and automated code agents. Datasets span visual and textual modalities (DINOv2, Llama3, SD3) with unified test conditions (Gao et al., 2024). Baseline codecs (VTM, hyperprior VAE) and open-source pipelines enable reproducible evaluation.
In coding agent evaluation (FeatBench), FCTM formalizes a feature-implementation task as a tuple comprising a natural-language request, a code repository, a ground-truth patch, and dual Fail-to-Pass (F2P)/Pass-to-Pass (P2P) test suites. Metrics include Resolved Rate (SR), Patch Apply Rate (AR), File-level Localization Rate (FLR), Regression Rate (RegR), and Aggressive Implementation Failure Rate (AIR). State-of-the-art agents (e.g., Trae-agent + GPT-5) achieve only 29.94% SR, highlighting semantic and generalization challenges in natural-language feature implementation (Chen et al., 26 Sep 2025).
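As an illustration of how the dual test suites translate into the Resolved Rate, the following sketch (field names hypothetical) counts an instance as resolved only if the agent's patch applies, all F2P tests now pass, and no P2P test regresses.

```python
def resolved_rate(results):
    """Fraction of benchmark instances resolved.

    Each item in `results` is assumed to carry: patch_applied (bool),
    f2p_passed (list of bool, one per Fail-to-Pass test) and
    p2p_passed (list of bool, one per Pass-to-Pass test).
    """
    resolved = sum(
        1 for r in results
        if r["patch_applied"] and all(r["f2p_passed"]) and all(r["p2p_passed"])
    )
    return resolved / len(results)
```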
7. Future Directions and Benchmark Expansion
Emerging research suggests further FCTM development in several areas:
- Adding modalities (audio, video, multi-sensor) and scaling to models beyond 100B parameters (Gao et al., 2024).
- Designing learned codecs purpose-built for feature data, supplanting adapted image/video codecs.
- Incorporating semantic-aware distortion metrics to better correlate compression-induced feature degradation with downstream task performance.
- Extending automated code-benchmarks to broader languages and architectural metrics (coupling, cohesion), and evolving difficulty calibration for robust agent assessment (Chen et al., 26 Sep 2025).
The FCTM framework thus underpins scalable, standardized, and privacy-aware deployment of neural feature coding across edge-to-cloud pipelines, large model inference, and automated code generation agents.