Bits Per Class (BPC) Efficiency Metric
- Bits Per Class (BPC) is a metric that quantifies the average number of bits required per distinct class, integrating entropy limits, system overhead, and resource constraints.
- It is applied across diverse domains such as optical reading, hardware throughput, neural computation, and data compression, enabling a unified efficiency evaluation.
- BPC informs design trade-offs in error correction, quantization, and model distillation by linking theoretical limits with practical system performance.
Bits Per Class (BPC) is a metric that quantifies the amount of information, storage, or transmission resources required per distinguishable category or "class" within a system. Its interpretation is context-dependent and spans fields such as optical reading, coding theory, hardware throughput, neural computation, machine learning data compression, quantized neural networks, and digital signal processing. In each setting, BPC acts as a unifying measure of efficiency, expressing the minimum or achieved bit cost for reliably encoding, transmitting, or distinguishing units that represent distinct classes, events, or categories.
1. Foundations and Definitions of Bits Per Class
BPC emerges as a principled, unit-consistent measure of information or resource usage per class. Formally, BPC is often defined as:
- The minimal average number of bits needed to distinguish or transmit one instance of a class (e.g., symbol, label, address, or pixel), given all system and redundancy constraints (see the entropy sketch after this list).
- A measure of physical or algorithmic efficiency, encapsulating entropy limits, system overhead, and the full bit cost of representations, auxiliary parameters, or contextual information.
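As a minimal, illustrative sketch of the entropy-limit reading of these definitions, the Python snippet below computes the Shannon entropy of a made-up class (label) distribution, which lower-bounds the average number of bits needed to identify an instance's class; for K equiprobable classes it reduces to log2(K).

```python
import numpy as np

# Minimal sketch (illustrative distributions): the Shannon entropy of the
# class (label) distribution lower-bounds the average number of bits needed
# to identify one instance's class; for K equiprobable classes it is log2(K).

def bits_per_class_lower_bound(class_probs):
    p = np.asarray(class_probs, dtype=float)
    p = p / p.sum()
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())

print(bits_per_class_lower_bound([0.25] * 4))        # 2.0 bits (uniform, K = 4)
print(bits_per_class_lower_bound([0.7, 0.2, 0.1]))   # below log2(3) for a skewed distribution
```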
Distinctions in terminology arise across disciplines:
- In optical systems, BPC is often synonymous with "bits per pixel" or "photon information efficiency," representing the number of bits reliably encoded per optical mode or memory cell.
- In neural computation, it takes the form of "bits per joule" or bits transmitted per signaling event, interpreted over energy-constrained transmission processes.
- In hardware, BPC may denote "bits per clock cycle" or throughput, measuring hardware-level resource utilization.
- In dataset distillation and quantized networks, BPC generalizes to the average bit cost (storage, transmission, or computational) per class, sample, or filter, incorporating both compression efficiency and model accuracy.
2. BPC in Communication, Coding, and Sensing Systems
Optical Reading and Photon Information Efficiency
In optical reading, BPC plays a central role in quantifying the achievable information density per physical resource, typically a photon. The photon information efficiency (PIE) is defined as the ratio

$$\mathrm{PIE} = \frac{C}{\bar{n}},$$

where $C$ is the channel capacity per pixel (in bits) and $\bar{n}$ is the average number of probe photons per pixel. Here, BPC corresponds to how many bits can be reliably extracted per "class" (e.g., pixel position) under constraints imposed by quantum mechanics and measurement noise (1207.6435).
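As a rough illustration of the PIE computation, the sketch below models a pixel as a binary class probed with mean photon number $\bar{n}$ and read out by ideal photon counting, giving a Z-channel with dark probability $e^{-\bar{n}}$; the on-off model, the uniform prior, and the function name `pie_ook` are illustrative assumptions, not the paper's exact analysis.

```python
import numpy as np

# Toy sketch (assumptions): each pixel is a binary "class" (absorbing vs.
# reflecting), probed with mean photon number n_bar and read out by ideal
# photon counting. The dark outcome when the pixel reflects occurs with
# probability exp(-n_bar) (Poisson statistics), giving a Z-channel.
# PIE = I(X;Y) / n_bar is the bits extracted per probe photon.

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def pie_ook(n_bar, prior_on=0.5):
    p_dark = np.exp(-n_bar)                      # P(no click | reflecting pixel)
    p_click = prior_on * (1 - p_dark)            # P(click)
    mutual_info = binary_entropy(p_click) - prior_on * binary_entropy(p_dark)
    return mutual_info / n_bar                   # bits per photon

for n_bar in (2.0, 0.5, 0.1, 0.01):
    print(f"n_bar = {n_bar:5.2f} -> PIE ~ {pie_ook(n_bar):.3f} bits/photon")
```

In this toy model, PIE approaches roughly 0.5 bits per photon as $\bar{n}$ shrinks, in line with the direct-detection cap discussed below.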
The attainable PIE is fundamentally limited by the modulation format, the quantum measurement strategy, and the physical properties of the probe:
- With conventional coherent-state probes (modulating amplitude and phase) and direct detection, PIE is capped at approximately 0.5 bits per photon, limiting BPC per pixel.
- Advanced receivers employing joint (collective) detection and phase coding (e.g., the Hadamard code with a "Green Machine" joint receiver) can approach the Holevo limit, with PIE, and thus BPC, growing without bound as the mean photon number per pixel decreases.
- Nonclassical probes (notably, single-photon W-states) allow error-free discrimination of exponentially many classes, yielding a practical implementation where BPC per photon scales as $\log_2 M$ for $M$ distinguishable classes and, in the ideal, lossless scenario, becomes arbitrarily large.
The engineering significance is that system-level BPC determines trade-offs between energy efficiency, achievable density, scalability, and sensitivity to loss.
Binary Detector Readout and Data Compression
In high-energy physics and distributed sensor systems, BPC is interpreted as the bits required per class of event, such as an address or hit in a strip detector (Garcia-Sciveres et al., 2013). Information theory prescribes that the entropy for encoding $k$ hits among $N$ channels is

$$H = \log_2 \binom{N-k+1}{k}$$

when physical constraints prevent adjacent addresses (and $\log_2 \binom{N}{k}$ without that restriction). BPC thus lower-bounds the encoding cost per event.
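A minimal sketch of this bound, with illustrative values of $N$ and $k$, compares the combinatorial entropy limit (with and without the adjacency restriction) against naive per-hit address listing:

```python
import math

# Minimal sketch (illustrative N and k): entropy lower bound in bits for an
# event with k hits over N channels, with and without the "no adjacent
# addresses" restriction, versus naive per-hit address listing.

def bits_unconstrained(N, k):
    return math.log2(math.comb(N, k))            # any k-subset of channels

def bits_no_adjacent(N, k):
    return math.log2(math.comb(N - k + 1, k))    # k non-adjacent channels

def bits_naive_listing(N, k):
    return k * math.ceil(math.log2(N))           # k independent channel addresses

N, k = 1024, 4
for name, bits in [("entropy (any hits)", bits_unconstrained(N, k)),
                   ("entropy (no adjacent)", bits_no_adjacent(N, k)),
                   ("naive address list", bits_naive_listing(N, k))]:
    print(f"{name:22s}: {bits:6.1f} bits total, {bits / k:5.2f} bits per hit")
```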
Encoding methods are compared at multiple efficiency levels:
- Level 0: Address data only.
- Level 1: Includes engineering overhead (e.g., DC-balance, framing).
- Level 2: Context, cluster, and protocol bits are added.
Aggregative techniques, such as Pattern Overlay Compression, approach the entropy limit, achieving a markedly lower effective BPC than naive address-listing methods.
Error Correction and Soft-Detection Decoding
In soft-detection block code decoders, minimal BPC is important for reducing soft-information bandwidth while retaining near-optimal decoding (Duffy, 2020). Here, per-received-bit "class" reliability can be compressed to a rank ordering, requiring as little as $\lceil \log_2 n! \rceil$ bits for an $n$-bit block; ranking bit importances in this way guides an efficient, codebook-agnostic noise search. Such compact representations preserve error-rate performance with minimal bit overhead, optimizing BPC in ultra-reliable low-latency communication (URLLC) applications.
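A minimal sketch of this idea, under the assumption that only the rank ordering of per-bit reliabilities is retained (the block length, the Gaussian stand-in reliabilities, and the 32-bit float baseline are illustrative):

```python
import math
import numpy as np

# Sketch (assumptions): instead of shipping full-precision reliabilities
# (e.g., |LLR|s) to the decoder, only their rank order within the n-bit block
# is kept. The rank permutation costs ceil(log2(n!)) bits in total, i.e.,
# roughly log2(n) bits per received bit, versus 32-bit floats per reliability.

rng = np.random.default_rng(1)
n = 128                                        # block length in bits
llr_magnitudes = np.abs(rng.normal(size=n))    # stand-in per-bit reliabilities

rank_order = np.argsort(llr_magnitudes)        # least-reliable bit first
bits_rank_order = math.ceil(math.log2(math.factorial(n)))
bits_full_precision = 32 * n

print(f"rank order : {bits_rank_order} bits "
      f"(~{bits_rank_order / n:.1f} bits per received bit)")
print(f"float LLRs : {bits_full_precision} bits (32 per received bit)")
```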
3. BPC in Machine Learning: Compression, Quantization, and Distillation
Dataset Distillation and Rate-Utility Optimization
In dataset distillation, BPC is formalized as a central metric for evaluating the storage efficiency of compressed synthetic datasets (Bao et al., 23 Jul 2025). Instead of counting images per class (ipc), BPC is computed as the number of bits necessary to encode all components (latent codes, class labels, and decoder parameters) of the synthetic dataset, divided by the number of classes:

$$\mathrm{BPC} = \frac{B_{\text{latent}} + B_{\text{label}} + B_{\text{decoder}}}{\text{number of classes}}.$$
By jointly optimizing both the bit cost (rate) and downstream task performance (utility), the framework enables direct, information-theoretic comparisons between methods and allows for exploration of Pareto-optimal trade-offs under storage constraints. Experiments demonstrate substantially greater compression than traditional approaches at comparable accuracy.
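A minimal sketch of the BPC bookkeeping, with purely hypothetical component sizes rather than the paper's:

```python
# Minimal sketch (hypothetical sizes): BPC for a distilled dataset is the
# total bit cost of everything needed to reconstruct it -- latent codes,
# class labels, and decoder parameters -- divided by the number of classes.

def dataset_distillation_bpc(latent_bits, label_bits, decoder_bits, num_classes):
    return (latent_bits + label_bits + decoder_bits) / num_classes

num_classes = 10
latents_per_class, bits_per_latent = 8, 64 * 16    # hypothetical: 8 latents/class, 64-dim, 16-bit
latent_bits = num_classes * latents_per_class * bits_per_latent
label_bits = num_classes * latents_per_class * 4   # ~log2(10) bits per label, rounded up
decoder_bits = 50_000 * 8                          # hypothetical 50k decoder parameters at 8 bits

bpc = dataset_distillation_bpc(latent_bits, label_bits, decoder_bits, num_classes)
print(f"BPC ~ {bpc:,.0f} bits per class")
```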
Neural Network Quantization
In resource-constrained deployment, class-based quantization leverages per-class importance to adaptively assign bit-widths to filters, neurons, or weights (Sun et al., 2022). The quantization process involves three main steps:
- Computing the importance of each neuron/filter for every class using a first-order Taylor expansion of the network output,
- Partitioning components according to their importance and assigning quantization levels such that average BPC meets hardware or accuracy targets,
- Refining the quantized network through knowledge distillation, thus preserving accuracy at low BPC settings.
The result is a quantized network where critical filters/neurons are preserved at higher precision, while others are aggressively compressed or pruned, yielding substantial reductions in average BPC with little loss of accuracy.
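The sketch below illustrates the first two steps with synthetic activations and gradients standing in for a real network's; the importance scores, quantile thresholds, and bit-width menu are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

# Sketch (assumptions): per-filter importance scores stand in for the
# first-order Taylor terms |activation * d(output)/d(activation)| computed
# per class; filters are then partitioned by importance and assigned
# bit-widths so that the average meets a BPC budget.

rng = np.random.default_rng(0)
num_filters, num_classes = 64, 10
activations = rng.normal(size=(num_filters, num_classes))
gradients = rng.normal(size=(num_filters, num_classes))      # w.r.t. class scores

# Step 1: first-order Taylor importance, aggregated over classes.
importance = np.abs(activations * gradients).sum(axis=1)

# Step 2: partition filters by importance quantile and assign bit-widths.
bit_choices = np.array([8, 6, 4, 2])                          # high to low precision
quantiles = np.quantile(importance, [0.75, 0.5, 0.25])
bit_widths = np.select(
    [importance >= quantiles[0],
     importance >= quantiles[1],
     importance >= quantiles[2]],
    bit_choices[:3],
    default=bit_choices[3],
)

print(f"average bits per filter: {bit_widths.mean():.2f}")
# Step 3 (not shown): fine-tune the quantized network with knowledge distillation.
```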
Variable-Bit Quantization in Edge Vision Systems
When digitizing sensor data at the edge, variable-resolution quantization, such as via the Hadamard transform, allows the per-channel BPC to be set according to channel variance (Deb et al., 7 Oct 2024): the bit-width $b_i$ assigned to channel $i$ grows with its variance $\sigma_i^2$, up to a maximal bit-depth $B$. This variable BPC allocation reduces wasted bits in low-variance (high-frequency) components, lowering system-wide energy and storage cost while preserving the information needed for accurate inference in convolutional neural networks.
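A minimal sketch of variance-driven bit allocation after a Hadamard transform; the log-variance allocation rule, the locally constructed `hadamard` helper, and the synthetic patch statistics are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

# Sketch (assumptions): 8x8 sensor patches are decorrelated with a Hadamard
# transform; each of the 64 transform channels then gets a bit-width that
# grows with the log of its variance, capped at a maximal depth B.

def hadamard(n):
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 8, 8)) * np.linspace(4.0, 0.1, 64).reshape(8, 8)

H = hadamard(8)
coeffs = H @ patches @ H.T / 8.0                    # 2-D Hadamard transform per patch
variances = coeffs.reshape(len(patches), 64).var(axis=0)

B = 8                                               # maximal bit-depth
bits = np.clip(np.ceil(0.5 * np.log2(variances / variances.min())), 1, B).astype(int)

print(f"average bits per channel: {bits.mean():.2f} (uniform allocation would be {B})")
```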
4. BPC as a Throughput and Efficiency Metric in Hardware
In digital hardware architectures, especially in cryptographic implementations, "bits per class" frequently aligns with "bits per clock" (also "bpc" in the literature), serving as a measure of per-cycle throughput (Aagaard et al., 2019). For example, candidate ciphers for lightweight cryptography such as ACE and WAGE exhibit bpc ranging from 0.5 (serial processing) to over 4.5 (fully parallel, unrolled designs). The relationship is formalized as

$$\mathrm{bpc} = \frac{\text{bits processed per operation}}{\text{clock cycles per operation}}, \qquad \text{throughput} = \mathrm{bpc} \times f_{\text{clock}}.$$
Optimizing bpc must be balanced against hardware area, energy per bit, and critical path constraints. In practice, a higher bpc achieved through parallelization can decrease energy per bit despite increasing area, contingent on the cryptographic structure’s compatibility with parallel hardware.
This formulation highlights how BPC, even when defined as bits per clock, encapsulates practical concerns in architectural throughput, hardware scaling, and area/energy trade-offs within constrained environments.
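A minimal sketch of this bookkeeping, with an assumed 320-bit state, 100 MHz clock, and 5 mW power draw chosen purely for illustration:

```python
# Minimal sketch (illustrative numbers): bits-per-clock (bpc) relates the
# bits processed per operation to the cycles that operation takes;
# throughput and energy per bit then follow from clock frequency and power.

def bits_per_clock(block_bits, cycles_per_block):
    return block_bits / cycles_per_block

def throughput_bps(bpc, clock_hz):
    return bpc * clock_hz

def energy_per_bit_j(power_w, throughput):
    return power_w / throughput

block_bits = 320                                   # assumed 320-bit permutation state
for cycles in (640, 160, 64):                      # serial vs. increasingly unrolled designs
    bpc = bits_per_clock(block_bits, cycles)
    tput = throughput_bps(bpc, 100e6)              # assumed 100 MHz clock
    print(f"{cycles:4d} cycles/block -> {bpc:.2f} bpc, "
          f"{tput / 1e6:6.1f} Mbit/s, {energy_per_bit_j(5e-3, tput) * 1e12:.1f} pJ/bit")
```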
5. BPC in Neural Coding and Biological Systems
The concept of BPC generalizes to neurobiology as the number of bits reliably transmitted per class of spike or neural signaling event, normalized to physical or metabolic cost (Levy et al., 2016). Here, Jaynes’ maximum entropy method is used to identify, for a neuron serving as an estimator, the distribution of inter-pulse intervals (IPIs) that maximizes mutual information under energy and estimation constraints:
- Constraints include constant per-pulse energy cost, time-linear leak costs, and unbiased estimation requirements.
- The resulting bits-per-class, interpreted as bits per signal (or per joule), is $I(\Theta;T)$ per signaling event, or $I(\Theta;T)/\bar{E}$ per joule,

where $I(\Theta;T)$ is the mutual information between the latent variable $\Theta$ and the observed response $T$, and $\bar{E}$ is the average energy expended per event. The relationship quantifies coding efficiency, linking cellular energy budgets to the information content of neural firing patterns.
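A toy numerical sketch of this quantity: a latent intensity is signaled through an exponentially distributed inter-pulse interval, mutual information is estimated from a 2-D histogram, and the result is divided by an assumed per-event energy (a fixed pulse cost plus a time-linear leak term); all parameter values and the model itself are illustrative assumptions.

```python
import numpy as np

# Toy sketch (assumptions): a latent intensity theta is signaled by an
# inter-pulse interval T; we estimate I(theta; T) on a discretized grid and
# divide by the average energy per interval (fixed pulse cost plus a
# time-proportional leak cost), giving bits per joule.
rng = np.random.default_rng(0)

theta = rng.uniform(5.0, 20.0, size=200_000)        # latent variable (Hz-like scale)
T = rng.exponential(1.0 / theta)                    # observed inter-pulse interval (s)

def mutual_information(x, y, bins=64):
    # Plug-in estimate of I(X;Y) in bits from a joint histogram.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

I_bits = mutual_information(theta, T)               # bits per signaling event

E_pulse = 2.4e-9            # assumed fixed energy per pulse (J)
leak_power = 0.5e-9         # assumed time-linear leak cost (W)
E_mean = E_pulse + leak_power * T.mean()            # average energy per interval (J)

print(f"I(theta; T) ~ {I_bits:.2f} bits/event")
print(f"bits per joule ~ {I_bits / E_mean:.3e}")
```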
The framework offers testable predictions about the dependency of informational efficiency on biophysical constraints, suggesting that evolutionary processes may favor neurons that maximize BPC under metabolic limits.
6. BPC as a Metric for Model and Signal Enhancement
BPC also surfaces in domains where model supervision or enhancement relies on class-based abstraction:
- In speech enhancement, contextual broad phonetic class (BPC) information is incorporated as auxiliary supervision (via end-to-end ASR loss) (Lu et al., 2020). Grouping similar phonemes into broad categories reduces loss due to misclassified fine-grained classes, enhancing robustness, generalization, and perceptual quality.
- In language modeling, bits-per-character (BPC) is used to benchmark sequential model performance (e.g., fast-slow RNNs), where lower BPC reflects greater predictive or compression efficiency across classes of output tokens (Mujika et al., 2017); a minimal conversion from cross-entropy loss is sketched after this list.
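Since character-level models are typically trained with a cross-entropy objective measured in nats, BPC is simply that loss rescaled by $\ln 2$; the sketch below uses an assumed loss value for illustration.

```python
import math

# Minimal sketch: a character-level model's average cross-entropy loss in
# nats converts directly to bits-per-character; lower BPC means the model
# predicts (and hence compresses) the character stream better.

def bits_per_character(mean_nll_nats):
    return mean_nll_nats / math.log(2)

print(f"{bits_per_character(0.83):.3f} BPC")   # assumed loss of 0.83 nats/char ~ 1.20 BPC
```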
7. BPC: Integrative Perspectives and Practical Implications
Across application domains, BPC offers a unified lens for evaluating and optimizing information processing, coding, and storage efficiency. Its key roles include:
- Establishing lower bounds on achievable efficiency (entropy limits),
- Quantifying the effectiveness of new encoding, compression, or quantization algorithms,
- Informing architectural choices in hardware and algorithm design,
- Enabling fair, method-agnostic comparisons across disparate systems or scenarios.
BPC metrics also raise practical considerations and challenges:
- For quantum and nonclassical systems, the potential for unbounded BPC is restricted in practice by stringent loss and implementation constraints.
- In hardware and transmission, the trade-off between BPC, area, power, and latency requires careful, system-specific analysis.
- In deep learning and compression, minimizing BPC without unacceptable drops in accuracy or utility depends on fine-grained, task-informed quantization and encoding strategies.
Taken together, BPC serves as a cross-disciplinary measure of resource-constrained efficiency, guiding the design and evaluation of physical, digital, and learning systems for tasks involving classification, communication, enhancement, and storage.