Floating-Floating-Point (F2P)
- Floating-Floating-Point (F2P) is a variable-precision representation that dynamically allocates bits between exponent and mantissa using a hyper-exponent field.
- It employs a dynamic partitioning algorithm that selects the smallest hyper-exponent to match the bit-length of the intended exponent, optimizing precision in targeted value intervals.
- F2P has shown significant performance improvements in applications like federated learning and neural network quantization, reducing mean-squared error compared to traditional formats.
Floating-Floating-Point (F2P) is a variable-precision floating-point number representation introduced to optimize the trade-off between dynamic range and numerical accuracy in narrow bit-width formats, such as 8-bit words, frequently used in resource-constrained contexts like federated learning, natural language processing, and high-speed network measurement. F2P achieves this flexibility by allowing the partitioning between exponent and mantissa to be dynamically determined on a per-value basis via a compact “hyper-exponent” field, enabling large counting ranges along with selective local precision enhancement over critical value intervals (Cohen et al., 2024).
1. Formal Specification and Bit-Level Construction
Floating-Floating-Point numbers utilize a flexible partitioning strategy where the total bit-width is statically defined, and the division between exponent and mantissa is determined dynamically at encode time. Each F2P word consists of the following:
- s: Sign bit (optional, present in signed mode)
- h: Hyper-exponent field (H bits, unsigned integer)
- e: Exponent field (variable length: its width in bits equals the decoded integer value of h)
- m: Mantissa field (occupies the remaining bits)
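The hyper-exponent $h$ encodes an unsigned integer that directly determines the width of the exponent field $e$. The exponent’s value is $E$, the unsigned integer stored in $e$, and the mantissa fraction is $f = m / 2^{M}$, where $M$ is the number of mantissa bits. The decoded real value $x$ is given by:
$$x = (-1)^{s} \cdot 2^{E - b} \cdot (1 + f)$$
where $b$ is a format-dependent bias analogous to IEEE-754 (Cohen et al., 2024).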
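The field layout can be made concrete with a small decoding sketch. It assumes an IEEE-like value (sign, biased power of two, implicit leading one) and the field order s, h, e, m starting from the most significant bit. The function name `f2p_decode`, its parameters, and the default 8-bit word with a 2-bit hyper-exponent are illustrative choices, not taken from the paper, whose flavors may define the exponent and bias differently.

```python
def f2p_decode(word: int, n_bits: int = 8, h_bits: int = 2,
               bias: int = 0, signed: bool = True) -> float:
    """Decode one word laid out as [s | h | e | m], most significant bit first.

    Assumed value formula (illustrative): (-1)**s * 2**(E - bias) * (1 + f).
    """
    pos = n_bits  # number of low-order bits not yet consumed
    sign = 0
    if signed:
        pos -= 1
        sign = (word >> pos) & 1

    # The hyper-exponent gives the width of the exponent field.
    pos -= h_bits
    exp_width = (word >> pos) & ((1 << h_bits) - 1)
    if exp_width > pos:
        raise ValueError("exponent field does not fit in the word")

    # Exponent field: exp_width bits (empty field decodes to 0).
    pos -= exp_width
    exp_val = (word >> pos) & ((1 << exp_width) - 1)

    # Mantissa takes whatever bits remain.
    man_width = pos
    man_val = word & ((1 << man_width) - 1)
    frac = man_val / (1 << man_width)

    value = 2.0 ** (exp_val - bias) * (1.0 + frac)
    return -value if sign else value
```

For instance, under these assumptions the word `0b0_10_11_010` has h = 2, a two-bit exponent field holding 3, and a three-bit mantissa holding 2, so it decodes to 2^3 × 1.25 = 10.0.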
2. Dynamic Partitioning Algorithm
The F2P encoding process operates as follows:
- Exponent/Mantissa Derivation: Normalize the input real $x$ to the nearest floating-point representation, identifying the candidate exponent $E$ and mantissa fraction $f$.
- Hyper-exponent Determination: Select the smallest hyper-exponent $h$ such that $h \ge$ the bit-length of $E$.
- Packing: Encode $h$, then the lower $h$ bits of $E$ into $e$, and finally the leading $M$ bits of the normalized mantissa fraction into $m$, where $M$ is the number of bits left in the word.
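The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the field order s, h, e, m from the most significant bit, a non-negative biased exponent, and plain truncation of the mantissa (keeping the leading bits) rather than round-to-nearest. The name `f2p_encode` and its parameters are hypothetical.

```python
import math


def f2p_encode(x: float, n_bits: int = 8, h_bits: int = 2,
               bias: int = 0, signed: bool = True) -> int:
    """Pack x into an [s | h | e | m] word, truncating the mantissa."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite non-zero values are handled in this sketch")
    sign = 1 if x < 0 else 0
    if sign and not signed:
        raise ValueError("negative input in unsigned mode")

    # Step 1: normalize. frexp gives abs(x) = mant * 2**exp with 0.5 <= mant < 1,
    # so abs(x) = (1 + frac) * 2**(exp - 1) with frac in [0, 1).
    mant, exp = math.frexp(abs(x))
    frac = 2.0 * mant - 1.0
    exp_val = exp - 1 + bias  # biased exponent stored in the e field
    if exp_val < 0:
        raise ValueError("value below the representable range")

    # Step 2: smallest hyper-exponent whose field can hold exp_val.
    exp_width = exp_val.bit_length()  # 0 when exp_val == 0
    man_width = n_bits - int(signed) - h_bits - exp_width
    if exp_width >= (1 << h_bits) or man_width < 0:
        raise ValueError("value above the representable range")

    # Step 3: pack h, then e, then the leading man_width bits of the fraction.
    man_val = int(frac * (1 << man_width))
    word = sign
    word = (word << h_bits) | exp_width
    word = (word << exp_width) | exp_val
    word = (word << man_width) | man_val
    return word
```

With the defaults, `f2p_encode(10.0)` yields `0b01011010`: the exponent 3 needs two bits, so h = 2 and three bits remain for the mantissa fraction 0.25.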
Decoding in hardware requires first extracting $h$ to determine the exponent width, then partitioning the remaining bits accordingly to recover $e$ and $m$, and reconstructing the value $x$ via the value formula. The dynamic bit-wiring is accomplished with modest logic resources: a small $2^{H}$-entry lookup for $h$ and select/mux structures for shifting bits between exponent and mantissa fields. The partitioning thresholds in the value domain occur precisely where $h$ increments, typically aligning to powers of two (Cohen et al., 2024).
3. Representable Range and Precision Properties
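F2P’s flexible partitioning yields representational properties determined by the hyper-exponent width $H$ and the overall bit-width $N$:
- Range: With bias $b$, the minimum exponent is $E_{\min} = -b$ (empty exponent field), and the maximum is $E_{\max} = 2^{2^{H}-1} - 1 - b$ (with $h$ and $e$ at maximal values).
- F2P Value Range, for normalized magnitudes of the form $2^{E}(1+f)$:
- $|x|_{\min} = 2^{E_{\min}}$
- $|x|_{\max} = 2^{E_{\max}} \left(2 - 2^{-M}\right)$, where $M$ is the mantissa width left at the maximal $h$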
- Error Analysis:
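- Within any exponent bin $[2^{E}, 2^{E+1})$ encoded with $M$ mantissa bits, the spacing is $2^{E-M}$, so under round-to-nearest
- $|x - \hat{x}| \le 2^{E-M-1}$, i.e. a relative error of at most $2^{-(M+1)}$.
- F2P admits local adjustment of the mantissa width $M$ and thus can selectively minimize quantization error in application-critical subranges. This behavior contrasts with fixed-field IEEE-754 variants (e.g., FP8/16), wherein the spacing within each bin is fixed globally by static mantissa and exponent widths.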
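The trade-off between exponent width and local precision can be tabulated directly. The sketch below assumes the smallest-hyper-exponent rule (each stored exponent value uses the shortest field that holds it), round-to-nearest mantissa rounding, and an illustrative 8-bit signed word with a 2-bit hyper-exponent. `f2p_bins` is a hypothetical helper, not part of any published F2P code.

```python
def f2p_bins(n_bits: int = 8, h_bits: int = 2, signed: bool = True):
    """List (h, E_lo, E_hi, mantissa_width, relative_error_bound) per hyper-exponent.

    E_lo..E_hi are the stored (biased) exponent values whose bit-length is h.
    The bound 2**-(M + 1) assumes round-to-nearest with M mantissa bits.
    """
    rows = []
    payload = n_bits - int(signed) - h_bits  # bits shared by e and m
    for h in range(1 << h_bits):
        man_width = payload - h
        if man_width < 0:
            break  # exponent field would no longer fit in the word
        e_lo = 0 if h == 0 else 1 << (h - 1)
        e_hi = (1 << h) - 1
        rows.append((h, e_lo, e_hi, man_width, 2.0 ** -(man_width + 1)))
    return rows
```

With the defaults, the rows show the mantissa shrinking from 5 bits (stored exponent 0) to 2 bits (stored exponents 4 to 7), so the relative error bound grows from 1/64 to 1/8 as the exponent field widens.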
This dynamic range and accuracy distribution, tunable through “flavor” selection (e.g., Short-Range, Long-Range, Short-Integer, Long-Integer), allows significant mean-squared-error (MSE) improvements for workloads targeting specific value domains (Cohen et al., 2024).
4. Hardware Realization
F2P overlays existing floating-point datapaths for common operations (normalization, rounding, basic arithmetic). The only additional hardware is confined to:
- Hyper-exponent decoding (small combinational logic or lookup)
- Multiplexers to shift bits dynamically between exponent and mantissa
- Biasing logic conditioned on the chosen F2P flavor
No full post-layout timing, power, or gate-count analysis is provided for F2P in 8-bit deployments. However, as major arithmetic units (adders, multipliers) are re-used unmodified, only small increases in area and latency are anticipated, primarily due to packing/unpacking overhead. Full hardware implementation and physical design space characterization remain open areas for future investigation (Cohen et al., 2024).
5. Empirical Evaluation Across Applications
F2P’s merits are quantitatively established by application-centric benchmarks:
- Approximate Counters: In per-flow packet counting (on-arrival increments), F2P_LI achieves the lowest MSE at 8–16 bits versus schemes such as Morris, CEDAR, and SEAD. For example, at 8 bits F2P_LI has the best relative MSE of these four schemes (Cohen et al., 2024).
- Neural Network Quantization: For 8-, 16-, and 19-bit quantization of pretrained ResNet18/50 and MobileNetV2/V3, F2P_SR and F2P_LR formats reduce quantization MSE versus conventional FP8/16, BF16, and TF32 formats at 16/19 bits. At 8 bits, the overhead of the hyper-exponent field can outweigh the precision benefit, leading to slightly inferior performance compared with optimal fixed FP8 encodings in most models (Cohen et al., 2024).
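For the approximate-counter use case, on-arrival increments over a non-uniform grid of representable values are commonly handled with a probabilistic update in the style of Morris counters: the counter advances to the next representable value with probability equal to the inverse of the gap, which keeps the expected estimate equal to the true count. The sketch below illustrates that general technique under the assumption that the grid is sorted with gaps of at least 1. It is not claimed to be the exact update rule used for F2P_LI, and `increment` is a hypothetical helper.

```python
import random


def increment(index: int, grid, rng=random) -> int:
    """Return the new grid index after one arrival.

    `grid` is a sorted sequence of representable counter values with gaps >= 1.
    The counter estimate is grid[index].
    """
    if index + 1 >= len(grid):
        return index  # saturated at the largest representable value
    gap = grid[index + 1] - grid[index]
    # Advance with probability 1/gap, so the expected growth per arrival is 1.
    return index + 1 if rng.random() < 1.0 / gap else index
```

Where the grid has unit gaps the update is deterministic and the counter is exact; where gaps are wider, the counter trades accuracy for range.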
These results illustrate F2P’s proficiency in domains requiring either very wide dynamic range, locally concentrated precision, or both.
6. Advantages, Limitations, and Future Outlook
Advantages:
- Fine-grained precision control within user-relevant subranges (via four main flavors: SR, LR, SI, LI)
- Retention of classic floating-point arithmetic semantics enables drop-in compatibility with existing FP logic
- Empirically validated MSE reduction for both approximate large integer counters and mid-precision DNN weight quantization
Limitations:
- Hyper-exponent field consumes H bits, diminishing the effective bit budget for mantissa and exponent, which may degrade overall utility in ultra-narrow encodings (e.g., 8 bits)
- Absence of published ASIC/FPGA PPA (performance, power, area) data to support hardware cost claims
- Quantization performance is primarily measured under static min–max scaling; dynamic or mixed-precision deployments may require supplementary control logic (Cohen et al., 2024)
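Static min–max scaling, as referenced in the last limitation, can be sketched in a few lines: the tensor's minimum and maximum are mapped affinely onto the extremes of the representable grid, each value is snapped to its nearest grid point, and the MSE is measured after mapping back. The function name `quantize_minmax` and the use of an explicit sorted list of grid values are assumptions for illustration. Any number format, F2P or fixed-field, can supply the grid.

```python
from bisect import bisect_left


def quantize_minmax(values, grid):
    """Quantize `values` onto a sorted `grid` with static min-max scaling.

    Returns (dequantized_values, mean_squared_error).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    g_lo, g_hi = grid[0], grid[-1]
    scale = (g_hi - g_lo) / (hi - lo)

    out = []
    for v in values:
        t = g_lo + (v - lo) * scale  # position in grid coordinates
        i = bisect_left(grid, t)
        if i == 0:
            q = grid[0]
        elif i == len(grid):
            q = grid[-1]
        else:
            below, above = grid[i - 1], grid[i]
            q = below if t - below <= above - t else above  # nearest point
        out.append(lo + (q - g_lo) / scale)  # map back to the input scale

    mse = sum((a - b) ** 2 for a, b in zip(values, out)) / len(values)
    return out, mse
```

Because the scaling is fixed by the observed minimum and maximum, outliers stretch the grid and coarsen the resolution for typical values, which is one reason dynamic or mixed-precision schemes may need additional control logic.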
A plausible implication is that automated toolchains for F2P encoding, as well as dynamic flavor selection and integration into mixed-precision training pipelines, are likely priorities for future research and adoption.
7. Relationship to Prior Art and Research Directions
F2P stands distinct from conventional IEEE-754 floating-point factorization methods, which rely on a fixed radix and rigid exponent/mantissa partitioning (Andrlon, 2021). It also diverges from mixed-radix conversion hardware (e.g., IEEE 754-2008 binary-to-decimal/decimal-to-binary), where precision is tuned by changing the radix but not by adapting the intra-word field widths (Kupriianova et al., 2013). By introducing in-word dynamic structural variation, F2P broadens the design landscape for number representations in compact hardware and software environments, suggesting additional research in:
- Full-system hardware synthesis and PPA characterization
- Adaptive/mixed-precision flows for deep learning accelerators
- Compiler and tooling support for workload-aware flavor selection and automated bit allocation
Empirical demonstrations of MSE reductions over the state of the art for relevant network and ML tasks substantiate F2P’s technical significance and motivate continued exploration in narrow-word, accuracy-flexible arithmetic (Cohen et al., 2024).