Functional Batch Codes
- Functional Batch Codes are families of linear codes designed for distributed storage systems that enable disjoint recovery sets to serve any multiset of k linear combination requests.
- They generalize standard k-batch and functional PIR codes, employing simplex and Hadamard constructions to achieve near-optimal code lengths and performance.
- Current research leverages combinatorial designs, algebraic methods, and algorithmic recovery strategies to improve bounds and address open problems in code optimality and locality.
A functional batch code is a family of linear codes designed for distributed storage systems which guarantee that any multiset of $k$ requests, each a linear combination of $s$ independent information symbols, can be answered by $k$ pairwise disjoint recovery sets of servers, with each recovery set yielding its respective requested combination. The central question is, for given $s$ and $k$, what minimum code length (i.e., minimum number of servers) such a code requires. Functional batch codes generalize standard $k$-batch codes and functional PIR codes, and are closely connected to simplex code constructions, combinatorial designs, and storage-efficient distributed retrieval.
1. Definition and Core Properties
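A linear functional $k$-batch code of dimension $s$ and length $n$, denoted FB-$(n,s,k)$, is specified by a generator matrix $G \in \mathbb{F}_q^{s \times n}$. Each server $j$ stores a code symbol $c_j = x \cdot g_j$, where $x \in \mathbb{F}_q^s$ is the information vector and $g_j$ is the $j$-th column of $G$. For any multiset of request vectors $v_1, \dots, v_k \in \mathbb{F}_q^s$, there exist pairwise disjoint recovery sets $R_1, \dots, R_k \subseteq \{1, \dots, n\}$ such that for each $i$,
$$x \cdot v_i = \sum_{j \in R_i} \lambda_j \, c_j \quad \text{for some coefficients } \lambda_j \in \mathbb{F}_q,$$
with all computations over $\mathbb{F}_q$ (in the binary case $q = 2$ this reads $v_i = \sum_{j \in R_i} g_j$). The code must serve all such multisets, including repeated and arbitrary requests. Special cases include: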
- Functional $k$-PIR code (FP): all $k$ requests coincide.
- $k$-batch code: each request is a unit vector (a single information symbol).
Standard parameters include $FB(s,k)$, the minimum length of a functional $k$-batch code of dimension $s$, and the analogous PIR/batch parameters (Yohananov et al., 2021).
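The definition can be checked by exhaustive search for very small parameters. The following is a minimal sketch for the binary case only, not the method of any cited paper. It assumes vectors of $\mathbb{F}_2^s$ are encoded as integer bitmasks, and the helper names (`simplex_columns`, `recovery_sets`, `serve`, `is_functional_batch`) are hypothetical. It enumerates every multiset of $k$ nonzero requests and backtracks over candidate recovery sets; the cost grows exponentially, so it is practical only for roughly $s \le 3$.

```python
from itertools import combinations_with_replacement


def simplex_columns(s):
    """Columns of the binary simplex code: all nonzero vectors of F_2^s as bitmasks."""
    return list(range(1, 1 << s))


def recovery_sets(columns, request):
    """Every nonempty set of servers (bitmask over column indices) whose columns XOR to `request`."""
    found = []
    for subset in range(1, 1 << len(columns)):
        acc = 0
        for j, col in enumerate(columns):
            if (subset >> j) & 1:
                acc ^= col
        if acc == request:
            found.append(subset)
    return found


def serve(requests, table):
    """Backtracking search for pairwise disjoint recovery sets, one per request.

    Returns a list of server bitmasks, or None if the multiset cannot be served.
    """
    def go(i, used):
        if i == len(requests):
            return []
        for subset in table[requests[i]]:
            if (subset & used) == 0:  # disjoint from the servers already in use
                rest = go(i + 1, used | subset)
                if rest is not None:
                    return [subset] + rest
        return None

    return go(0, 0)


def is_functional_batch(columns, s, k):
    """True if every multiset of k nonzero requests in F_2^s can be served."""
    nonzero = range(1, 1 << s)
    table = {v: recovery_sets(columns, v) for v in nonzero}
    return all(
        serve(reqs, table) is not None
        for reqs in combinations_with_replacement(nonzero, k)
    )


if __name__ == "__main__":
    print(is_functional_batch(simplex_columns(3), 3, 4))
    print(is_functional_batch(simplex_columns(3), 3, 5))
```

Under these assumptions the first call is expected to report that the length-7 simplex code serves every multiset of four requests, consistent with the small verified cases of the conjecture. The second should fail, since a single request repeated five times has at most four disjoint recovery sets among seven columns.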
2. Main Bounds and Existence Results
Functional batch codes are subject to tight bounds relating the length $n$, the dimension $s$, and the batch size $k$.
Binary Codes, Maximal Batch Size:
- Conjecture (Zhang–Etzion–Yaakobi): For $k = 2^{s-1}$, the minimal code length is $2^s - 1$, i.e., $FB(s, 2^{s-1}) = 2^s - 1$ (Yohananov et al., 2021, Yohananov et al., 19 Jan 2025).
- Verified for $s \le 5$ via computer-assisted proofs; all known constructions use the binary simplex code, whose columns enumerate the nonzero vectors of $\mathbb{F}_2^s$.
Improved Existence:
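- Hadamard-based Construction: There exists an FB-$(2^s - 1, s, k)$ code for $k = \lfloor \frac{5}{6} 2^{s-1} \rfloor - s$, narrowing the prior gap where only $k = 2^{s-2} + 2^{s-4} + \lfloor 2^{s/2} / \sqrt{24} \rfloor$ was achievable (Yohananov et al., 2021).
- Optimal for $k = 2^s$: $FB(s, 2^s) = 2^{s+1} - 2$; constructed via double-Hadamard matrices (Yohananov et al., 2021).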
- For general $k$, tight lower bounds are supplied (e.g., via sphere-covering arguments), and the asymptotic behavior of the minimum length is treated in (Zhang et al., 2019).
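As a worked arithmetic check of how these guarantees compare at the fixed length $n = 2^s - 1$, the sketch below tabulates the conjectured batch size $2^{s-1}$, the Hadamard-based guarantee $\lfloor \frac{5}{6} 2^{s-1} \rfloor - s$, and the earlier guarantee $2^{s-2} + 2^{s-4} + \lfloor 2^{s/2} / \sqrt{24} \rfloor$. The formulas are taken as stated for Yohananov et al. (2021) and should be checked against that paper; the function name `batch_size_guarantees` is hypothetical.

```python
from math import isqrt


def batch_size_guarantees(s):
    """Return (n, conjectured_k, hadamard_k, earlier_k) for dimension s >= 4."""
    if s < 4:
        raise ValueError("the earlier bound uses 2^(s-4), so s must be at least 4")
    n = 2**s - 1
    conjectured_k = 2 ** (s - 1)
    hadamard_k = (5 * 2 ** (s - 1)) // 6 - s
    # floor(2^(s/2) / sqrt(24)) == floor(sqrt(2^s / 24)) == isqrt(2^s // 24)
    earlier_k = 2 ** (s - 2) + 2 ** (s - 4) + isqrt(2**s // 24)
    return n, conjectured_k, hadamard_k, earlier_k


if __name__ == "__main__":
    print(f"{'s':>3} {'n':>6} {'conj k':>7} {'Hadamard k':>11} {'earlier k':>10}")
    for s in range(6, 13):
        print("{:>3} {:>6} {:>7} {:>11} {:>10}".format(s, *batch_size_guarantees(s)))
```

With these formulas the Hadamard-based guarantee overtakes the earlier one from $s = 7$ onward (46 versus 42 requests), and its ratio to the conjectured $2^{s-1}$ approaches $5/6$ as $s$ grows.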
Generalized Regimes (Nonbinary):
- Over , for , and ,
with explicit values computed for small parameters, e.g., (Kilic et al., 4 Aug 2025).
- Asymptotically, for , (Kilic et al., 4 Aug 2025).
3. Constructions: Simplex and Hadamard Codes
The most prominent constructions for functional batch codes use simplex (Hadamard) codes:
- Simplex Code Construction: The simplex code, with generator matrix containing all nonzero vectors of , achieves functional batch codes optimal for . For , it serves requests (Kong et al., 2023). The double-simplex serves requests and is optimal (Yohananov et al., 2021).
- Coset-graph and polynomial partitioning: Recovery sets correspond to disjoint cosets or pair partitions in the support space, relying on combinatorial and algebraic methods, including Nullstellensatz and Vandermonde matrix criteria (Yohananov et al., 19 Jan 2025).
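The pair-partition idea is easiest to see when all $k$ requests equal the same vector $v$ (the functional PIR case). In the binary simplex code, the column $v$ itself and every pair $\{x, x + v\}$ recover $v$, giving $2^{s-1}$ pairwise disjoint recovery sets. The sketch below builds this partition, using the hypothetical helper `coset_recovery_sets` and assuming vectors are encoded as integer bitmasks. It illustrates this special case only, not the general algorithms of the cited papers.

```python
def coset_recovery_sets(s, v):
    """Partition the nonzero vectors of F_2^s into disjoint recovery sets for v.

    Vectors are integer bitmasks and addition over F_2 is XOR. The result is
    the singleton (v,) followed by the pairs (x, x ^ v).
    """
    if not 0 < v < (1 << s):
        raise ValueError("v must be a nonzero vector of F_2^s")
    sets = [(v,)]
    used = {v}
    for x in range(1, 1 << s):
        if x in used:
            continue
        partner = x ^ v  # nonzero because x != v, and unused because x is unused
        used.update((x, partner))
        sets.append((x, partner))
    return sets


if __name__ == "__main__":
    for recovery_set in coset_recovery_sets(3, 0b101):
        print([format(x, "03b") for x in recovery_set])
```

Serving a multiset of distinct requests is harder because the pair partitions for different vectors overlap, which is where the combinatorial and algebraic tools above come in.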
Algorithmic Recovery (Combinatorial):
Codes exploit multigraph decompositions (using an offset vector). Cycles in the multigraph correspond to recovery set partitions, and recovery is guaranteed via careful path selection and reordering, which keeps the recovery sets disjoint and avoids collisions (Yohananov et al., 2021).
4. Lower Bounds, Asymptotics, and Rate Analysis
Information-theoretic techniques provide fundamental limits:
- Redundancy bounds: For functional $k$-batch codes, the minimum redundancy matches ordinary batch code bounds, with explicit polynomial-counting inequalities given (Kong et al., 2023).
- Labelling recursion for restricted locality: For batch codes with maximum recovery set size , code length must satisfy
growing exponentially with for constant (Oksner et al., 18 Jan 2026).
- Asymptotic tightness: For , optimal constructions use double-simplex codes with columns for . Double-simplex constructions achieve near-optimality within a factor of 2 for small (Oksner et al., 18 Jan 2026).
5. Functional Batch Array Codes and Locality Constraints
Functional batch codes are extensively studied in array format, generalizing recovery to multiple requests and multiple reads per storage column (Nassar et al., 2020):
- $(s,k,m,t,\ell)$ functional batch array codes: Designed for $t \times m$ arrays of bits encoding $s$ information bits, allowing up to $\ell$ bits to be read from each column per request. Recovery sets are subsets of columns, with size bounded by the locality $r$.
- Lower bounds via Stirling counts: Minimal number of columns grows with the number of possible requests and the combinatorics of partitioning recovery sets.
- Construction paradigms: Codes arise from partitioning the data space into spreads, using combinatorial designs, and covering code reductions, providing flexibility in controlling retrieval locality and code rate.
6. Algebraic and Graph-Theoretic Methods
Recent advances recast the existence question as equivalent to algebraic and graph-theoretic problems:
- Pairing and polynomial criteria: The optimality conjecture for is equivalent to finding partitions of into pairs satisfying specified vector sums, or establishing non-vanishing of special polynomials in a suitable quotient ring (Yohananov et al., 19 Jan 2025).
- Nullstellensatz and Vandermonde conditions: Codes corresponding to full-rank Vandermonde-type matrices over extension fields yield new sufficiency criteria, broadening known ad-hoc sufficient conditions and relating combinatorial existence to algebraic nondegeneracy (Yohananov et al., 19 Jan 2025).
7. Open Problems and Future Directions
Despite rapid progress, several questions remain unresolved:
- Full generality of the simplex code conjecture: The statement $FB(s, 2^{s-1}) = 2^s - 1$, while verified for small $s$, remains open for general $s$. Algebraic sufficient conditions have been recognized for large parameter ranges, but exhaustive or combinatorial proofs are lacking (Yohananov et al., 19 Jan 2025).
- Extension to nonbinary and larger fields: Precisely determining minimal code lengths for functional batch codes over , especially for , remains open, with ongoing research offering asymptotic and constructive bounds (Kilic et al., 4 Aug 2025).
- Locality-optimal constructions: Explicit codes for fixed locality matching established lower bounds are essentially unknown; finding such families remains an outstanding challenge (Oksner et al., 18 Jan 2026).
- Tight asymptotics for small locality: Quantifying the precise constant-factor increase in code length when constraining recovery sets to small size is an active research front (Oksner et al., 18 Jan 2026).
The study of functional batch codes continues to impact storage system design, random I/O codes, and associated combinatorial structures, with simplex-based, algebraic, and array formulations providing the foundation for current and future advances.