Linear Functional Batch Codes
- Linear functional batch codes are combinatorial structures that encode k symbols into n coded symbols, enabling recovery of t arbitrary, nonzero linear combinations using disjoint, small-sized recovery sets.
- They ensure low locality by limiting recovery set size to r, which is crucial for efficient performance in distributed storage systems and private information retrieval applications.
- Constructions based on simplex, double-simplex, and Hadamard methods illustrate trade-offs between redundancy, batch size, and recovery locality, while lower bounds reveal exponential growth in code length with increasing k.
A linear functional batch code is a combinatorial structure for encoding $k$ information symbols over a field (typically $\mathbb{F}_2$) into $n$ coded symbols such that, for any batch of $t$ queries, where each query is an arbitrary nonzero linear combination of the information symbols, every query can be recovered efficiently from its own small recovery set of coded symbols, with the recovery sets pairwise disjoint. Functional batch codes generalize classical batch and PIR codes by supporting requests for arbitrary linear combinations, rather than just individual coordinates. These codes are motivated by applications in load balancing and private information retrieval in distributed storage systems, where controlling the locality (i.e., the number of coded symbols accessed per query) is essential for practical efficiency.
1. Formal Definitions and Notation
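A linear functional batch code over $\mathbb{F}_q$ is defined by a generator matrix $G \in \mathbb{F}_q^{k \times n}$ with columns $g_1, \dots, g_n$, encoding an information vector $x \in \mathbb{F}_q^k$ as $c = xG$, so that $c_j = \langle x, g_j \rangle$. For parameters $(n, k, t, r)$:
- For any $t$ nonzero vectors $v_1, \dots, v_t \in \mathbb{F}_q^k$ (the queries), there exist pairwise disjoint recovery sets $R_1, \dots, R_t \subseteq \{1, \dots, n\}$, each of size at most $r$, and for each $i$, scalars $\alpha_{i,j} \in \mathbb{F}_q$ ($j \in R_i$) such that
$$v_i = \sum_{j \in R_i} \alpha_{i,j}\, g_j,$$
so that upon querying $c_j$ for $j \in R_i$ and linearly combining them as $\sum_{j \in R_i} \alpha_{i,j}\, c_j$, one obtains $\langle x, v_i \rangle$.
- The code locality parameter $r$ bounds the maximum recovery set size; codes with small $r$ are of practical interest.
Classical batch codes are the special case where each $v_i$ is a standard basis vector, while functional batch codes admit arbitrary nonzero $v_i$.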
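The definition can be checked mechanically for a small binary example. The following is a minimal sketch, assuming the field is $\mathbb{F}_2$ (so every scalar on a recovery set equals 1) and representing vectors as integer bitmasks; the function names `check_recovery`, `inner`, and `encode` are illustrative and not taken from the cited works.

```python
from functools import reduce
from operator import xor

def check_recovery(columns, queries, recovery_sets, r):
    """Return True if the recovery sets serve the queries over F_2.

    columns:       list of ints, column j of G as a k-bit mask
    queries:       list of nonzero k-bit masks v_1..v_t
    recovery_sets: list of sets of column indices R_1..R_t
    r:             locality (maximum recovery set size)
    """
    if len(queries) != len(recovery_sets):
        return False
    used = set()
    for v, R in zip(queries, recovery_sets):
        # Each set must be nonempty, small, and disjoint from earlier sets.
        if not R or len(R) > r or used & R:
            return False
        used |= R
        # Over F_2 the linear combination is the XOR of the chosen columns.
        if reduce(xor, (columns[j] for j in R), 0) != v:
            return False
    return True

def inner(x, v):
    """Inner product <x, v> over F_2 for bitmask vectors."""
    return bin(x & v).count("1") % 2

def encode(x, columns):
    """Coded symbols c_j = <x, g_j>."""
    return [inner(x, g) for g in columns]

# Toy instance with k = 2: columns e1, e2, e1 + e2.
columns = [0b01, 0b10, 0b11]
queries = [0b01, 0b01]            # the same combination requested twice
recovery_sets = [{0}, {1, 2}]     # disjoint sets of size at most 2
ok = check_recovery(columns, queries, recovery_sets, r=2)

x = 0b11
c = encode(x, columns)
decoded = [reduce(xor, (c[j] for j in R), 0) for R in recovery_sets]
```

Here `decoded` holds the values obtained by combining the queried coded symbols, which should agree with `inner(x, v)` for each query `v`.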
2. Lower Bounds on Length and Redundancy
A central research direction is to quantify, for given $(k, t, r)$, the minimum code length $n$ (alternatively, the redundancy $n - k$).
General Counting Bounds
For functional batch codes with locality $r$, a core bound involves counting the number of possible labellings of the $n$ positions by $t$ labels (one for each query) such that each label appears between $1$ and $r$ times and all labels correspond to disjoint recovery sets. In the binary case, this count must be at least
$$(2^k - 1)^t,$$
since there are $(2^k - 1)^t$ ordered $t$-tuples of nonzero queries and each labelling determines the tuple of queries it serves. Analysis using exponential generating functions yields explicit recursions and asymptotic estimates for the count.
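The counting argument can be explored numerically. The sketch below rests on two assumptions: the labellings are ordered $t$-tuples of pairwise disjoint, nonempty subsets of the $n$ positions, each of size at most $r$, and in the binary case their number must reach $(2^k - 1)^t$. It is not the exact recursion from the cited papers; `count_labellings` and `min_length_binary` are hypothetical helper names.

```python
from math import comb

def count_labellings(n, t, r):
    """Count ordered t-tuples of pairwise disjoint, nonempty subsets
    of n positions, each of size at most r."""
    # ways[m] = number of ways to choose the sets so far using exactly m positions
    ways = [0] * (n + 1)
    ways[0] = 1
    for _ in range(t):
        nxt = [0] * (n + 1)
        for used, w in enumerate(ways):
            if w == 0:
                continue
            for s in range(1, r + 1):
                if used + s > n:
                    break
                # Choose the next set of size s among the unused positions.
                nxt[used + s] += w * comb(n - used, s)
        ways = nxt
    return sum(ways)

def min_length_binary(k, t, r):
    """Smallest n whose labelling count reaches (2^k - 1)^t.

    This is a necessary condition only, so it gives a lower bound on n.
    """
    target = (2 ** k - 1) ** t
    n = t  # at least one position per query is needed
    while count_labellings(n, t, r) < target:
        n += 1
    return n
```

For example, `min_length_binary(3, 4, 2)` gives the smallest length that this simple count permits for $k = 3$, $t = 4$, $r = 2$; any actual code needs at least that many symbols.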
Key Lower Bound Formulas
Plugging explicit estimates for this count into the counting inequality leads to the following lower bounds for $n$ (see Oksner et al., 18 Jan 2026):
- For fixed and ,
or, via recursion,
- Specializing to :
These results demonstrate that, for small $r$, the length $n$ must grow exponentially in $k$ for large $k$.
3. Code Constructions and Optimality Conjectures
Simplex and Double-Simplex Constructions
The canonical construction is the binary simplex code, whose generator matrix has all nonzero vectors in $\mathbb{F}_2^k$ as its columns, giving length $n = 2^k - 1$. This code:
- Is conjectured to be a functional batch code with $t = 2^{k-1}$ for all $k$ (Yohananov et al., 19 Jan 2025, Oksner et al., 18 Jan 2026, Zhang et al., 2019).
- Supports efficient recovery with minimal length when $t = 2^{k-1}$, and the conjecture has been verified for small $k$ by computer.
- Can be doubled (double-simplex) to achieve codes of length $n = 2^{k+1} - 2$ with $t = 2^k$.
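A computer check of the kind mentioned above can be sketched for very small dimensions. The code below assumes that the conjectured batch size is $t = 2^{k-1}$ and that recovery sets have size at most 2, so each is either the column $v$ itself or a pair $\{a, a + v\}$. It is a plain backtracking search with hypothetical function names, not the verification procedure used in the cited works.

```python
from itertools import combinations_with_replacement

def simplex_columns(k):
    """Columns of the binary simplex generator matrix: all nonzero k-bit vectors."""
    return list(range(1, 2 ** k))

def find_recovery_sets(k, queries):
    """Search for pairwise disjoint recovery sets of size <= 2 in the simplex code.

    Columns are identified with their bitmask values. Returns a list of sets
    (one per query) or None if no assignment exists.
    """
    def candidates(v):
        yield (v,)                      # the column equal to v
        for a in range(1, 2 ** k):
            b = a ^ v
            if a < b:                   # skips a == v (b == 0) and duplicate pairs
                yield (a, b)

    def search(i, used):
        if i == len(queries):
            return []
        for cand in candidates(queries[i]):
            if used.isdisjoint(cand):
                rest = search(i + 1, used | set(cand))
                if rest is not None:
                    return [set(cand)] + rest
        return None

    return search(0, frozenset())

def verify_all(k):
    """Exhaustively check every multiset of t = 2^(k-1) nonzero queries."""
    t = 2 ** (k - 1)
    return all(find_recovery_sets(k, q) is not None
               for q in combinations_with_replacement(simplex_columns(k), t))

example = find_recovery_sets(3, [1, 1, 2, 2])
```

Since the order of queries does not affect whether disjoint sets exist, checking multisets is enough. The search is exponential in the worst case, so this approach is only practical for very small $k$.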
Hadamard and Parallel RIO Code Constructions
Hadamard-based and RIO (Random I/O) code constructions can yield functional batch codes with nearly optimal parameters for larger batches and recovery set sizes (Yohananov et al., 2021, Zhang et al., 2019). However, their tightness with respect to general lower bounds is less well understood than that of simplex-based approaches.
Table: Exemplary Constructions and Conjectured Optimality
| Code Family | Length $n$ | Dimension | Batch Size $t$ | Locality $r$ | Status (binary case) |
|---|---|---|---|---|---|
| Simplex | $2^k - 1$ | $k$ | $2^{k-1}$ | $2$ | Conjectured optimal |
| Double-simplex | $2^{k+1} - 2$ | $k$ | $2^k$ | $2$ | Proven |
| Hadamard-based | | $k$ | | $2$ | Achievable (Yohananov et al., 2021) |
4. Asymptotic Behavior and Parameter Scaling
For fixed small locality $r$, the required code length $n$ of a functional batch code grows exponentially in $k$ (Oksner et al., 18 Jan 2026), so the redundancy $n - k$ also grows exponentially in $k$ for constant locality. When locality is unbounded, the batch size can scale linearly with the code length at fixed rate.
However, fixing $r$ makes the rate $k/n$ vanish exponentially fast as $k \to \infty$.
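As a concrete illustration of the vanishing rate (a worked example added here, not a bound taken from the cited works), the binary simplex code has dimension $k$ and length $n = 2^k - 1$, so its rate is

$$R = \frac{k}{n} = \frac{k}{2^k - 1}, \qquad \text{e.g. } k = 10 \;\Rightarrow\; R = \frac{10}{1023} \approx 0.0098,$$

which decays exponentially in $k$.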
Generalizing to nonbinary fields $\mathbb{F}_q$, bounds are obtainable by similar counting techniques and exhibit qualitatively similar scaling (Kilic et al., 4 Aug 2025).
5. Connections to Other Combinatorial Structures
Equivalence with algebraic structures leads to new sufficient (and in some cases necessary) conditions for optimality. In particular, for $t = 2^{k-1}$ requests and $n = 2^k - 1$ servers, the existence problem is equivalent to:
- Partitioning $\mathbb{F}_2^k$ into disjoint pairs with prescribed sums,
- Nonsingularity of certain Vandermonde matrices,
- Nonvanishing of a multivariate polynomial in a quotient ring (Yohananov et al., 19 Jan 2025).
These characterizations allow algebraic techniques (e.g., Nullstellensatz, degree bounds) to establish optimality for new parameter regimes or to design computer-aided verification for small $k$.
6. Locality, Recovery Sets, and Practical Implications
Practical deployment in distributed storage (e.g., for load-balanced I/O or PIR) requires bounding the recovery set size $r$. Theoretical results show that supporting arbitrary batches of linear queries while constraining $r$ remains extremely expensive in terms of code length, owing to the exponential dependence on $k$.
Open questions include:
- Proving or refuting the simplex code conjecture for all $k$,
- Constructing functional batch codes with prescribed small locality that come close to lower bounds up to multiplicative constants,
- Understanding two-regime scaling: fixed versus growing $r$,
- Extending constructions and bounds to nonbinary fields or asynchronous request models (Oksner et al., 18 Jan 2026, Kilic et al., 4 Aug 2025, Kong et al., 2023).
7. Open Problems and Research Directions
Substantial gaps persist between lower and upper bounds for general parameters, especially when and for nonbinary alphabets. Open problems include (Oksner et al., 18 Jan 2026, Yohananov et al., 19 Jan 2025, Zhang et al., 2019):
- Closing the gap for the minimum length of general functional batch codes with small locality,
- Proving the functional batch code conjecture for all $k$,
- Discovering explicit constructions matching lower bounds for larger ,
- Formulating and analyzing the correct list size for functional batch codes in nonbinary settings,
- Tightening the additive terms in lower bounds, especially for practical batch sizes,
- Characterizing recovery set assignments combinatorially and algebraically, and constructing them algorithmically.
Ongoing work leverages methods from algebraic combinatorics, finite geometry, and probabilistic methods. The algebraic approach via quotient rings and polynomial degree conditions continues to be a promising avenue for certifying optimality and extending the theory into new regimes (Yohananov et al., 19 Jan 2025, Kilic et al., 4 Aug 2025).