Linear Functional Batch Codes

Updated 26 January 2026
  • Linear functional batch codes are combinatorial structures that encode k symbols into n coded symbols, enabling recovery of t arbitrary, nonzero linear combinations using disjoint, small-sized recovery sets.
  • They ensure low locality by limiting recovery set size to r, which is crucial for efficient performance in distributed storage systems and private information retrieval applications.
  • Constructions based on simplex, double-simplex, and Hadamard methods illustrate trade-offs between redundancy, batch size, and recovery locality, while lower bounds reveal exponential growth in code length with increasing k.

A linear functional batch code is a combinatorial structure for encoding $k$ information symbols $x_1, \ldots, x_k$ over a field (typically $\mathbb{F}_2$) into $n$ coded symbols $y_1, \ldots, y_n$ such that, for any batch of $t$ queries, where each query is an arbitrary nonzero linear combination of the information symbols, it is possible to efficiently recover each query using disjoint, small-size recovery sets of coded symbols. Functional batch codes generalize classical batch and PIR codes by supporting requests for arbitrary linear combinations, rather than just individual coordinates. These codes are motivated by applications in load balancing and private information retrieval in distributed storage systems, where controlling the locality (i.e., the number of coded symbols accessed per query) is essential for practical efficiency.

1. Formal Definitions and Notation

A linear functional batch code over $\mathbb{F}_2$ is defined by a $k \times n$ generator matrix $G$ with columns $g^{(j)}$, encoding an information vector $x \in \mathbb{F}_2^k$ as $y = xG \in \mathbb{F}_2^n$. For parameters $(n, k, t, r)$:

  • For any $t$ nonzero vectors $v_1, \ldots, v_t \in \mathbb{F}_2^k$ (the queries), there exist $t$ pairwise disjoint recovery sets $R_1, \ldots, R_t \subseteq \{1, \ldots, n\}$, each of size at most $r$, and for each $i$, scalars $\alpha_{i,j} \in \mathbb{F}_2$ ($j \in R_i$) such that

$$v_i = \sum_{j \in R_i} \alpha_{i,j}\, g^{(j)},$$

so that upon querying $y_j$ for $j \in R_i$ and linearly combining as $\sum_{j \in R_i} \alpha_{i,j}\, y_j$, one obtains $\langle x, v_i \rangle$.

  • The code locality parameter $r$ bounds the maximum recovery set size; codes with small $r$ are of practical interest.

Classical batch codes are the special case where each $v_i$ is a standard basis vector, while functional batch codes admit arbitrary nonzero $v_i$.
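The recovery condition in the definition can be checked mechanically. The following is a minimal sketch, assuming the binary case, where the scalars on a recovery set are all 1, so that a query is recovered exactly when the columns in its set sum to the query vector. The helper names (`xor_columns`, `is_valid_assignment`, `encode`) and the toy matrix are illustrative assumptions, not taken from the cited papers.

```python
def xor_columns(G, idxs):
    """Sum (over F_2) of the columns of G indexed by idxs."""
    acc = [0] * len(G)
    for j in idxs:
        for i, row in enumerate(G):
            acc[i] ^= row[j]
    return tuple(acc)


def is_valid_assignment(G, queries, recovery_sets, r):
    """True if the sets are nonempty, pairwise disjoint, of size <= r,
    and the columns in each set sum to the matching query."""
    used = set()
    for v, R in zip(queries, recovery_sets):
        R = set(R)
        if not R or len(R) > r or (used & R):
            return False
        if xor_columns(G, R) != tuple(v):
            return False
        used |= R
    return True


def encode(G, x):
    """Coded symbols y = xG over F_2."""
    k, n = len(G), len(G[0])
    return [sum(x[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]


# Toy data (hypothetical): k = 2, n = 3, columns 10, 01, 11.
G = [[1, 0, 1],
     [0, 1, 1]]
queries = [(1, 1), (1, 1)]        # the same query requested twice
recovery_sets = [{2}, {0, 1}]     # disjoint sets, each of size at most 2

valid = is_valid_assignment(G, queries, recovery_sets, r=2)

# Answer each query by summing the coded symbols in its recovery set.
x = (1, 0)
y = encode(G, x)
answers = [sum(y[j] for j in R) % 2 for R in recovery_sets]
```

Here both answers equal the inner product of `x` with `(1, 1)`, even though the two queries read disjoint sets of coded symbols.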

2. Lower Bounds on Length and Redundancy

A central research direction is to quantify, for given $k$, $t$, and $r$, the minimum code length $n$ (alternatively, the redundancy $n - k$).

General Counting Bounds

For functional batch codes with locality $r$, a core bound involves counting the number of possible labellings of the $n$ positions by $t$ labels (one for each query) such that each label $i$ appears between $1$ and $r$ times and all labels correspond to disjoint recovery sets. This count, written here as $N(n, t, r)$, must satisfy

$$N(n, t, r) \ge (2^k - 1)^t,$$

since there are $(2^k - 1)^t$ ordered $t$-tuples of nonzero queries $(v_1, \ldots, v_t)$. Analysis using exponential generating functions yields explicit recursions and asymptotic estimates for $N(n, t, r)$.
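The counting argument can be explored numerically for small parameters. The sketch below counts labellings directly with a recursion over labels, under the assumption that positions not used by any recovery set are left unlabelled. It then searches for the smallest $n$ whose count reaches $(2^k - 1)^t$, the number of ordered tuples of nonzero binary queries. The function names are hypothetical, and the result is only the necessary condition implied by counting, not a construction.

```python
from functools import lru_cache
from math import comb


def count_labellings(n, t, r):
    """Number of ways to pick t pairwise disjoint, ordered, nonempty
    subsets of n positions, each of size at most r (rest unlabelled)."""
    @lru_cache(maxsize=None)
    def ways(label, free):
        if label == t:
            return 1  # all labels placed; remaining positions stay unused
        return sum(comb(free, s) * ways(label + 1, free - s)
                   for s in range(1, min(r, free) + 1))
    return ways(0, n)


def min_length_from_counting(k, t, r):
    """Smallest n with count_labellings(n, t, r) >= (2^k - 1)^t."""
    target = (2 ** k - 1) ** t
    n = t  # at least one position per query is needed
    while count_labellings(n, t, r) < target:
        n += 1
    return n


bound = min_length_from_counting(k=2, t=2, r=2)
```

For $k = 2$, $t = 2$, $r = 2$ the search stops at $n = 3$, which coincides with the length $2^2 - 1$ of the simplex code; for larger parameters the counting bound need not be attained.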

Key Lower Bound Formulas

Plugging explicit estimates for this count into the inequality leads to the following lower bounds on $n$ (see Oksner et al., 18 Jan 2026):

  • For fixed […] and […],

[…]

or, via recursion,

[…]

  • Specializing to […]:

[…]

These results demonstrate that, for small $r$, the length $n$ must grow exponentially in $k$ as $k$ becomes large.

3. Code Constructions and Optimality Conjectures

Simplex and Double-Simplex Constructions

The canonical construction is the binary $[2^k - 1, k]$ simplex code, whose generator matrix consists of all nonzero vectors in $\mathbb{F}_2^k$. This code:

  • Is conjectured to realize functional batch code parameters $n = 2^k - 1$, $t = 2^{k-1}$ for all $k$ (Yohananov et al., 19 Jan 2025, Oksner et al., 18 Jan 2026, Zhang et al., 2019).
  • Supports efficient recovery with minimal length when […], and is verified for small $k$ by computer.
  • Can be doubled (double-simplex) to achieve codes with $n = 2^{k+1} - 2$ and $t = 2^k$.
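One special case of the simplex construction is easy to check by hand: a batch consisting of a single query $v$ repeated $2^{k-1}$ times. The sketch below, which encodes vectors of $\mathbb{F}_2^k$ as integer bitmasks (an encoding chosen here for convenience), lists the singleton $\{v\}$ together with the pairs $\{a, a \oplus v\}$. It illustrates only this repeated-query case and says nothing about arbitrary batches, which is what the conjecture concerns.

```python
from functools import reduce
from operator import xor


def simplex_columns(k):
    """All nonzero vectors of F_2^k, encoded as integers 1 .. 2^k - 1."""
    return list(range(1, 2 ** k))


def recovery_sets_for(v, k):
    """Pairwise disjoint recovery sets of size <= 2 for one nonzero query v."""
    sets = [{v}]          # the column equal to v recovers it directly
    seen = {v}
    for a in simplex_columns(k):
        b = a ^ v         # the unique partner with a + b = v
        if a not in seen and b not in seen:
            sets.append({a, b})
            seen |= {a, b}
    return sets


sets = recovery_sets_for(v=0b101, k=3)
sums = [reduce(xor, s) for s in sets]   # every set XORs to the query
```

Each returned set has size at most 2, which is why the simplex code is a natural candidate when small locality is required.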

Hadamard and Parallel RIO Code Constructions

Hadamard-based and RIO (Random I/O) code constructions can yield functional batch codes with nearly optimal parameters for larger batches and recovery set sizes (Yohananov et al., 2021, Zhang et al., 2019). However, their tightness with respect to general lower bounds is less well-understood compared to simplex-based approaches.

Table: Exemplary Constructions and Conjectured Optimality

Code Family | Length $n$ | Dimension $k$ | Batch Size $t$ | Locality $r$ | Status (binary case)
Simplex | $2^k - 1$ | $k$ | $2^{k-1}$ | […] | Conjectured optimal
Double-simplex | $2^{k+1} - 2$ | $k$ | $2^k$ | […] | Proven
Hadamard-based | […] | $k$ | […] | […] | Achievable (Yohananov et al., 2021)

4. Asymptotic Behavior and Parameter Scaling

For fixed small locality $r$, the required code length $n$ for functional batch codes scales as […] (Oksner et al., 18 Jan 2026), meaning that the redundancy grows exponentially in $k$ for constant locality. When locality is unbounded, the batch size $t$ can scale linearly with $k$ at fixed rate, as formalized in

[…]

However, fixing $r$ yields a rate $k/n$ that vanishes exponentially fast as $k \to \infty$.

Generalizing to nonbinary fields $\mathbb{F}_q$, bounds are obtainable by similar counting techniques. For […] and […]-functional batch codes of dimension $k$,

[…]

demonstrating qualitatively similar scaling (Kilic et al., 4 Aug 2025).

5. Connections to Other Combinatorial Structures

Equivalence with algebraic structures leads to new sufficient (and in some cases necessary) conditions for optimality. In particular, for […] requests and […] servers, the existence problem is equivalent to:

  • Partitioning […] into […] disjoint pairs with prescribed sums,
  • Nonsingularity of certain Vandermonde matrices,
  • Nonvanishing of a multivariate polynomial in a quotient ring (Yohananov et al., 19 Jan 2025).

These characterizations allow algebraic techniques (e.g., Nullstellensatz, degree bounds) to establish optimality for new parameter regimes or to design computer-aided verification for small $k$.
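A computer-aided check of the pair-partition formulation can be sketched with plain backtracking. The version below makes two assumptions that are not stated in the text above: the set being partitioned is all of $\mathbb{F}_2^k$ (encoded as integers $0, \ldots, 2^k - 1$), and the prescribed sums XOR to zero, which is a necessary condition because the elements of $\mathbb{F}_2^k$ sum to zero for $k \ge 2$. The function names are hypothetical, and the exhaustive loop is only practical for very small $k$.

```python
from functools import reduce
from itertools import combinations_with_replacement
from operator import xor


def pair_partition(k, sums):
    """Try to split {0, ..., 2^k - 1} into pairs (a, a ^ d), one pair per
    prescribed sum d in `sums`. Returns the pairs, or None if impossible."""
    assert len(sums) == 2 ** (k - 1)
    used = [False] * (2 ** k)
    pairs = []

    def place(i):
        if i == len(sums):
            return True
        d = sums[i]
        for a in range(2 ** k):
            b = a ^ d
            if a < b and not used[a] and not used[b]:
                used[a] = used[b] = True
                pairs.append((a, b))
                if place(i + 1):
                    return True
                pairs.pop()               # undo and try the next pair
                used[a] = used[b] = False
        return False

    return pairs if place(0) else None


def check_all(k):
    """Exhaustively test every multiset of nonzero sums that XORs to zero."""
    nonzero = range(1, 2 ** k)
    return all(pair_partition(k, list(s)) is not None
               for s in combinations_with_replacement(nonzero, 2 ** (k - 1))
               if reduce(xor, s) == 0)


example = pair_partition(3, [1, 2, 4, 7])
```

The search is exponential in the worst case, so a sketch like this is suited to sanity checks on tiny instances rather than to the larger verifications reported in the literature.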

6. Locality, Recovery Sets, and Practical Implications

Practical deployment in distributed storage (e.g., for load-balanced I/O or PIR) requires bounding the recovery set size $r$. Theoretical results show that supporting arbitrary batches of linear queries while constraining $r$ remains extremely expensive in terms of code length, owing to the exponential dependence on $k$.

Open questions include:

  • Proving or refuting the simplex code conjecture for all $k$,
  • Constructing functional batch codes with prescribed small locality $r$ that come close to lower bounds up to multiplicative constants,
  • Understanding two-regime scaling: fixed $r$ versus growing $r$,
  • Extending constructions and bounds to nonbinary fields or asynchronized request models (Oksner et al., 18 Jan 2026, Kilic et al., 4 Aug 2025, Kong et al., 2023).

7. Open Problems and Research Directions

Substantial gaps persist between lower and upper bounds for general parameters, especially when […] and for nonbinary alphabets. Open problems include (Oksner et al., 18 Jan 2026, Yohananov et al., 19 Jan 2025, Zhang et al., 2019):

  • Closing the gap for the minimum length of general functional batch codes with small locality,
  • Proving the functional batch code conjecture […] for all $k$,
  • Discovering explicit constructions matching lower bounds for larger […],
  • Formulating and analyzing the correct list size for functional batch codes in nonbinary settings,
  • Tightening the additive […] terms in lower bounds, especially for practical batch sizes,
  • Characterizing recovery set assignments combinatorially and algebraically, and constructing them algorithmically.

Ongoing work leverages methods from algebraic combinatorics, finite geometry, and probabilistic methods. The algebraic approach via quotient rings and polynomial degree conditions continues to be a promising avenue for certifying optimality and extending the theory into new regimes (Yohananov et al., 19 Jan 2025, Kilic et al., 4 Aug 2025).
