Linear Functional Batch Codes

Updated 26 January 2026
  • Linear functional batch codes are combinatorial structures that encode k symbols into n coded symbols, enabling recovery of t arbitrary, nonzero linear combinations using disjoint, small-sized recovery sets.
  • They ensure low locality by limiting recovery set size to r, which is crucial for efficient performance in distributed storage systems and private information retrieval applications.
  • Constructions based on simplex, double-simplex, and Hadamard methods illustrate trade-offs between redundancy, batch size, and recovery locality, while lower bounds reveal exponential growth in code length with increasing k.

A linear functional batch code is a combinatorial structure for encoding $k$ information symbols $x_1, \ldots, x_k$ over a field (typically $\mathbb{F}_2$) into $n$ coded symbols $y_1, \ldots, y_n$ such that, for any batch of $t$ queries, where each query is an arbitrary nonzero linear combination of the information symbols, every query can be recovered from its own small recovery set of coded symbols, with the recovery sets pairwise disjoint. Functional batch codes generalize classical batch and PIR codes by supporting requests for arbitrary linear combinations, rather than just individual coordinates. These codes are motivated by applications in load balancing and private information retrieval in distributed storage systems, where controlling the locality (i.e., the number of coded symbols accessed per query) is essential for practical efficiency.

1. Formal Definitions and Notation

A linear functional batch code over $\mathbb{F}_2$ is defined by a $k \times n$ generator matrix $G$ with columns $g^{(j)}$, encoding an information vector $x \in \mathbb{F}_2^k$ as $y = xG = (y_1, \dots, y_n)$. For parameters $(n, k, t, r)$:

  • For any $t$ nonzero vectors $v_1, \ldots, v_t \in \mathbb{F}_2^k$ (the queries), there exist $t$ pairwise disjoint recovery sets $R_1, \ldots, R_t \subseteq [n]$, each of size at most $r$, and for each $\ell$, scalars $\gamma_{j,\ell}$ such that

$$v_\ell = \sum_{j \in R_\ell} \gamma_{j,\ell}\, g^{(j)},$$

so that upon querying $y_j$ for $j \in R_\ell$ and linearly combining as $\sum_{j \in R_\ell} \gamma_{j,\ell}\, y_j$, one obtains $x \cdot v_\ell$.

  • The code locality parameter $r$ bounds the maximum recovery set size; codes with small $r$ are of practical interest.

Classical batch codes are the special case where each $v_\ell$ is a standard basis vector, while functional batch codes admit arbitrary nonzero $v_\ell$.
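The recovery condition can be checked mechanically. The following is a minimal sketch, assuming binary vectors are stored as Python integer bitmasks and that every scalar $\gamma_{j,\ell}$ equals 1 (the only nonzero scalar over $\mathbb{F}_2$). The function name `is_valid_recovery` and the toy generator matrix are illustrative assumptions, not taken from the cited papers.

```python
from functools import reduce
from operator import xor


def is_valid_recovery(columns, queries, recovery_sets, r):
    """Check the functional batch recovery condition over F_2.

    columns       -- generator-matrix columns g^(j) as integer bitmasks
    queries       -- query vectors v_l as integer bitmasks
    recovery_sets -- one collection of column indices R_l per query
    r             -- locality bound on the size of each R_l
    """
    if len(queries) != len(recovery_sets):
        return False
    used = set()
    for v, indices in zip(queries, recovery_sets):
        indices = set(indices)
        if not indices or len(indices) > r:
            return False  # empty set or locality bound violated
        if used & indices:
            return False  # recovery sets must be pairwise disjoint
        used |= indices
        # Over F_2 the linear combination is the XOR of the chosen columns.
        if reduce(xor, (columns[j] for j in indices), 0) != v:
            return False
    return True


# Hypothetical toy example with k = 2: columns 01, 10, 11.
columns = [0b01, 0b10, 0b11]
queries = [0b11, 0b11]  # the same combination x_1 + x_2 requested twice
print(is_valid_recovery(columns, queries, [{2}, {0, 1}], r=2))  # True
print(is_valid_recovery(columns, queries, [{2}, {1, 2}], r=2))  # False (sets overlap)
```

The second call fails only because column index 2 is reused, which illustrates that disjointness, not solvability of a single query, is the binding constraint.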

2. Lower Bounds on Length and Redundancy

A central research direction is to quantify, for given $(k, t, r)$, the minimum code length $n$ (alternatively, the redundancy $n - k$).

General Counting Bounds

For functional batch codes with locality $r$, a core bound involves counting the number of possible labellings of the $n$ positions by $t$ labels (one for each query) such that each label $\ell \in [t]$ appears between $1$ and $r$ times and all labels correspond to disjoint recovery sets. This count, denoted $\theta_{t,r}(n)$, must satisfy

$$\theta_{t,r}(n) \geq (2^k - 1)^t,$$

since there are $(2^k - 1)^t$ ordered $t$-tuples of nonzero queries $v_\ell$. Analysis using exponential generating functions yields explicit recursions and asymptotic estimates for $\theta_{t,r}(n)$.
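For small parameters the counting bound can be evaluated exactly. The sketch below is one possible reading of the count, under the assumption that positions may remain unlabelled, so that a labelling is an ordered choice of $t$ pairwise disjoint, nonempty position sets of size at most $r$. The helper names `theta` and `min_length_counting` are hypothetical, and the dynamic program is a direct enumeration rather than the generating-function analysis of the cited work.

```python
from math import comb


def theta(t, r, n):
    """Count labellings of n positions by t labels, each label used 1..r times.

    Assumption: positions may stay unlabelled, so the t label classes are
    ordered, pairwise disjoint, nonempty subsets of size at most r.
    """
    # ways[m] = number of ways to place the labels handled so far
    # on exactly m of the n positions.
    ways = [0] * (n + 1)
    ways[0] = 1
    for _ in range(t):
        nxt = [0] * (n + 1)
        for m, count in enumerate(ways):
            if count == 0:
                continue
            for s in range(1, min(r, n - m) + 1):
                # Choose s of the n - m positions that are still unlabelled.
                nxt[m + s] += count * comb(n - m, s)
        ways = nxt
    return sum(ways)


def min_length_counting(k, t, r):
    """Smallest n satisfying theta(t, r, n) >= (2^k - 1)^t."""
    target = (2**k - 1) ** t
    n = t  # at least one position per label is needed
    while theta(t, r, n) < target:
        n += 1
    return n


print(theta(2, 2, 3))                # 12
print(min_length_counting(2, 2, 2))  # 3
```

For $k = 2$, $t = 2$, $r = 2$ the count is $12 \geq 9 = (2^2 - 1)^2$ at $n = 3$ but only $2$ at $n = 2$, so under this reading the counting bound gives $n \geq 3$, which matches the length of the $[3, 2]$ simplex code.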

Key Lower Bound Formulas

Plugging in explicit estimates for $\theta_{t,r}(n)$ leads to the following lower bounds on $n$ (see (Oksner et al., 18 Jan 2026)):

  • For fixed $r$ and $n \geq t + r$,

$$n \geq t - \frac{t+1}{2r} + \left[ (2^k - 1)(r-1)! \right]^{1/r},$$

or, via recursion,

$$n \geq \frac{t + r}{2} - 1 + \left[ (2^k - 1)(r-1)! \right]^{1/r}.$$

  • Specializing to $r = 2$:

$$n \geq \sqrt{2(2^k - 1)} + \frac{3}{4}t - \frac{5}{4}.$$

These results demonstrate that, for fixed small $r$, $n$ must grow exponentially in $k$, on the order of $2^{k/r}$, as $k$ becomes large.
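To get a feel for the magnitudes involved, the closed-form bounds above can be evaluated numerically. This is only a sketch that transcribes the three displayed formulas; the function names are hypothetical, and the validity condition $n \geq t + r$ is not checked.

```python
from math import factorial, sqrt


def bound_fixed_r(k, t, r):
    """n >= t - (t+1)/(2r) + [(2^k - 1)(r-1)!]^(1/r)."""
    return t - (t + 1) / (2 * r) + ((2**k - 1) * factorial(r - 1)) ** (1 / r)


def bound_recursive(k, t, r):
    """n >= (t+r)/2 - 1 + [(2^k - 1)(r-1)!]^(1/r)."""
    return (t + r) / 2 - 1 + ((2**k - 1) * factorial(r - 1)) ** (1 / r)


def bound_r2(k, t):
    """Specialization to r = 2: n >= sqrt(2(2^k - 1)) + 3t/4 - 5/4."""
    return sqrt(2 * (2**k - 1)) + 0.75 * t - 1.25


# Tabulate the bounds for a fixed batch size and growing dimension k.
t = 4
for k in (4, 8, 16, 32):
    print(k,
          round(bound_fixed_r(k, t, 2), 2),
          round(bound_recursive(k, t, 2), 2),
          round(bound_r2(k, t), 2))
```

Doubling $k$ roughly squares the dominant term for $r = 2$, which is the exponential dependence on $k$ that the section describes.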

3. Code Constructions and Optimality Conjectures

Simplex and Double-Simplex Constructions

The canonical construction is the binary $[2^k - 1, k]$ simplex code, whose generator matrix consists of all nonzero vectors in $\mathbb{F}_2^k$. This code:

  • Is conjectured to realize $[2^k - 1, k, 2^{k-1}, 2]$ functional batch code parameters for all $k$ (Yohananov et al., 19 Jan 2025, Oksner et al., 18 Jan 2026, Zhang et al., 2019).
  • Supports efficient recovery with minimal length when $r = 2$, and is verified for small $k$ by computer.
  • Can be doubled (double-simplex) to achieve $[2^{k+1} - 2, k, 2^k, 2]$ codes.
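A computer check of the simplex claim for very small $k$ can be sketched as an exhaustive backtracking search. The code below assumes column $j$ of the generator matrix is the vector $j$ itself (as an integer bitmask), so a query $v$ is served either by the single column $v$ or by a pair $\{a, a \oplus v\}$. The function names are illustrative, and this brute-force approach is not claimed to be the verification method used in the cited papers; it becomes slow quickly as $k$ grows.

```python
from itertools import combinations_with_replacement


def recovery_options(v, n_points):
    """All recovery sets of size <= 2 for query v in the simplex code.

    {v} recovers v directly; {a, a ^ v} recovers it as a sum of two columns.
    """
    options = [frozenset([v])]
    for a in range(1, n_points + 1):
        b = a ^ v
        if a < b:  # a < b lists each pair once and excludes b == 0
            options.append(frozenset([a, b]))
    return options


def can_serve(batch, n_points, used=frozenset()):
    """Backtracking search for pairwise disjoint recovery sets, one per query."""
    if not batch:
        return True
    first, rest = batch[0], batch[1:]
    for option in recovery_options(first, n_points):
        if not (option & used) and can_serve(rest, n_points, used | option):
            return True
    return False


def simplex_serves_all_batches(k):
    """Check every batch of t = 2^(k-1) nonzero queries with locality r = 2."""
    n_points = 2**k - 1
    t = 2 ** (k - 1)
    nonzero = range(1, n_points + 1)
    # Servability does not depend on query order, so multisets suffice.
    return all(can_serve(batch, n_points)
               for batch in combinations_with_replacement(nonzero, t))


for k in (2, 3):
    print(k, simplex_serves_all_batches(k))
```

Iterating over multisets rather than ordered tuples keeps the search small: for $k = 3$ there are 210 batches of size 4 instead of $7^4 = 2401$ ordered tuples.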

Hadamard and Parallel RIO Code Constructions

Hadamard-based and RIO (Random I/O) code constructions can yield functional batch codes with nearly optimal parameters for larger batches and recovery set sizes (Yohananov et al., 2021, Zhang et al., 2019). However, their tightness with respect to general lower bounds is less well understood than that of simplex-based approaches.

Table: Exemplary Constructions and Conjectured Optimality

| Code Family | Length $n$ | Dimension $k$ | Batch Size $t$ | Locality $r$ | Status (binary case) |
|---|---|---|---|---|---|
| Simplex | $2^k - 1$ | $k$ | $2^{k-1}$ | $2$ | Conjectured optimal |
| Double-simplex | $2^{k+1} - 2$ | $k$ | $2^k$ | $2$ | Proven |
| Hadamard-based | $2^s - 1$ | $s$ | $\lfloor \frac{5}{6}\, 2^{s-1} \rfloor - s$ | $2$ | Achievable (Yohananov et al., 2021) |

4. Asymptotic Behavior and Parameter Scaling

For fixed small locality $r$, the required code length $n$ for functional batch codes scales as $n \gtrsim (2^k (r-1)!)^{1/r} + \Theta(t)$ (Oksner et al., 18 Jan 2026), meaning that the redundancy grows exponentially in $k$ for constant locality. When locality is unbounded, the batch size $t$ can scale linearly with $k$ at fixed rate, as formalized in

$$\frac{n}{k} \geq \frac{t}{\log_2(t+1)} + o(1).$$

However, fixing $r$ yields a vanishing rate, with $k/n \to 0$ exponentially fast as $k \to \infty$.
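The decay of the rate under fixed locality can be illustrated with a short calculation. The sketch below keeps only the leading term $(2^k (r-1)!)^{1/r}$ of the fixed-locality scaling and ignores the $\Theta(t)$ contribution, so the values are an optimistic upper bound on $k/n$; for small $k$ the bound can exceed 1 and is then vacuous. The function names are hypothetical.

```python
from math import factorial


def leading_length_term(k, r):
    """Leading term (2^k (r-1)!)^(1/r) of the fixed-locality lower bound on n."""
    return (2**k * factorial(r - 1)) ** (1 / r)


def rate_upper_bound(k, r):
    """Upper bound on the rate k/n implied by n >= leading_length_term(k, r)."""
    return k / leading_length_term(k, r)


# Rate bounds for localities r = 2 and r = 3 as the dimension k grows.
for k in (8, 16, 32, 64):
    print(k, rate_upper_bound(k, 2), rate_upper_bound(k, 3))
```

Larger $r$ slows the decay (the exponent is $k/r$) but does not remove it, which matches the contrast drawn above between bounded and unbounded locality.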

Generalizing to nonbinary fields $\mathbb{F}_q$, bounds are obtainable by similar counting techniques. For $q \neq 2$ and $t$-functional batch codes of dimension $k$,

$$n \geq \frac{t k}{\log_{q-1}(t(q-1)+1)},$$

demonstrating qualitatively similar scaling (Kilic et al., 4 Aug 2025).

5. Connections to Other Combinatorial Structures

Equivalences with algebraic structures lead to new sufficient (and in some cases necessary) conditions for optimality. In particular, for $k = 2^{s-1}$ requests and $n = 2^s - 1$ servers (here $k$ denotes the number of requests, i.e., the batch size $t$ of the earlier sections, and $s$ the dimension), the existence problem is equivalent to:

  • Partitioning $\mathbb{F}_{2^s} \setminus \{0\}$ into $k$ disjoint pairs with prescribed sums,
  • Nonsingularity of certain Vandermonde matrices,
  • Nonvanishing of a multivariate polynomial in a quotient ring (Yohananov et al., 19 Jan 2025).

These characterizations allow algebraic techniques (e.g., Nullstellensatz, degree bounds) to establish optimality for new parameter regimes or to design computer-aided verification for small $k$.

6. Locality, Recovery Sets, and Practical Implications

Practical deployment in distributed storage (e.g., for load-balanced I/O or PIR) requires bounding the recovery set size $r$. Theoretical results show that supporting arbitrary batches of linear queries while constraining $r$ remains extremely expensive in terms of code length, owing to the exponential dependence on $k$.

Open questions include:

  • Proving or refuting the simplex code conjecture for all $k$,
  • Constructing functional batch codes with prescribed small locality $r > 2$ that come close to lower bounds up to multiplicative constants,
  • Understanding two-regime scaling (fixed $r$ versus growing $r$),
  • Extending constructions and bounds to nonbinary fields or asynchronized request models (Oksner et al., 18 Jan 2026, Kilic et al., 4 Aug 2025, Kong et al., 2023).

7. Open Problems and Research Directions

Substantial gaps persist between lower and upper bounds for general parameters, especially when $r > 2$ and for nonbinary alphabets. Open problems include (Oksner et al., 18 Jan 2026, Yohananov et al., 19 Jan 2025, Zhang et al., 2019):

  • Closing the gap for the minimum length of general functional batch codes with small locality,
  • Proving the functional batch code conjecture $FB(s, 2^{s-1}) = 2^s - 1$ for all $s$,
  • Discovering explicit constructions matching lower bounds for larger $r$,
  • Formulating and analyzing the correct list size for functional batch codes in nonbinary settings,
  • Tightening the additive $\Theta(t)$ terms in lower bounds, especially for practical batch sizes,
  • Characterizing recovery set assignments combinatorially and algebraically, and constructing them algorithmically.

Ongoing work leverages methods from algebraic combinatorics, finite geometry, and probabilistic methods. The algebraic approach via quotient rings and polynomial degree conditions continues to be a promising avenue for certifying optimality and extending the theory into new regimes (Yohananov et al., 19 Jan 2025, Kilic et al., 4 Aug 2025).
