
Linear Tile-Coding for Value Function

Updated 8 January 2026
  • Linear tile-coding is a method that transforms continuous states into sparse, high-dimensional binary vectors for efficient value function approximation.
  • Left-sided sketching applies dimensionality reduction only to the constraint side, ensuring unbiased estimation while preserving key curvature information.
  • The approach significantly reduces computational costs and improves sample efficiency, underpinning robust RL algorithms like LSTD and ATD-L.

Linear tile-coding for value function approximation is a methodology within reinforcement learning (RL) for representing and estimating the value function of a policy, especially in high-dimensional continuous state spaces. This approach leverages tile coding to create sparse binary feature encodings suitable for linear approximators, and, when combined with left-sided random sketching techniques, yields powerful and computationally efficient solutions for policy evaluation with least-squares temporal difference learning (LSTD) and quasi-Newton accelerated TD methods. Recent theoretical and empirical findings have clarified the bias–variance trade-offs intrinsic to sketching, provided robust approaches for matrix-based learning, and led to practical guidelines for scaling RL with tile coding to thousands of features (Pan et al., 2017).

1. Standard Linear Value-Function Approximation with Tile Coding

Tile coding transforms each continuous state $s$ into a high-dimensional binary feature vector $\phi(s)\in\mathbb{R}^d$, with $d$ potentially in the thousands, where each feature represents the activation of a tile over the state space. The value function under policy $\pi$ is linearly approximated as $w^\top\phi(s) \approx v_\pi(s)$, with weight vector $w\in\mathbb{R}^d$ to be learned.
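A minimal tile coder along these lines can be sketched as follows; the number of tilings, tiles per dimension, and the uniform offset scheme are illustrative assumptions, not prescriptions from the source:

```python
import numpy as np

def tile_code(state, low, high, n_tilings=8, tiles_per_dim=10):
    """Map a continuous state to a sparse binary feature vector.

    Each of the n_tilings grids is offset by a fraction of a tile width,
    and exactly one tile per tiling is active, so phi has n_tilings ones.
    """
    state = np.asarray(state, dtype=float)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    dim = state.size
    tiles_per_tiling = tiles_per_dim ** dim
    phi = np.zeros(n_tilings * tiles_per_tiling)
    scaled = (state - low) / (high - low)          # normalize to [0, 1]
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)   # shift each grid slightly
        idx = np.clip(((scaled + offset) * tiles_per_dim).astype(int),
                      0, tiles_per_dim - 1)
        flat = int(np.ravel_multi_index(tuple(idx), (tiles_per_dim,) * dim))
        phi[t * tiles_per_tiling + flat] = 1.0
    return phi

phi = tile_code([0.3, -0.7], low=[-1.2, -1.0], high=[0.6, 1.0])
print(phi.shape, int(phi.sum()))  # (800,) 8 — d = 8 * 100, 8 active tiles
```

The resulting $\phi(s)$ is exactly the sparse binary encoding assumed by the linear approximator $w^\top\phi(s)$.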

The LSTD($\lambda$) algorithm collects samples $(s_t, r_{t+1}, s_{t+1})$ and eligibility traces $e_t = \gamma_t\lambda e_{t-1} + \phi(s_t)$, then forms the normal equations $Aw = b$, with

$$A = \sum_t e_t \left[\phi(s_t) - \gamma_{t+1}\phi(s_{t+1})\right]^\top, \qquad b = \sum_t e_t r_{t+1}.$$

The batch solution $w = A^{-1}b$ has $O(d^3)$ cost, and maintaining $A^{-1}$ incrementally via Sherman–Morrison costs $O(d^2)$ per step. Stochastic TD($\lambda$) is $O(d)$ per step but less sample-efficient and sensitive to stepsize choices.
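The batch procedure above can be sketched concretely as follows; the synthetic random features, constant discount $\gamma$, and small ridge term (added so $A$ is invertible on short trajectories) are illustrative assumptions:

```python
import numpy as np

def lstd_lambda(trajectory, d, gamma=0.99, lam=0.9, reg=1e-3):
    """Batch LSTD(lambda): accumulate A and b, then solve A w = b.

    trajectory: list of (phi_s, r_next, phi_s_next) feature-space samples.
    """
    A = np.zeros((d, d))
    b = np.zeros(d)
    e = np.zeros(d)                            # eligibility trace e_t
    for phi, r, phi_next in trajectory:
        e = gamma * lam * e + phi
        A += np.outer(e, phi - gamma * phi_next)
        b += e * r
    # Small ridge term keeps A invertible for short trajectories.
    return np.linalg.solve(A + reg * np.eye(d), b)

# Tiny synthetic example with random sparse binary features.
rng = np.random.default_rng(0)
d = 20
traj = []
phi = np.zeros(d); phi[rng.choice(d, 4, replace=False)] = 1.0
for _ in range(200):
    nxt = np.zeros(d); nxt[rng.choice(d, 4, replace=False)] = 1.0
    traj.append((phi, rng.normal(), nxt))
    phi = nxt
w = lstd_lambda(traj, d)
print(w.shape)  # (20,)
```

The $O(d^3)$ solve here is exactly the cost that the sketching methods in the following sections aim to avoid.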

2. Limitations of Two-Sided (Feature) Sketching

A common sketching approach forms a random projection $S\in\mathbb{R}^{k\times d}$ ($k\ll d$) and generates low-dimensional features $\psi(s)=S\phi(s)$. The reduced system $(SAS^\top)\hat{w}=Sb$ is solved for $\hat{w}\in\mathbb{R}^k$. However, for tile-coded features, this introduces substantial bias:

  • The projection $S\phi(s)$ unpredictably folds coordinates, mixing tile activations.
  • Although $\mathbb{E}[S^\top S]=I$, the projected normal equations minimize the TD error in the sketched space, not the original, so the fixed point shifts.

Empirically, this yields high asymptotic error unless $k$ approaches $d$: the bias $\|\hat{w} - w^\star\|$ decays only when $k$ is nearly as large as $d$, especially for sparse, discontinuous tile-coded features.

3. Left-Sided Sketching: Unbiased System Reduction

Left-sided sketching applies the sketch $S\in\mathbb{R}^{k\times d}$ only to the constraint side, yielding the system $SAw \approx Sb$. The true solution $w^\star$ of $Aw=b$ also satisfies $SAw^\star=Sb$, so no bias is introduced; all solutions of $Aw=b$ remain valid. In expectation $S^\top S \approx I$, so curvature information is preserved, while the cost of operations on $A$ drops from $O(d^2)$ to $O(kd)$.

The system $SAw=Sb$ is under-determined (it admits many solutions), so the minimum-norm solution is preferred:

$$w = \tilde{A}^\top (\tilde{A}\tilde{A}^\top)^{-1}\tilde{b},$$

where $\tilde{A}=SA$ and $\tilde{b}=Sb$.
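Both properties, that $w^\star$ still solves the sketched system and that the minimum-norm formula satisfies its constraints, can be checked numerically. The Gaussian sketch and the toy dimensions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 200, 40
A = rng.normal(size=(d, d)) + d * np.eye(d)   # a well-conditioned toy system
w_star = rng.normal(size=d)
b = A @ w_star

S = rng.normal(size=(k, d)) / np.sqrt(k)      # Gaussian sketch, E[S^T S] = I
A_t, b_t = S @ A, S @ b                       # left-sided: sketch constraints only

# w_star still satisfies the sketched system exactly: no bias is introduced.
print(np.allclose(A_t @ w_star, b_t))         # True

# Minimum-norm solution of the under-determined system S A w = S b.
w_min = A_t.T @ np.linalg.solve(A_t @ A_t.T, b_t)
print(np.allclose(A_t @ w_min, b_t))          # True
```

Note that $w_{\min}$ generally differs from $w^\star$ for a single sketch; the point is that the constraint set still contains $w^\star$, unlike in two-sided sketching.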

4. Incremental Sketched-LSTD and Algorithmic Efficiency

The incremental update process maintains $\tilde{A}_t = S A_t \in \mathbb{R}^{k\times d}$ and $\tilde{b}_t = S b_t \in \mathbb{R}^k$. Each sample induces a rank-one update

$$\Delta A_t = e_t \left[\phi(s_t)-\gamma\phi(s_{t+1})\right]^\top, \qquad \Delta b_t = e_t r_{t+1},$$

yielding the recursive averages

$$\tilde{A}_{t+1} = \tilde{A}_t + \frac{1}{t+1}\left(S\Delta A_t - \tilde{A}_t\right), \qquad \tilde{b}_{t+1} = \tilde{b}_t + \frac{1}{t+1}\left(S\Delta b_t - \tilde{b}_t\right).$$

The minimum-norm solution employs the SVD $\tilde{A}_t=U\Sigma V^\top$ and is computed as $w_t = V\Sigma^{-1}U^\top \tilde{b}_t = \tilde{A}_t^\top (\tilde{A}_t\tilde{A}_t^\top)^{-1}\tilde{b}_t$, with incremental maintenance of $M_t=\tilde{A}_t\tilde{A}_t^\top$ at $O(k^2)$ per sample and inversion at $O(k^3)$ when needed.

Each sample update costs $O(dk + k^3)$, making left-sided sketched-LSTD feasible for $k\ll d$ (e.g., $k=50$, $d\approx10^3$) and a substantial efficiency improvement over unsketched methods.
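The running-average recursions above can be sketched as a small incremental learner; the Gaussian sketch, synthetic sparse features, and regularized solve are illustrative assumptions:

```python
import numpy as np

class SketchedLSTD:
    """Incremental left-sided sketched LSTD: maintain A~ = S A and b~ = S b."""

    def __init__(self, d, k, gamma=0.99, lam=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.S = rng.normal(size=(k, d)) / np.sqrt(k)
        self.A = np.zeros((k, d))        # A~_t, running average
        self.b = np.zeros(k)             # b~_t, running average
        self.e = np.zeros(d)             # eligibility trace
        self.gamma, self.lam, self.t = gamma, lam, 0

    def update(self, phi, r, phi_next):
        self.e = self.gamma * self.lam * self.e + phi
        Se = self.S @ self.e                                  # O(kd)
        dA = np.outer(Se, phi - self.gamma * phi_next)        # S * ΔA_t, rank one
        db = Se * r                                           # S * Δb_t
        self.t += 1
        self.A += (dA - self.A) / self.t
        self.b += (db - self.b) / self.t

    def solve(self, reg=1e-6):
        # Minimum-norm solution w = A~^T (A~ A~^T)^-1 b~, lightly regularized.
        M = self.A @ self.A.T + reg * np.eye(self.A.shape[0])
        return self.A.T @ np.linalg.solve(M, self.b)

rng = np.random.default_rng(2)
agent = SketchedLSTD(d=500, k=50)
phi = (rng.random(500) < 0.01).astype(float)
for _ in range(300):
    nxt = (rng.random(500) < 0.01).astype(float)
    agent.update(phi, rng.normal(), nxt)
    phi = nxt
print(agent.solve().shape)  # (500,)
```

Only $k \times d$ and $k \times k$ objects are touched per step, matching the $O(dk + k^3)$ cost stated above.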

5. Variance Reduction via Quasi-Newton ATD-L Methods

Directly solving $w_t = \tilde{A}_t^\top M_t^{-1}\tilde{b}_t$ is unbiased but susceptible to numerical instability, and the $k \times k$ inversion may only be performed infrequently. The accelerated gradient TD update (ATD-L) instead leverages the sketched matrix as a preconditioner:

$$w_{t+1} = w_t + \left(\alpha \tilde{A}_t^\top M_t^{-1} S + \eta I\right) \delta_t e_t,$$

where $\delta_t$ is the TD error and $\eta$ is a small regularizer (the factor of $S$ projects the $d$-dimensional direction $\delta_t e_t$ into the sketched space so the dimensions agree). Under mild conditions on $A$ and $\eta$, this iteration retains convergence to the unique LSTD solution $w^\star=A^{-1}b$ without requiring a full $d \times d$ inversion.

ATD-L inherits the robustness of LSTD with respect to the trace parameter $\lambda$ and the regularizer $\eta$, while further lowering per-step computational overhead.
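One step of this preconditioned update can be sketched as below; the single-step function signature, the constants, and the random toy matrices are assumptions of this illustration, not the paper's exact implementation:

```python
import numpy as np

def atd_l_step(w, A_t, S, phi, r, phi_next, e, gamma=0.99,
               alpha=1.0, eta=1e-2, reg=1e-6):
    """One ATD-L update: TD step preconditioned by the sketched matrix A~.

    A_t is the running sketched matrix A~ = S A (k x d); only the k x k
    system M = A~ A~^T is ever inverted.
    """
    delta = r + gamma * w @ phi_next - w @ phi       # TD error
    grad = delta * e                                  # update direction in R^d
    M = A_t @ A_t.T + reg * np.eye(A_t.shape[0])
    # Preconditioned direction: alpha * A~^T M^-1 (S grad) + eta * grad.
    precond = A_t.T @ np.linalg.solve(M, S @ grad)
    return w + alpha * precond + eta * grad

rng = np.random.default_rng(3)
d, k = 100, 20
S = rng.normal(size=(k, d)) / np.sqrt(k)
A_t = rng.normal(size=(k, d))
w = np.zeros(d)
phi = np.zeros(d); phi[:4] = 1.0
nxt = np.zeros(d); nxt[4:8] = 1.0
w = atd_l_step(w, A_t, S, phi, r=1.0, phi_next=nxt, e=phi)
print(w.shape)  # (100,)
```

The $\eta I$ term keeps the update well-defined even when $M_t$ is nearly singular early in learning.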

6. Bias–Variance Trade-Offs and Theoretical Guarantees

Two-sided sketching creates nonzero bias $\|\hat{w} - w^\star\|$ that decays slowly unless $k \sim d$. Left-sided sketching is unbiased in expectation, but its variance grows with the deviation of $S^\top S$ from $I$. The Johnson–Lindenstrauss lemma bounds this distortion: for any fixed vector $x$,

$$P\left[\,\bigl|\|Sx\|^2 - \|x\|^2\bigr| \geq \epsilon \|x\|^2\,\right] \leq \delta \quad \text{when } k = O\!\left(\epsilon^{-2}\log(1/\delta)\right).$$

Practically, increasing $k$ reduces variance and RMS error but increases computation time, with $k = 30$–$100$ serving as an effective range for tile coding with $d \approx 1{,}000$.
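The shrinking distortion as $k$ grows can be observed empirically. This quick check with Gaussian sketches over a fixed set of unit vectors is an illustration, not an experiment from the source:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_vecs = 1000, 50
X = rng.normal(size=(n_vecs, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # fixed unit vectors

dist = {}
for k in (30, 100, 300):
    S = rng.normal(size=(k, d)) / np.sqrt(k)     # Gaussian JL sketch
    Y = X @ S.T                                  # sketched vectors in R^k
    # Worst-case squared-norm distortion | ||Sx||^2 - ||x||^2 | over the set.
    dist[k] = float(np.abs(np.linalg.norm(Y, axis=1) ** 2 - 1.0).max())
    print(k, round(dist[k], 2))
```

Consistent with the $k = O(\epsilon^{-2}\log(1/\delta))$ scaling, the worst-case distortion falls roughly like $1/\sqrt{k}$.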

7. Empirical Results and Practical Recommendations

Experiments in the Mountain Car, Puddle World, Acrobot, and Energy Allocation domains, with tile coding ($d \approx 1{,}024$–$8{,}192$) and RBF features, demonstrate:

| Method | Sample Efficiency | Bias (Tile Coding) | Typical Per-Step Runtime |
| --- | --- | --- | --- |
| Full LSTD ($d \times d$) | Best, unaffected by $\lambda$ | None | $O(d^2)$ ($\sim$150 ms) |
| Two-sided sketched LSTD | Poor, high error | High (unless $k \approx d$) | $O(dk + k^2)$ |
| Left-sided sketched LSTD | Excellent | Unbiased | $O(dk)$ ($\sim$1 ms) |
| ATD-L (quasi-Newton) | Excellent, robust to $\lambda$ | Unbiased | $O(dk)$ ($\sim$0.5 ms) |

In Mountain Car with $d = 1{,}024$, $k = 50$:

  • Full LSTD yields RMS error $\approx 0.05$ in 2,000 steps at $\sim$150 ms/step.
  • Two-sided sketching gives high bias, with RMS error $\approx 0.15$.
  • Left-sided LSTD attains RMS error $\approx 0.05$ at $\sim$1 ms/step, a roughly 150$\times$ speedup.
  • ATD-L achieves similar error at $\sim$0.5 ms/step without matrix inversion.

Guidelines for practical implementation include:

  • Choice of $k$: $k \approx \sqrt{d}$ or $O(\epsilon^{-2}\log d)$ for the desired JL distortion; for $d \approx 1{,}000$, $k = 30$–$100$ is typical.
  • Sketch type: Gaussian, CountSketch, and subsampled Hadamard sketches behave similarly; Gaussian is the simplest.
  • Initialization: set $M_0 = \eta I_k$ with small $\eta$ ($10^{-3}$ to $10^{-1}$) to ensure regularization and invertibility.
  • Sensitivity: left-sided LSTD and ATD-L are robust to $\lambda$ and $\eta$, unlike TD($\lambda$), which requires careful stepsize tuning.
  • Sparse tile coding: increase the number of tilings or mix tiles to expand the effective visited subspace for $S$.
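Two of the sketch constructions mentioned above can be built in a few lines; both satisfy $\mathbb{E}[S^\top S] = I$, which the snippet checks on the diagonal (the dimensions and seed are illustrative):

```python
import numpy as np

def gaussian_sketch(k, d, rng):
    """Dense Gaussian sketch: E[S^T S] = I, costs O(kd) to apply."""
    return rng.normal(size=(k, d)) / np.sqrt(k)

def count_sketch(k, d, rng):
    """CountSketch: one random +/-1 entry per column, O(nnz) to apply."""
    S = np.zeros((k, d))
    rows = rng.integers(0, k, size=d)            # hash each feature to a row
    signs = rng.choice([-1.0, 1.0], size=d)
    S[rows, np.arange(d)] = signs
    return S

rng = np.random.default_rng(5)
for make in (gaussian_sketch, count_sketch):
    S = make(50, 1000, rng)
    # Average diagonal of S^T S; both constructions keep it near 1.
    print(make.__name__, round(float(np.mean(np.diag(S.T @ S))), 2))
```

CountSketch is especially attractive for sparse tile-coded features, since applying it costs only as much as the number of active tiles.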

Employing only left-sided sketching within LSTD preserves unbiased estimation of $w^\star$, lowers per-step cost from $O(d^2)$ to $O(dk)$, and, combined with quasi-Newton ATD updates, provides a sample-efficient and robust policy-evaluation framework for large-scale tile-coded representations (Pan et al., 2017).
