
Linear Tile-Coding for Value Function

Updated 8 January 2026
  • Linear tile-coding is a method that transforms continuous states into sparse, high-dimensional binary vectors for efficient value function approximation.
  • Left-sided sketching applies dimensionality reduction only to the constraint side, ensuring unbiased estimation while preserving key curvature information.
  • The approach significantly reduces computational costs and improves sample efficiency, underpinning robust RL algorithms like LSTD and ATD-L.

Linear tile-coding for value function approximation is a methodology within reinforcement learning (RL) for representing and estimating the value function of a policy, especially in high-dimensional continuous state spaces. This approach leverages tile coding to create sparse binary feature encodings suitable for linear approximators, and, when combined with left-sided random sketching techniques, yields powerful and computationally efficient solutions for policy evaluation with least-squares temporal difference learning (LSTD) and quasi-Newton accelerated TD methods. Recent theoretical and empirical findings have clarified the bias–variance trade-offs intrinsic to sketching, provided robust approaches for matrix-based learning, and led to practical guidelines for scaling RL with tile coding to thousands of features (Pan et al., 2017).

1. Standard Linear Value-Function Approximation with Tile Coding

Tile coding transforms each continuous state $s$ into a high-dimensional binary feature vector $\phi(s)\in\mathbb{R}^d$, with $d$ potentially in the thousands, where each feature represents the activation of a tile over the state space. The value function under policy $\pi$ is linearly approximated as $w^\top\phi(s) \approx v_\pi(s)$, with weight vector $w\in\mathbb{R}^d$ to be learned.
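A minimal tile coder along these lines can be sketched as follows; the number of tilings, tiles per dimension, and the uniform offset scheme are illustrative assumptions, not prescriptions from the source:

```python
import numpy as np

def tile_code(state, low, high, n_tilings=8, tiles_per_dim=10):
    """Map a continuous state to a sparse binary feature vector.

    Each of the n_tilings grids is offset by a fraction of a tile width,
    and exactly one tile per tiling is active, so phi has n_tilings ones.
    """
    state = np.asarray(state, dtype=float)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    dim = state.size
    tiles_per_tiling = tiles_per_dim ** dim
    phi = np.zeros(n_tilings * tiles_per_tiling)
    scaled = (state - low) / (high - low)          # normalize to [0, 1]
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)   # shift each grid slightly
        idx = np.clip(((scaled + offset) * tiles_per_dim).astype(int),
                      0, tiles_per_dim - 1)
        flat = int(np.ravel_multi_index(tuple(idx), (tiles_per_dim,) * dim))
        phi[t * tiles_per_tiling + flat] = 1.0
    return phi

phi = tile_code([0.3, -0.7], low=[-1.2, -1.0], high=[0.6, 1.0])
print(phi.shape, int(phi.sum()))  # (800,) 8 — d = 8 * 100, 8 active tiles
```

The resulting $\phi(s)$ is exactly the sparse binary encoding assumed by the linear approximator $w^\top\phi(s)$.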

The LSTD($\lambda$) algorithm collects samples $(s_t, r_{t+1}, s_{t+1})$ and eligibility traces $e_t = \gamma_t\lambda e_{t-1} + \phi(s_t)$, then forms the normal equations $Aw = b$, with

$$A = \sum_t e_t \left[\phi(s_t) - \gamma_{t+1}\phi(s_{t+1})\right]^\top, \qquad b = \sum_t e_t r_{t+1}.$$

The batch solution $w = A^{-1}b$ has $O(d^3)$ cost, and maintaining $A^{-1}$ incrementally via Sherman–Morrison costs $O(d^2)$ per step. Stochastic TD($\lambda$) is $O(d)$ per step but less sample-efficient and sensitive to stepsize choices.
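The batch procedure above can be sketched concretely as follows; the synthetic random features, constant discount $\gamma$, and small ridge term (added so $A$ is invertible on short trajectories) are illustrative assumptions:

```python
import numpy as np

def lstd_lambda(trajectory, d, gamma=0.99, lam=0.9, reg=1e-3):
    """Batch LSTD(lambda): accumulate A and b, then solve A w = b.

    trajectory: list of (phi_s, r_next, phi_s_next) feature-space samples.
    """
    A = np.zeros((d, d))
    b = np.zeros(d)
    e = np.zeros(d)                            # eligibility trace e_t
    for phi, r, phi_next in trajectory:
        e = gamma * lam * e + phi
        A += np.outer(e, phi - gamma * phi_next)
        b += e * r
    # Small ridge term keeps A invertible for short trajectories.
    return np.linalg.solve(A + reg * np.eye(d), b)

# Tiny synthetic example with random sparse binary features.
rng = np.random.default_rng(0)
d = 20
traj = []
phi = np.zeros(d); phi[rng.choice(d, 4, replace=False)] = 1.0
for _ in range(200):
    nxt = np.zeros(d); nxt[rng.choice(d, 4, replace=False)] = 1.0
    traj.append((phi, rng.normal(), nxt))
    phi = nxt
w = lstd_lambda(traj, d)
print(w.shape)  # (20,)
```

The $O(d^3)$ solve here is exactly the cost that the sketching methods in the following sections aim to avoid.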

2. Limitations of Two-Sided (Feature) Sketching

A common sketching approach forms a random projection $S\in\mathbb{R}^{k\times d}$ ($k\ll d$) and generates low-dimensional features $\psi(s)=S\phi(s)$. The reduced system $(SAS^\top)\hat{w}=Sb$ is solved for $\hat{w}\in\mathbb{R}^k$. However, for tile-coded features, this introduces substantial bias:

  • The projection $S\phi(s)$ unpredictably folds coordinates, mixing tile activations.
  • Although $\mathbb{E}[S^\top S]=I$, the projected normal equations minimize the TD error in the sketched space, not the original, so the fixed point shifts.

Empirically, this yields high asymptotic error unless $k$ approaches $d$: the bias $\|\hat{w} - w^\star\|$ decays only when $k$ is nearly as large as $d$, especially for sparse, discontinuous tile-coded features.

3. Left-Sided Sketching: Unbiased System Reduction

Left-sided sketching applies the sketch $S\in\mathbb{R}^{k\times d}$ only to the constraint side, yielding the system $SAw \approx Sb$. The true solution $w^\star$ of $Aw=b$ also satisfies $SAw^\star=Sb$, so no bias is introduced; all solutions of $Aw=b$ remain valid. In expectation $S^\top S \approx I$, so curvature information is preserved, while the cost of operations on $A$ drops from $O(d^2)$ to $O(kd)$.

The system $SAw=Sb$ is under-determined (it admits many solutions), so the minimum-norm solution is preferred:

$$w = \tilde{A}^\top (\tilde{A}\tilde{A}^\top)^{-1}\tilde{b},$$

where $\tilde{A}=SA$ and $\tilde{b}=Sb$.
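Both properties, that $w^\star$ still solves the sketched system and that the minimum-norm formula satisfies its constraints, can be checked numerically. The Gaussian sketch and the toy dimensions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 200, 40
A = rng.normal(size=(d, d)) + d * np.eye(d)   # a well-conditioned toy system
w_star = rng.normal(size=d)
b = A @ w_star

S = rng.normal(size=(k, d)) / np.sqrt(k)      # Gaussian sketch, E[S^T S] = I
A_t, b_t = S @ A, S @ b                       # left-sided: sketch constraints only

# w_star still satisfies the sketched system exactly: no bias is introduced.
print(np.allclose(A_t @ w_star, b_t))         # True

# Minimum-norm solution of the under-determined system S A w = S b.
w_min = A_t.T @ np.linalg.solve(A_t @ A_t.T, b_t)
print(np.allclose(A_t @ w_min, b_t))          # True
```

Note that $w_{\min}$ generally differs from $w^\star$ for a single sketch; the point is that the constraint set still contains $w^\star$, unlike in two-sided sketching.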

4. Incremental Sketched-LSTD and Algorithmic Efficiency

The incremental update process maintains $\tilde{A}_t = S A_t \in \mathbb{R}^{k\times d}$ and $\tilde{b}_t = S b_t \in \mathbb{R}^k$. Each sample induces a rank-one update

$$\Delta A_t = e_t \left[\phi(s_t)-\gamma\phi(s_{t+1})\right]^\top, \qquad \Delta b_t = e_t r_{t+1},$$

yielding the recursive averages

$$\tilde{A}_{t+1} = \tilde{A}_t + \frac{1}{t+1}\left(S\Delta A_t - \tilde{A}_t\right), \qquad \tilde{b}_{t+1} = \tilde{b}_t + \frac{1}{t+1}\left(S\Delta b_t - \tilde{b}_t\right).$$

The minimum-norm solution employs the SVD $\tilde{A}_t=U\Sigma V^\top$ and is computed as $w_t = V\Sigma^{-1}U^\top \tilde{b}_t = \tilde{A}_t^\top (\tilde{A}_t\tilde{A}_t^\top)^{-1}\tilde{b}_t$, with incremental maintenance of $M_t=\tilde{A}_t\tilde{A}_t^\top$ at $O(k^2)$ per sample and inversion at $O(k^3)$ when needed.

Each sample update costs $O(dk + k^3)$, making left-sided sketched-LSTD feasible for $k\ll d$ (e.g., $k=50$, $d\approx10^3$) and a substantial efficiency improvement over unsketched methods.
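The running-average recursions above can be sketched as a small incremental learner; the Gaussian sketch, synthetic sparse features, and regularized solve are illustrative assumptions:

```python
import numpy as np

class SketchedLSTD:
    """Incremental left-sided sketched LSTD: maintain A~ = S A and b~ = S b."""

    def __init__(self, d, k, gamma=0.99, lam=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.S = rng.normal(size=(k, d)) / np.sqrt(k)
        self.A = np.zeros((k, d))        # A~_t, running average
        self.b = np.zeros(k)             # b~_t, running average
        self.e = np.zeros(d)             # eligibility trace
        self.gamma, self.lam, self.t = gamma, lam, 0

    def update(self, phi, r, phi_next):
        self.e = self.gamma * self.lam * self.e + phi
        Se = self.S @ self.e                                  # O(kd)
        dA = np.outer(Se, phi - self.gamma * phi_next)        # S * ΔA_t, rank one
        db = Se * r                                           # S * Δb_t
        self.t += 1
        self.A += (dA - self.A) / self.t
        self.b += (db - self.b) / self.t

    def solve(self, reg=1e-6):
        # Minimum-norm solution w = A~^T (A~ A~^T)^-1 b~, lightly regularized.
        M = self.A @ self.A.T + reg * np.eye(self.A.shape[0])
        return self.A.T @ np.linalg.solve(M, self.b)

rng = np.random.default_rng(2)
agent = SketchedLSTD(d=500, k=50)
phi = (rng.random(500) < 0.01).astype(float)
for _ in range(300):
    nxt = (rng.random(500) < 0.01).astype(float)
    agent.update(phi, rng.normal(), nxt)
    phi = nxt
print(agent.solve().shape)  # (500,)
```

Only $k \times d$ and $k \times k$ objects are touched per step, matching the $O(dk + k^3)$ cost stated above.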

5. Variance Reduction via Quasi-Newton ATD-L Methods

Directly solving $w_t = \tilde{A}_t^\top M_t^{-1}\tilde{b}_t$ is unbiased but susceptible to numerical instability, and the $k \times k$ inversion may only be performed infrequently. The accelerated gradient TD update (ATD-L) instead leverages the sketched matrix as a preconditioner:

$$w_{t+1} = w_t + \left(\alpha \tilde{A}_t^\top M_t^{-1} S + \eta I\right) \delta_t e_t,$$

where $\delta_t$ is the TD error and $\eta$ is a small regularizer (the factor of $S$ projects the $d$-dimensional direction $\delta_t e_t$ into the sketched space so the dimensions agree). Under mild conditions on $A$ and $\eta$, this iteration retains convergence to the unique LSTD solution $w^\star=A^{-1}b$ without requiring a full $d \times d$ inversion.

ATD-L inherits the robustness of LSTD with respect to the trace parameter $\lambda$ and the regularizer $\eta$, while further lowering per-step computational overhead.
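One step of this preconditioned update can be sketched as below; the single-step function signature, the constants, and the random toy matrices are assumptions of this illustration, not the paper's exact implementation:

```python
import numpy as np

def atd_l_step(w, A_t, S, phi, r, phi_next, e, gamma=0.99,
               alpha=1.0, eta=1e-2, reg=1e-6):
    """One ATD-L update: TD step preconditioned by the sketched matrix A~.

    A_t is the running sketched matrix A~ = S A (k x d); only the k x k
    system M = A~ A~^T is ever inverted.
    """
    delta = r + gamma * w @ phi_next - w @ phi       # TD error
    grad = delta * e                                  # update direction in R^d
    M = A_t @ A_t.T + reg * np.eye(A_t.shape[0])
    # Preconditioned direction: alpha * A~^T M^-1 (S grad) + eta * grad.
    precond = A_t.T @ np.linalg.solve(M, S @ grad)
    return w + alpha * precond + eta * grad

rng = np.random.default_rng(3)
d, k = 100, 20
S = rng.normal(size=(k, d)) / np.sqrt(k)
A_t = rng.normal(size=(k, d))
w = np.zeros(d)
phi = np.zeros(d); phi[:4] = 1.0
nxt = np.zeros(d); nxt[4:8] = 1.0
w = atd_l_step(w, A_t, S, phi, r=1.0, phi_next=nxt, e=phi)
print(w.shape)  # (100,)
```

The $\eta I$ term keeps the update well-defined even when $M_t$ is nearly singular early in learning.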

6. Bias–Variance Trade-Offs and Theoretical Guarantees

Two-sided sketching creates nonzero bias $\|\hat{w} - w^\star\|$ that decays slowly unless $k \sim d$. Left-sided sketching is unbiased in expectation, but its variance grows with the deviation of $S^\top S$ from $I$. The Johnson–Lindenstrauss lemma bounds this distortion: for any fixed vector $x$,

$$P\left[\,\bigl|\|Sx\|^2 - \|x\|^2\bigr| \geq \epsilon \|x\|^2\,\right] \leq \delta \quad \text{when } k = O\!\left(\epsilon^{-2}\log(1/\delta)\right).$$

Practically, increasing $k$ reduces variance and RMS error but increases computation time, with $k = 30$–$100$ serving as an effective range for tile coding with $d \approx 1{,}000$.
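The shrinking distortion as $k$ grows can be observed empirically. This quick check with Gaussian sketches over a fixed set of unit vectors is an illustration, not an experiment from the source:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_vecs = 1000, 50
X = rng.normal(size=(n_vecs, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # fixed unit vectors

dist = {}
for k in (30, 100, 300):
    S = rng.normal(size=(k, d)) / np.sqrt(k)     # Gaussian JL sketch
    Y = X @ S.T                                  # sketched vectors in R^k
    # Worst-case squared-norm distortion | ||Sx||^2 - ||x||^2 | over the set.
    dist[k] = float(np.abs(np.linalg.norm(Y, axis=1) ** 2 - 1.0).max())
    print(k, round(dist[k], 2))
```

Consistent with the $k = O(\epsilon^{-2}\log(1/\delta))$ scaling, the worst-case distortion falls roughly like $1/\sqrt{k}$.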

7. Empirical Results and Practical Recommendations

Experiments in the Mountain Car, Puddle World, Acrobot, and Energy Allocation domains, with tile coding ($d \approx 1{,}024$–$8{,}192$) and RBF features, demonstrate:

| Method | Sample Efficiency | Bias (Tile Coding) | Typical Per-Step Runtime |
| --- | --- | --- | --- |
| Full LSTD ($d \times d$) | Best, unaffected by $\lambda$ | None | $O(d^2)$ ($\sim$150 ms) |
| Two-sided sketched LSTD | Poor, high error | High (unless $k \approx d$) | $O(dk + k^2)$ |
| Left-sided sketched LSTD | Excellent | Unbiased | $O(dk)$ ($\sim$1 ms) |
| ATD-L (quasi-Newton) | Excellent, robust to $\lambda$ | Unbiased | $O(dk)$ ($\sim$0.5 ms) |

In Mountain Car with $d = 1{,}024$, $k = 50$:

  • Full LSTD yields RMS error $\approx 0.05$ in 2,000 steps at $\sim$150 ms/step.
  • Two-sided sketching gives high bias, with RMS error $\approx 0.15$.
  • Left-sided LSTD attains RMS error $\approx 0.05$ at $\sim$1 ms/step, a roughly 150$\times$ speedup.
  • ATD-L achieves similar error at $\sim$0.5 ms/step without matrix inversion.

Guidelines for practical implementation include:

  • Choice of $k$: $k \approx \sqrt{d}$ or $O(\epsilon^{-2}\log d)$ for the desired JL distortion; for $d \approx 1{,}000$, $k = 30$–$100$ is typical.
  • Sketch type: Gaussian, CountSketch, and subsampled Hadamard sketches behave similarly; Gaussian is the simplest.
  • Initialization: set $M_0 = \eta I_k$ with small $\eta$ ($10^{-3}$ to $10^{-1}$) to ensure regularization and invertibility.
  • Sensitivity: left-sided LSTD and ATD-L are robust to $\lambda$ and $\eta$, unlike TD($\lambda$), which requires careful stepsize tuning.
  • Sparse tile coding: increase the number of tilings or mix tiles to expand the effective visited subspace for $S$.
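Two of the sketch constructions mentioned above can be built in a few lines; both satisfy $\mathbb{E}[S^\top S] = I$, which the snippet checks on the diagonal (the dimensions and seed are illustrative):

```python
import numpy as np

def gaussian_sketch(k, d, rng):
    """Dense Gaussian sketch: E[S^T S] = I, costs O(kd) to apply."""
    return rng.normal(size=(k, d)) / np.sqrt(k)

def count_sketch(k, d, rng):
    """CountSketch: one random +/-1 entry per column, O(nnz) to apply."""
    S = np.zeros((k, d))
    rows = rng.integers(0, k, size=d)            # hash each feature to a row
    signs = rng.choice([-1.0, 1.0], size=d)
    S[rows, np.arange(d)] = signs
    return S

rng = np.random.default_rng(5)
for make in (gaussian_sketch, count_sketch):
    S = make(50, 1000, rng)
    # Average diagonal of S^T S; both constructions keep it near 1.
    print(make.__name__, round(float(np.mean(np.diag(S.T @ S))), 2))
```

CountSketch is especially attractive for sparse tile-coded features, since applying it costs only as much as the number of active tiles.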

Employing only left-sided sketching within LSTD preserves unbiased estimation of $w^\star$, lowers per-step cost from $O(d^2)$ to $O(dk)$, and, combined with quasi-Newton ATD updates, provides a sample-efficient and robust policy-evaluation framework for large-scale tile-coded representations (Pan et al., 2017).
