Linear Tile-Coding for Value Function
- Linear tile-coding is a method that transforms continuous states into sparse, high-dimensional binary vectors for efficient value function approximation.
- Left-sided sketching applies dimensionality reduction only to the constraint side, ensuring unbiased estimation while preserving key curvature information.
- The approach significantly reduces computational costs and improves sample efficiency, underpinning robust RL algorithms like LSTD and ATD-L.
Linear tile-coding for value function approximation is a methodology within reinforcement learning (RL) for representing and estimating the value function of a policy, especially in high-dimensional continuous state spaces. This approach leverages tile coding to create sparse binary feature encodings suitable for linear approximators, and, when combined with left-sided random sketching techniques, yields powerful and computationally efficient solutions for policy evaluation with least-squares temporal difference learning (LSTD) and quasi-Newton accelerated TD methods. Recent theoretical and empirical findings have clarified the bias–variance trade-offs intrinsic to sketching, provided robust approaches for matrix-based learning, and led to practical guidelines for scaling RL with tile coding to thousands of features (Pan et al., 2017).
1. Standard Linear Value-Function Approximation with Tile Coding
Tile coding transforms each continuous state $s$ into a high-dimensional binary feature vector $x(s) \in \{0,1\}^d$, with $d$ potentially in the thousands, where each feature represents the activation of a tile over the state space. The value function under policy $\pi$ is linearly approximated as $V^\pi(s) \approx w^\top x(s)$, with weight vector $w \in \mathbb{R}^d$ to be learned.
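As a concrete illustration, here is a minimal tile coder in Python, assuming a single 1-D state variable and a staggered-grid layout; the tiling sizes and offsets are illustrative choices, not prescribed by the source:

```python
import numpy as np

def tile_code(s, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    """Encode a scalar state s as a sparse binary vector with exactly
    one active tile per tiling; tilings are offset ("staggered") copies
    of a uniform grid over [low, high]."""
    x = np.zeros(n_tilings * tiles_per_dim)
    width = (high - low) / tiles_per_dim
    for t in range(n_tilings):
        offset = t * width / n_tilings               # stagger each tiling
        idx = min(int((s - low + offset) / width), tiles_per_dim - 1)
        x[t * tiles_per_dim + idx] = 1.0
    return x

w = np.zeros(8 * 10)            # one weight per tile
v_hat = w @ tile_code(0.37)     # linear value estimate V(s) ~= w^T x(s)
```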
The LSTD($\lambda$) algorithm collects samples $(x_t, r_{t+1}, x_{t+1})$ and eligibility traces $z_t = \gamma \lambda z_{t-1} + x_t$, where $\gamma \in [0,1)$ is the discount and $\lambda$ the trace-decay parameter, then forms the normal equations $A w = b$, with empirical averages $A_t = \frac{1}{t}\sum_{i=1}^{t} z_i (x_i - \gamma x_{i+1})^\top$ and $b_t = \frac{1}{t}\sum_{i=1}^{t} z_i r_{i+1}$.
The batch solution has cost $O(d^3)$, and maintaining $A_t^{-1}$ incrementally by Sherman–Morrison costs $O(d^2)$ per step. Stochastic TD($\lambda$) is $O(d)$ but less sample-efficient and sensitive to stepsize choices.
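A hedged sketch of one incremental LSTD($\lambda$) step, maintaining $b_t$ and $A_t^{-1}$ via Sherman–Morrison rather than re-solving; variable names and defaults are assumptions:

```python
import numpy as np

def lstd_step(A_inv, b, z, x, x_next, r, gamma=0.99, lam=0.9):
    """One LSTD(lambda) sample in O(d^2): update the eligibility trace,
    accumulate b, and apply Sherman-Morrison for the rank-one change
    A <- A + z (x - gamma x')^T. Initialize A_inv = (1/xi) * I, xi small."""
    z = gamma * lam * z + x
    u = x - gamma * x_next
    b = b + r * z
    Az = A_inv @ z                       # A^{-1} z
    uA = u @ A_inv                       # u^T A^{-1}
    A_inv = A_inv - np.outer(Az, uA) / (1.0 + uA @ z)
    return A_inv, b, z                   # weights on demand: w = A_inv @ b
```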
2. Limitations of Two-Sided (Feature) Sketching
A common sketching approach forms a random projection $S \in \mathbb{R}^{k \times d}$ (with $k \ll d$) and generates low-dimensional features $\tilde{x} = S x$. The reduced system $(S A S^\top)\,\tilde{w} = S b$ is solved for $\tilde{w} \in \mathbb{R}^k$. However, for tile-coded features, this introduces substantial bias:
- The projection unpredictably folds coordinates, mixing tile activations.
- Though $\mathbb{E}[S^\top S] = I$, the projected normal equations minimize the TD fixed-point error in the sketched space, not the original one, so the fixed point shifts.
Empirically, this yields high asymptotic error unless $k$ approaches $d$: the bias decays only when $k$ is nearly as large as $d$, especially for sparse, discontinuous tile-coded features.
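For contrast, a minimal construction of the two-sided sketched system (dimensions illustrative; the bias is the one described above, since the solution lives in the sketched space):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1024, 64
S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))   # Gaussian: E[S^T S] = I

def two_sided_system(A, b):
    """Project both sides of A w = b into the k-dim feature space.
    The k x k solution minimizes TD error in the sketched space, which
    is the source of the bias discussed above."""
    return S @ A @ S.T, S @ b

# features are also projected: the value estimate uses w_tilde @ (S @ x)
```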
3. Left-Sided Sketching: Unbiased System Reduction
Left-side sketching applies the sketch only to the constraint side, yielding the system $S A w = S b$. The true solution to $A w = b$ also satisfies $S A w = S b$, so no bias is introduced; every solution of the original system remains a solution of the sketched one. In expectation, $S A$ preserves the curvature information in $A$, while the cost of operations on the system drops from $O(d^2)$ to $O(kd)$.
The sketched system is under-determined (it has many solutions), so the minimum-norm solution is preferred: $w = \tilde{A}^\dagger \tilde{b}$, where $\tilde{A} = S A \in \mathbb{R}^{k \times d}$ and $\tilde{b} = S b \in \mathbb{R}^k$.
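A minimal sketch of the minimum-norm solve, using NumPy's SVD-based pseudoinverse:

```python
import numpy as np

def min_norm_solution(A_tilde, b_tilde):
    """Minimum-norm w solving the underdetermined system S A w = S b,
    with A_tilde = S A (k x d) and b_tilde = S b (k,). pinv is SVD-based,
    so this computes w = V Sigma^+ U^T b_tilde."""
    return np.linalg.pinv(A_tilde) @ b_tilde
```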
4. Incremental Sketched-LSTD and Algorithmic Efficiency
The incremental update process maintains $\tilde{A}_t = S A_t$ and $\tilde{b}_t = S b_t$. Each sample induces a rank-one update, yielding the recursive averages

$$\tilde{A}_t = \tilde{A}_{t-1} + \tfrac{1}{t}\big[(S z_t)(x_t - \gamma x_{t+1})^\top - \tilde{A}_{t-1}\big], \qquad \tilde{b}_t = \tilde{b}_{t-1} + \tfrac{1}{t}\big[(S z_t)\, r_{t+1} - \tilde{b}_{t-1}\big].$$

The minimum-norm solution employs the SVD $\tilde{A}_t = U \Sigma V^\top$ and is computed as $w_t = V \Sigma^\dagger U^\top \tilde{b}_t$, with incremental maintenance of $\tilde{A}_t$ at $O(kd)$ per sample and inversion at $O(k^2 d)$ when needed.
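In code, these recursions might look as follows (a sketch under the definitions above; names are illustrative):

```python
import numpy as np

def sketched_lstd_step(A_tilde, b_tilde, z, x, x_next, r, S, t,
                       gamma=0.99, lam=0.9):
    """Running-average update of A_tilde = S A_t and b_tilde = S b_t,
    for sample index t >= 1. The outer product of the sketched trace
    with the feature difference dominates the cost: O(k d)."""
    z = gamma * lam * z + x                # trace kept in the original space
    sz = S @ z                             # sketched trace, O(k d)
    u = x - gamma * x_next
    A_tilde = A_tilde + (np.outer(sz, u) - A_tilde) / t
    b_tilde = b_tilde + (r * sz - b_tilde) / t
    return A_tilde, b_tilde, z
```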
Each sample update costs $O(kd)$, making left-sided sketched-LSTD feasible for large $d$ (with $k \ll d$) and representing a substantial efficiency improvement over unsketched methods.
5. Variance Reduction via Quasi-Newton ATD-L Methods
Directly solving $S A w = S b$ is unbiased but susceptible to numerical instability, and it still requires (infrequent) inversion of the sketched matrix. The accelerated gradient TD update (ATD-L) instead leverages the sketched matrix as a preconditioner:

$$w_{t+1} = w_t + \big(\tilde{A}_t^\dagger S + \eta I\big)\,\delta_t z_t,$$

where $\delta_t$ is the TD error and $\eta > 0$ is a small regularizer. Under mild conditions (on $S$ and $\eta$), this iteration retains convergence to the unique LSTD solution without requiring full inversion.
ATD-L inherits the robustness of LSTD with respect to the trace parameter $\lambda$ and the regularization $\eta$, further lowering per-step computational overhead.
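A hedged sketch of an ATD-L style step under the update written above; the exact form (e.g., stepsize handling) is a plausible reading of the description, not a verbatim transcription of Pan et al.:

```python
import numpy as np

def atd_l_step(w, A_tilde, S, z, x, x_next, r, gamma=0.99, eta=1e-3):
    """One preconditioned TD step: w += (A_tilde^+ S + eta I) delta z.
    The pseudoinverse is over the k x d sketch, never a d x d matrix;
    in practice A_tilde^+ would be maintained incrementally rather
    than recomputed every step."""
    delta = r + gamma * (w @ x_next) - w @ x     # TD error
    g = delta * z
    return w + np.linalg.pinv(A_tilde) @ (S @ g) + eta * g
```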
6. Bias–Variance Trade-Offs and Theoretical Guarantees
Two-sided sketching creates nonzero bias $\lVert \mathbb{E}[\tilde{w}] - w^* \rVert$ that decays slowly unless $k \to d$. Left-side sketching is unbiased in expectation, but its variance grows as $k$ shrinks. The Johnson–Lindenstrauss lemma gives the bound $(1-\epsilon)\lVert x \rVert^2 \le \lVert S x \rVert^2 \le (1+\epsilon)\lVert x \rVert^2$ with high probability, provided $k = O(\epsilon^{-2} \log n)$ for $n$ points. Practically, increasing $k$ reduces variance and RMS error but increases computation time, with moderate $k \ll d$ serving as an effective range for tile coding.
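For choosing $k$, a small helper implementing the JL bound; the constant `c` is an assumed, commonly quoted value:

```python
import math

def jl_dim(eps, n, c=8.0):
    """Sketch size k = c * log(n) / eps^2 sufficient to preserve the
    norms of n points to within (1 +/- eps) with high probability."""
    return math.ceil(c * math.log(n) / eps ** 2)

k = jl_dim(eps=0.25, n=10_000)   # ~1179 for these settings
```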
7. Empirical Results and Practical Recommendations
Experiments in Mountain Car, Puddle World, Acrobot, and Energy Allocation domains, with tile-coded and RBF features, demonstrate:
| Method | Sample Efficiency | Bias (Tile Coding) | Typical Per-Step Runtime |
|---|---|---|---|
| Full LSTD ($d \times d$) | Best, unaffected by λ | None | $O(d^2)$ (≈150 ms) |
| Two-sided sketch LSTD | Poor, high asymptotic error | High (unless $k \approx d$) | $O(kd)$ |
| Left-side sketched LSTD | Excellent | None (unbiased) | $O(kd)$ (≈1 ms) |
| ATD-L (quasi-Newton) | Excellent, robust to λ | None (unbiased) | $O(kd)$ (≈0.5 ms) |
In the Mountain Car domain:
- Full LSTD yields RMS error 0.05 in 2,000 steps at 150 ms/step.
- Two-sided sketching gives high bias, RMS error 0.15.
- Left-sided LSTD attains RMS error 0.05 at 1 ms/step, a 150× speedup.
- ATD-L achieves similar error and 0.5 ms/step without matrix inversion.
Guidelines for practical implementation include the following (a configuration sketch follows the list):
- Choice of $k$: Use $k = O(\epsilon^{-2} \log n)$ for the desired JL distortion $\epsilon$; in practice a $k$ that is a small fraction of $d$ is typical.
- Sketch type: Gaussian, CountSketch, and subsampled Hadamard sketches behave similarly; Gaussian is the simplest.
- Initialization: Initialize the matrix estimate as a small multiple of the identity, ensuring regularization and invertibility.
- Sensitivity: Left-sided LSTD and ATD-L are robust to $\lambda$ and $\eta$, unlike TD($\lambda$), which requires careful stepsize tuning.
- Sparse tile coding: Increase the number of tilings or mix tiles to expand the effectively visited feature subspace.
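Putting the guidelines together, a minimal configuration sketch (all constants illustrative), reusing the hypothetical helpers sketched earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4096, 64                                       # d features, k << d
S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))    # Gaussian sketch
A_tilde, b_tilde = np.zeros((k, d)), np.zeros(k)      # sketched statistics
z, w = np.zeros(d), np.zeros(d)                       # trace and weights
eta = 1e-3                                            # ATD-L regularizer
# Per sample: (A_tilde, b_tilde, z) = sketched_lstd_step(...), then either
# solve w = np.linalg.pinv(A_tilde) @ b_tilde on demand (sketched LSTD)
# or take one atd_l_step(...) (ATD-L).
```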
Employing only left-sided sketching within LSTD preserves unbiased estimation of $w^*$, lowers per-step cost from $O(d^2)$ to $O(kd)$, and, with quasi-Newton ATD updates, provides a sample-efficient and robust policy-evaluation framework for large-scale tile-coded representations (Pan et al., 2017).