
Nonlinear memory capacity of parallel time-delay reservoir computers in the processing of multidimensional signals (1510.03891v1)

Published 13 Oct 2015 in cs.NE

Abstract: This paper addresses the reservoir design problem in the context of delay-based reservoir computers for multidimensional input signals, parallel architectures, and real-time multitasking. First, an approximating reservoir model is presented in those frameworks that provides an explicit functional link between the reservoir parameters and architecture and its performance in the execution of a specific task. Second, the inference properties of the ridge regression estimator in the multivariate context are used to assess the impact of finite sample training on the decrease of the reservoir capacity. Finally, an empirical study is conducted that shows the adequacy of the theoretical results with the empirical performances exhibited by various reservoir architectures in the execution of several nonlinear tasks with multidimensional inputs. Our results confirm the robustness properties of the parallel reservoir architecture with respect to task misspecification and parameter choice that had already been documented in the literature.

Citations (27)

Summary

  • The paper introduces an approximating model that provides explicit formulas for computing memory capacity, enabling efficient parameter optimization in TDRs.
  • It employs a VAR(1) approximation with Taylor series expansion to derive performance metrics for both single and parallel TDR architectures handling multidimensional inputs.
  • The study demonstrates that parallel TDR arrays enhance robustness to parameter variations and task misspecification, as validated by Monte Carlo simulations.

This paper (1510.03891) addresses key challenges in applying time-delay reservoir computers (TDRs), particularly the need for architecture and parameter tuning for optimal performance on specific tasks. It extends previous work [GHLO2014_capacity] to handle multidimensional input signals and tasks, analyze parallel TDR architectures, and quantify the impact of finite training data. The core contribution is the development and validation of a simplified, approximating TDR model that provides explicit, computable formulas for reservoir performance metrics, such as memory capacity. This theoretical tool allows for efficient parameter exploration and optimization, overcoming the traditional reliance on costly empirical search.

The paper focuses on TDRs built by sampling solutions of time-delay differential equations driven by an input signal. Two common nonlinear kernels, Mackey-Glass and Ikeda, are considered, relevant to electronic and photonic implementations, respectively. Multidimensional discrete-time input signals $\mathbf{z}(t) \in \mathbb{R}^n$ are processed using an input mask $C$ to create an input forcing $\mathbf{I}(t) = C\mathbf{z}(t)$ for the TDR. The reservoir state is represented by neuron values $\mathbf{x}(t) \in \mathbb{R}^N$, obtained through sampling or Euler discretization.
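
As an illustration of this construction, here is a minimal Python/NumPy sketch of an Euler-discretized Mackey-Glass TDR driven by a masked multidimensional input. The kernel parameter names and defaults (eta, gamma, p), the Euler step d, and the ring-coupled virtual-neuron update are illustrative assumptions, not the paper's exact discretization.

```python
import numpy as np

def mackey_glass(x, u, eta=0.4, gamma=0.05, p=2):
    """Mackey-Glass kernel f(x, u) = eta*(x + gamma*u) / (1 + (x + gamma*u)**p).
    Parameter names and default values are illustrative."""
    z = x + gamma * u
    return eta * z / (1.0 + z ** p)

def tdr_states(Z, C, N=50, d=0.2, **kernel_kwargs):
    """Euler-discretized time-delay reservoir with N virtual neurons.

    Z : (T, n) multidimensional input z(t);  C : (N, n) input mask, I(t) = C z(t).
    Each virtual neuron relaxes toward the kernel evaluated at its own state one
    delay earlier, coupled along the delay line; d is the Euler step. This is a
    generic sketch, not the paper's exact discretization.
    """
    T = Z.shape[0]
    X = np.zeros((T, N))
    for t in range(1, T):
        I = C @ Z[t]                                   # masked input forcing per neuron
        for i in range(N):
            prev = X[t, i - 1] if i > 0 else X[t - 1, N - 1]   # ring coupling along the delay line
            X[t, i] = prev + d * (-prev + mackey_glass(X[t - 1, i], I[i], **kernel_kwargs))
    return X
```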

The performance of the TDR is evaluated on $q$-dimensional $h$-lag memory tasks, where the target output $\mathbf{y}(t) \in \mathbb{R}^q$ is a function $H$ of the current and $h$ past input values: $\mathbf{y}(t) = H(\mathrm{vec}(\mathbf{z}(t), \dots, \mathbf{z}(t-h)))$. The TDR performs the task by training a linear readout layer $(\mathbf{W}_{\text{out}}, \mathbf{a}_{\text{out}})$ that maps the reservoir state $\mathbf{x}(t)$ to the target output: $\mathbf{y}(t) \approx \mathbf{W}_{\text{out}}^\top \mathbf{x}(t) + \mathbf{a}_{\text{out}}$. The optimal readout is found using ridge regression, minimizing the mean squared error plus a regularization term. The memory capacity $C_H$ is defined based on this minimum error.
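
A minimal sketch of the ridge-regression readout and an empirical capacity proxy follows; the 1 - NMSE normalization is an assumption about the form of $C_H$, which the paper defines from the minimum error.

```python
import numpy as np

def train_readout(X, Y, lam=1e-6):
    """Ridge-regression readout: W_out, a_out minimizing ||Y - X W - a||^2 + lam*||W||^2.
    X: (T, N) reservoir states, Y: (T, q) target outputs."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    a = Y.mean(axis=0) - X.mean(axis=0) @ W
    return W, a

def capacity_estimate(X, Y, W, a):
    """Empirical capacity proxy 1 - NMSE of the trained readout (normalizing by the
    target variance is an assumption about the form of C_H)."""
    E = Y - (X @ W + a)
    nmse = np.mean(np.sum(E**2, axis=1)) / np.mean(np.sum((Y - Y.mean(axis=0))**2, axis=1))
    return 1.0 - nmse
```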

The key innovation for practical application is the introduction of an approximating reservoir model. By linearizing the TDR dynamics around a stable fixed point (in the absence of input) and expanding the input term in a Taylor series up to order $R$, the dynamics are approximated by a vector autoregressive model of order 1 (VAR(1)):

$$\mathbf{x}(t) - \boldsymbol{\mu}_x = A(\mathbf{x}_0, \boldsymbol{\theta})\,\big(\mathbf{x}(t-1) - \boldsymbol{\mu}_x\big) + \big(\boldsymbol{\varepsilon}(t) - \boldsymbol{\mu}_\varepsilon\big)$$

where $\mathbf{x}_0$ is the fixed point, $A$ is the connectivity matrix derived from the linearization, and $\boldsymbol{\varepsilon}(t)$ is an "innovation" term capturing the effect of the current input, derived from the Taylor expansion. For independent and identically distributed (IID) zero-mean inputs $\mathbf{z}(t)$, the moments (mean $\boldsymbol{\mu}_\varepsilon$ and covariance $\Sigma_\varepsilon$) of the innovation term $\boldsymbol{\varepsilon}(t)$ can be explicitly computed in terms of the input mask $C$, kernel parameters $\boldsymbol{\theta}$, and higher-order moments of the input signal $\mathbf{z}(t)$. For Gaussian inputs, these higher-order moments can be computed using hafnians.
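
The following sketch simply iterates the VAR(1) approximating model once the connectivity matrix $A$ and innovation samples are in hand; deriving $A$ and the innovations from the kernel linearization and the order-$R$ Taylor expansion is the paper's contribution and is not redone here.

```python
import numpy as np

def simulate_var1(A, mu_x, eps, x0=None):
    """Iterate the approximating model
        x(t) - mu_x = A (x(t-1) - mu_x) + (eps(t) - mean(eps))
    given a connectivity matrix A (N, N) and innovation samples eps (T, N).
    How A and eps follow from the linearization/Taylor expansion is not reproduced here."""
    T, N = eps.shape
    mu_eps = eps.mean(axis=0)
    X = np.zeros((T, N))
    X[0] = mu_x if x0 is None else x0
    for t in range(1, T):
        X[t] = mu_x + A @ (X[t - 1] - mu_x) + (eps[t] - mu_eps)
    return X
```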

Since the approximated model is a stable VAR(1), its state $\mathbf{x}(t)$ has a unique stationary solution whose mean $\boldsymbol{\mu}_x$ and autocovariance function $\Gamma(k)$ (including $\Gamma(0)$) can be explicitly computed using standard formulas (like the Yule-Walker equations) involving $A$, $\boldsymbol{\mu}_\varepsilon$, and $\Sigma_\varepsilon$.
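
A sketch of these standard computations for a stable VAR(1) written in the uncentered form $\mathbf{x}(t) = A\mathbf{x}(t-1) + \boldsymbol{\varepsilon}(t)$: the stationary mean solves $(I - A)\boldsymbol{\mu}_x = \boldsymbol{\mu}_\varepsilon$ and $\Gamma(0)$ solves the discrete Lyapunov equation $\Gamma(0) = A\,\Gamma(0)\,A^\top + \Sigma_\varepsilon$.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def var1_stationary_moments(A, mu_eps, Sigma_eps):
    """Stationary mean and covariance of a stable VAR(1), x(t) = A x(t-1) + eps(t):
       mu_x     solves (I - A) mu_x = mu_eps
       Gamma(0) solves Gamma(0) = A Gamma(0) A^T + Sigma_eps  (discrete Lyapunov equation)
    """
    N = A.shape[0]
    mu_x = np.linalg.solve(np.eye(N) - A, mu_eps)
    Gamma0 = solve_discrete_lyapunov(A, Sigma_eps)
    return mu_x, Gamma0

def var1_autocov(A, Gamma0, k):
    """Lagged autocovariance Gamma(k) = A^k Gamma(0), k >= 0 (Yule-Walker recursion)."""
    return np.linalg.matrix_power(A, k) @ Gamma0
```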

This modeling approach is extended to parallel TDR architectures. A parallel array consists of $p$ individual TDRs, each with its own parameters ($\boldsymbol{\theta}^{(j)}$, $\tau_j$) and input mask $C^{(j)}$. The input signal $\mathbf{z}(t)$ drives each TDR independently. The concatenated state vector $\mathbf{X}(t) = (\mathbf{x}^{(1)}(t), \dots, \mathbf{x}^{(p)}(t))$ forms the state of the parallel system. By modeling each individual TDR with the VAR(1) approximation, the parallel system is also approximated by a larger VAR(1) model for $\mathbf{X}(t)$. The connectivity matrix $A$ for the parallel system is block diagonal, and the innovation term $\boldsymbol{\varepsilon}(t)^{(\mathbf{X}_0, \boldsymbol{\Theta})}$ is the concatenation of individual innovation terms. The moments ($\boldsymbol{\mu}_X$, $\Gamma(0)$) for the parallel system's state $\mathbf{X}(t)$ can thus be explicitly computed from the individual TDRs' parameters and the input signal's moments.
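
A sketch of assembling the parallel system's VAR(1) ingredients from the individual blocks. Note the simplification flagged in the comments: since all reservoirs share the same input, their innovations are in general correlated, and the zero off-diagonal covariance blocks used here are only a placeholder.

```python
import numpy as np
from scipy.linalg import block_diag

def parallel_var1(A_list, mu_eps_list, Sigma_eps_list):
    """Assemble the VAR(1) ingredients of a parallel TDR array.

    Each reservoir j contributes its connectivity block A^(j) and the moments of its
    innovation term; the joint connectivity matrix is block diagonal and the joint
    innovation is the concatenation of the individual ones. Caveat: every reservoir
    is driven by the same input z(t), so innovations of different blocks are in
    general correlated; the zero off-diagonal blocks here are a simplification,
    not the paper's full expression.
    """
    A = block_diag(*A_list)                       # block-diagonal connectivity
    mu_eps = np.concatenate(mu_eps_list)          # stacked innovation means
    Sigma_eps = block_diag(*Sigma_eps_list)       # cross-covariances omitted (see caveat)
    return A, mu_eps, Sigma_eps
```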

Crucially, with explicit expressions for $\Gamma(0)$, the task performance (memory capacity $C_H$) for both single and parallel TDRs can be computed from the explicit capacity formula. This requires computing $\mathrm{Cov}(\mathbf{X}(t), \mathbf{y}(t))$ and $\mathrm{Cov}(\mathbf{y}(t), \mathbf{y}(t))$. The paper provides explicit derivations of these covariances for multidimensional linear and quadratic memory tasks by leveraging the MA($\infty$) representation of the VAR(1) state $\mathbf{X}(t)$ and the explicitly computable moments of the innovation term $\boldsymbol{\varepsilon}(t)$ and the input signal $\mathbf{z}(t)$.
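
A sketch of evaluating a capacity of this kind purely from second moments, assuming the standard ridge-optimal readout $W^* = (\Gamma(0) + \lambda I)^{-1}\mathrm{Cov}(\mathbf{X}, \mathbf{y})$; the normalization below is an assumption, and the paper's exact capacity formula may differ in that respect.

```python
import numpy as np

def capacity_from_moments(Gamma0, Cov_Xy, Cov_yy, lam=0.0):
    """Evaluate a capacity-style figure of merit from second moments only.

    Assumes the ridge-optimal linear readout for centered variables,
        W* = (Gamma(0) + lam*I)^{-1} Cov(X, y),
    and the exact MSE of any linear readout W,
        MSE(W) = tr Cov(y, y) - 2 tr(W^T Cov(X, y)) + tr(W^T Gamma(0) W).
    The normalization 1 - MSE / tr Cov(y, y) is an assumption, not necessarily
    the paper's exact formula.
    """
    N = Gamma0.shape[0]
    W = np.linalg.solve(Gamma0 + lam * np.eye(N), Cov_Xy)          # (N, q)
    mse = (np.trace(Cov_yy)
           - 2.0 * np.trace(W.T @ Cov_Xy)
           + np.trace(W.T @ Gamma0 @ W))
    return 1.0 - mse / np.trace(Cov_yy)
```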

In practice, the readout layer is trained using a finite sample of the reservoir output $X$ and the teaching signal $Y$. This introduces an estimation error on top of the fundamental characteristic error of the reservoir. The paper quantifies this impact by analyzing the finite-sample ridge regression estimator $(\widehat{W}_\lambda, \widehat{\mathbf{a}}_\lambda)$. Drawing on previous work [GO13] and assuming properties such as Gaussian errors and stationarity, the paper presents formulas for the distributional properties of the estimator (mean, covariance) conditioned on the training data $X$. This allows quantifying the total error $\mathrm{MSE}_{\text{total},\lambda}$, which is the characteristic error $\mathrm{MSE}_{\text{char},\lambda}$ plus the error due to finite-sample estimation. Approximations for the total error are provided, showing how it depends on the sample size $T$, the regularization parameter $\lambda$, and the properties of the reservoir and input.
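
A Monte Carlo proxy for this finite-sample effect, assuming simulated reservoir states and targets are available: train the ridge readout on the first $T$ samples and measure the out-of-sample MSE, whose gap to the characteristic MSE is the estimation error the paper approximates analytically.

```python
import numpy as np

def finite_sample_mse(X, Y, T_train, lam=1e-6):
    """Train the ridge readout on the first T_train samples and report the
    out-of-sample MSE on the remaining ones. The gap between this quantity and the
    characteristic (infinite-sample) MSE illustrates the estimation error analyzed
    in the paper; this is a Monte Carlo proxy, not the analytic approximation."""
    Xtr, Ytr = X[:T_train], Y[:T_train]
    Xte, Yte = X[T_train:], Y[T_train:]
    Xc, Yc = Xtr - Xtr.mean(axis=0), Ytr - Ytr.mean(axis=0)
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    a = Ytr.mean(axis=0) - Xtr.mean(axis=0) @ W
    E = Yte - (Xte @ W + a)
    return float(np.mean(np.sum(E**2, axis=1)))
```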

The empirical study validates the theoretical framework:

  1. Model Accuracy: For single TDRs processing a 3D quadratic task, the error surfaces predicted by the explicit capacity formula from the model show remarkable similarity to those obtained via Monte Carlo simulations of the actual TDRs (both discrete and continuous time, using Mackey-Glass and Ikeda kernels). This confirms the model's utility in locating optimal parameter regions.
  2. Parallel Robustness (Parameter Choice): Using the parallel reservoir model's capacity formula, the performance distribution (normalized MSE) for a fixed task with randomly varying parameters is analyzed for different numbers of parallel TDRs (1, 2, 5, 10, 20). The results show that increasing the number of parallel reservoirs significantly reduces the variance in performance, demonstrating improved robustness to parameter choices compared to a single TDR (a Monte Carlo sketch of this kind of comparison follows the list).
  3. Parallel Robustness (Task Misspecification): Parallel architectures (optimized for a 3-lag task) exhibit less degradation in performance when tested on a set of 1000 randomly generated 9-lag tasks compared to a single TDR. This highlights the practical benefit of parallelization for scenarios where the exact task might deviate from the one used for initial optimization.
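
A sketch of the parameter-robustness comparison described in item 2, with the performance evaluator left as an assumed callable (for example, a chain of the moment and capacity helpers sketched above); the parameter ranges are purely illustrative.

```python
import numpy as np

def robustness_spread(evaluate, p, n_draws=200, n_params=3, rng=None):
    """Spread of performance over random parameter draws for a p-reservoir array.

    `evaluate(theta_list)` is an assumed callable returning a scalar figure of merit
    (e.g. analytic capacity or normalized MSE) for the array whose j-th reservoir
    uses the parameter vector theta_list[j]; the uniform ranges below are
    illustrative. A smaller standard deviation for larger p reproduces the
    robustness-to-parameter-choice observation.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    scores = [evaluate([rng.uniform(0.1, 1.0, size=n_params) for _ in range(p)])
              for _ in range(n_draws)]
    return float(np.mean(scores)), float(np.std(scores))
```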

Practical Implications for Implementation:

  • Parameter Optimization: The explicit capacity formulas derived from the approximating model provide a computationally efficient way to search the parameter space ($\boldsymbol{\theta}$, $C$, $\lambda$) for optimal performance on a given multidimensional task. Instead of running costly simulations for each parameter combination, one can evaluate the analytic capacity formula (a grid-search sketch follows this list).
  • Architecture Design: The model allows comparing the theoretical performance and robustness of different TDR configurations (e.g., number of neurons, individual reservoir parameters in a parallel array) before physical implementation or large-scale simulation.
  • Handling Multidimensional Data: The framework explicitly accounts for multidimensional inputs and outputs, which is crucial for real-world applications involving time series of vectors (e.g., sensor data, video frames, multi-channel signals).
  • Real-time Multitasking: The model can estimate performance for simultaneous execution of multiple tasks by using a multi-dimensional output $\mathbf{y}(t)$ that combines different desired functions of the input history.
  • Deployment Considerations:
    • The stability condition ($|\partial_x f(x_0, 0, \boldsymbol{\theta})| < 1$) is a key requirement for the VAR(1) approximation and helps identify valid parameter ranges.
    • The analysis of finite-sample effects informs the trade-off between the size of the training dataset $T$ and the expected performance degradation due to readout estimation error. This helps in determining the required amount of training data for a desired accuracy level.
    • Parallel architectures offer practical advantages in terms of simplified parameter tuning and robustness, making them potentially easier to deploy and less sensitive to environmental or task variations.
    • The specific forms of the moments required by the model (like higher-order moments of the input) might need to be estimated from data if the input distribution is unknown or non-Gaussian.
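
As referenced in the parameter-optimization bullet above, here is a minimal grid-search sketch over the analytic capacity; `capacity_fn` is an assumed callable wrapping the explicit formula, and the parameter names in the usage comment are placeholders.

```python
import numpy as np
from itertools import product

def grid_search(capacity_fn, param_grid):
    """Exhaustive scan of an analytic capacity over a parameter grid.

    `capacity_fn(params)` is an assumed callable that evaluates the explicit capacity
    formula for one combination of kernel parameters, input-mask scaling and ridge
    parameter; `param_grid` maps parameter names to candidate values. Each evaluation
    is a closed-form computation, so scanning many combinations avoids reservoir
    simulations entirely.
    """
    names = list(param_grid)
    best_val, best_params = -np.inf, None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        val = capacity_fn(params)
        if val > best_val:
            best_val, best_params = val, params
    return best_val, best_params

# Illustrative call with placeholder parameter names (eta, lam are hypothetical keys):
# grid_search(my_capacity, {"eta": np.linspace(0.1, 1.0, 10), "lam": [1e-8, 1e-6, 1e-4]})
```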

In summary, the paper provides a valuable theoretical framework and practical tools based on an approximating model to design, analyze, and optimize time-delay reservoir computers for complex tasks involving multidimensional data, particularly highlighting the benefits and providing analysis tools for parallel architectures. The explicit capacity formulas are a significant asset for efficient parameter tuning and understanding performance limits.