Sliding Window Generative Neural Networks
- Sliding Window Generative Neural Networks are architectures that predict target positions solely from recent sensor measurements, eliminating the need for explicit motion models.
- They use a feedforward neural network with multiple ReLU-activated hidden layers and a sliding window mechanism to encode temporal context without recurrent connections.
- Evaluations show the approach achieves lower MSE and closer trajectory tracking than the EKF, particularly under conditions of low measurement noise.
A Sliding Window Generative Neural Network is an architecture for target tracking that predicts future states based solely on recent sensor measurements, dispensing with explicit dynamic or motion models. This approach, exemplified in the context of drone trajectory tracking, contrasts sharply with classical filters such as the Extended Kalman Filter (EKF) by relying exclusively on measurement history, integrated via a feedforward neural network equipped with a sliding window mechanism. The method is principally focused on cases where measurement model linearization and dynamic object model specification present operational challenges, leveraging deep learning to estimate states directly from measurement data.
1. Network Architecture
The core architecture consists of a standard feedforward neural network with an input layer, three hidden layers, and an output layer. The hidden layers employ the ReLU activation

$$f(x) = \max(0, x),$$

with the derivative

$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0. \end{cases}$$
Each layer's node count is selected empirically; for example, arrangements of 7, 5, and 4 nodes in successive layers have been used. The final output layer applies an identity activation function to produce continuous real-valued outputs, representing position estimates in Cartesian (x, y) coordinates. Unlike recurrent networks, this architecture does not encode temporal dynamics explicitly; rather, it is the input format, enabled by the sliding window, that supplies the temporal context.
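To make the layout concrete, the following is a minimal sketch of such a network in PyTorch (the framework is an assumption; only the layer sizes 7, 5, and 4 and the linear output come from the description above). The class name, the window size of 3, and the two-dimensional (x, y) measurement format are illustrative choices.

```python
import torch
import torch.nn as nn


class SlidingWindowTracker(nn.Module):
    """Feedforward tracker: three ReLU hidden layers (7, 5, 4 nodes) and a
    linear (identity-activated) output producing a Cartesian (x, y) estimate."""

    def __init__(self, window_size: int = 3, meas_dim: int = 2):
        super().__init__()
        # The input is the concatenated sliding window of (x, y) measurements.
        in_features = window_size * meas_dim
        self.net = nn.Sequential(
            nn.Linear(in_features, 7), nn.ReLU(),
            nn.Linear(7, 5), nn.ReLU(),
            nn.Linear(5, 4), nn.ReLU(),
            nn.Linear(4, meas_dim),  # no activation: identity output for (x, y)
        )

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        return self.net(window)
```

Leaving the final linear layer without an activation corresponds to the identity output described above, so the network can emit unbounded real-valued position estimates.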
2. Sliding Window Mechanism
The sliding window is central to state estimation, functioning as the network's "local memory." For a window of size $w$ (e.g., $w = 3$), the input vector at time $t$ consists of the concatenated measurements at $t-3$, $t-2$, and $t-1$. Upon each prediction, the window advances one step forward; the new input for time $t+1$ becomes the measurements at times $t-2$, $t-1$, and $t$, with time $t+1$ serving as the prediction target. This method does not require recurrent connections; context is preserved simply by the ordering and selection of inputs. The strategy allows non-recurrent feedforward networks to learn relationships between sequences and state transitions, enabling generative tracking from measurement data alone.
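As a sketch of how such windows can be assembled, the NumPy function below pairs each window of three consecutive Cartesian measurements with the ground-truth position at the following time step. The function name and the use of separate measurement and truth arrays are illustrative assumptions, not a prescribed interface.

```python
import numpy as np


def make_windows(measurements: np.ndarray, truths: np.ndarray, window_size: int = 3):
    """Build (input, target) training pairs from time-ordered data.

    measurements: (T, 2) array of Cartesian (x, y) measurements.
    truths:       (T, 2) array of ground-truth positions used as targets.
    The measurements at t-3, t-2, t-1 form one input vector; the position at
    time t is its target. Incrementing t slides the window forward one step.
    """
    inputs, targets = [], []
    for t in range(window_size, len(measurements)):
        inputs.append(measurements[t - window_size:t].reshape(-1))  # concatenate the window
        targets.append(truths[t])
    return np.asarray(inputs), np.asarray(targets)
```

With pairs built this way, training reduces to standard supervised regression on the windowed inputs, with no recurrent state carried between steps.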
3. Measurement Model and Data Representation
All inputs to the network are derived directly from sensor measurements, generated in simulation by the Stone Soup software in polar coordinates (bearing $\theta$, range $r$). These are converted to Cartesian coordinates as follows:

$$x = r\cos\theta, \qquad y = r\sin\theta,$$
with any necessary offset applied to account for sensor position. The network trains exclusively on these coordinates and the true target positions, without explicit use of a motion model. The advantage of this scheme is the avoidance of linearization required in non-linear filtering approaches such as the EKF, thereby reducing both mathematical complexity and the risk of errors introduced by linear approximation.
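A minimal sketch of this pre-processing step is given below, assuming the usual convention in which the bearing is measured from the x-axis; the function name and the sensor_xy offset parameter are illustrative.

```python
import numpy as np


def polar_to_cartesian(bearing: np.ndarray, rng: np.ndarray,
                       sensor_xy: tuple = (0.0, 0.0)) -> np.ndarray:
    """Convert (bearing, range) measurements to Cartesian (x, y) coordinates,
    then shift by the sensor position so all data share one reference frame."""
    x = rng * np.cos(bearing) + sensor_xy[0]
    y = rng * np.sin(bearing) + sensor_xy[1]
    return np.stack([x, y], axis=-1)
```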
4. Comparative Performance: Neural Network vs Extended Kalman Filter
Evaluations against the Extended Kalman Filter focus on scenarios of low measurement covariance, that is, conditions of high measurement reliability. The neural network tracker achieves sum-of-Euclidean-distances and mean squared error (MSE) values significantly lower than those obtained from the EKF; for example, training MSE values for the neural network reach as low as 0.01, while the EKF exhibits errors orders of magnitude greater. Crucially, the neural generative tracker replicates ground-truth trajectories more accurately under low-noise conditions. When generalizing to test data that includes previously unseen measurement sets, the neural network maintains its advantage, though elevated measurement noise produces some observable performance degradation. The implication is that, under reliable measurement regimes, a sliding window neural network achieves parity with, or outperforms, the EKF without requiring an explicit motion model or linearization pipeline.
5. Mathematical Formulations
Key mathematical foundations underlying the approach include:
- Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $n$ denotes the data point count, $y_i$ is the ground truth, and $\hat{y}_i$ is the prediction.
- Euclidean Distance: $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, governing the geometric error between prediction and ground truth.
- Ordinary Least Squares Cost Function: $J = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, minimized via gradient descent in back-propagation.
- Polar-to-Cartesian Conversion: $x = r\cos\theta$, $y = r\sin\theta$, applied on raw measurement data with sensor offset correction as required.
These equations collectively describe both the learning process and the metric-oriented evaluation of the tracking capability conferred by the sliding window neural generative approach.
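As a short sketch of how the two evaluation metrics might be computed for a predicted track against ground truth, assuming both are stored as (T, 2) arrays of Cartesian positions (the function names below are illustrative):

```python
import numpy as np


def mse(truth: np.ndarray, pred: np.ndarray) -> float:
    """Mean squared error over all coordinates of the n data points."""
    return float(np.mean((truth - pred) ** 2))


def sum_euclidean_distance(truth: np.ndarray, pred: np.ndarray) -> float:
    """Sum of per-time-step Euclidean distances between the two tracks."""
    return float(np.sum(np.linalg.norm(truth - pred, axis=-1)))
```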
6. Implications and Prospects
The efficacy of the sliding window neural network in target tracking, particularly where the measurement model is sufficiently descriptive and reliable, points to an alternative paradigm to traditional filtering techniques. By obviating the explicit modeling and linearization of object dynamics, the method allows for reduced computational complexity and circumvents modeling inaccuracies endemic to EKF and related schemes. A plausible implication is the suitability of this approach for operational contexts where motion models are difficult to specify and measurement noise is minimal. The authors suggest that future directions may include integration of additional neural components (e.g., a network representing an implicit motion model) or hybridization with classical filters, thereby potentially enhancing robustness in broader operational domains.
7. Context within State Estimation Research
The sliding window generative neural architecture represents an intersection of measurement-centric data-driven tracking and conventional state estimation. Its distinctive features, namely its exclusive dependence on a measurement model, its empirical outperformance of the EKF under low-covariance conditions, and its avoidance of complex linearization, underscore its significance for applications demanding scalable, measurement-conditioned tracking without detailed physical modeling.
In summary, Sliding Window Generative Neural Networks constitute a data-driven approach wherein temporal context is captured via input representation rather than explicit dynamics modeling. Their performance suggests a viable avenue for target tracking in environments where direct measurement is more readily available and accurate than dynamic models—a prospect with implications for future generative and hybrid tracking architectures.