Pixel Time Series Representation

Updated 16 April 2026

Pixel time series representation is a method that encodes temporal signals as pixel grids or images, unifying time series analysis with computer vision techniques.
It leverages various encodings such as lineplots, spectrograms, recurrence plots, and XIRPs to preserve distinctive features and enable diverse analytical approaches.
Modern pipelines apply CNNs, Vision Transformers, and generative models to these pixelized representations, with reported results that match or exceed specialized methods in classification, forecasting, and clustering.

A pixel time series representation encodes the temporal evolution of signals, features, or patterns as sequences of pixels or 2D/3D images, enabling the application of computer vision and image processing techniques to time series analytics. In the context of univariate, multivariate, spatial, or high-dimensional time series, this approach provides both a unifying language for representation and a computational bridge to modern visual architectures such as convolutional neural networks (CNNs), Vision Transformers (ViTs), compact data structures, and generative models.

1. Fundamental Types of Pixel Time Series Representations

Pixel time series representations span a spectrum from direct “painting” of 1D data to more abstract image encodings:

Lineplot Image Embeddings: A univariate or multivariate time series $x_{t}$ is mapped to a 2D array by plotting amplitude vs. time, rasterizing the polyline to a grayscale or RGB image. Typical settings use default line-plot routines, yielding images such as $I(i,j)$ with pixel $(i,j)$ set if traversed by the plotted line (Rodrigues et al., 2021).
Time–Frequency Spectrograms: The raw signal is transformed by the short-time Fourier transform (STFT) or the continuous wavelet transform (CWT) to produce a 2D spectrogram or scalogram, respectively, showing power, energy, or coefficients as pixel intensities indexed over frequency and time. This can be augmented with input intensity strips for mixed-domain encodings (Zeng et al., 2024).
Recurrence Plots and Distance Matrices: Each element $D_{i,j}$ encodes the pairwise distance or similarity between signal values at times $i$ and $j$, rendered as a single-channel image without thresholding (Wenninger et al., 2019).
Extended Intertemporal Return Plots (XIRPs): For financial or positive-valued series, the pixel at $(i,j)$ stores the log-return or simple return between $x_i$ and $x_j$, generating invertible, scale-invariant images (Hellermann et al., 2021).
Dense Pixel Grids for Row-wise Comparison: Multiple time series are visualized as horizontal rows in a pixel matrix, where each pixel may encode a normalized value, model activation, or attribution relevance, supporting large-scale explainability (Schlegel et al., 2024).
Multivariate Temporal Pixel Series: For spatial or multi-spectral data, each spatial pixel evolves as a vector over time ($x \in \mathbb{R}^{T \times C}$), and the stack of these vectors forms a raster time series or a satellite-imaging time cube (Vincent et al., 2023, Cruces et al., 2019).
Pixel Preemption Filtering: For visualization and storage efficiency, samples are pre-filtered in time–value pixel grids such that no two retained points fall within the same display pixel, reducing the number of samples while keeping the Hausdorff distance to the original data bounded (Kim et al., 2024).

Distinct choices in representation determine the information preserved (e.g., frequency content, pairwise similarity, amplitude, trends) and which vision models or queries become naturally applicable.
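As a concrete illustration of one of these encodings, the following is a minimal sketch of an unthresholded distance-matrix image for a univariate series. The use of the absolute difference as the distance, the rescaling to $[0, 1]$, and the function name `distance_matrix_image` are assumptions made for the example, not details taken from the cited work.

```python
import numpy as np

def distance_matrix_image(x):
    """Unthresholded recurrence image: D[i, j] = |x[i] - x[j]|, rescaled to [0, 1].

    Assumption: absolute difference is used as the pairwise distance.
    """
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])   # all pairwise absolute differences
    peak = d.max()
    return d / peak if peak > 0 else d    # a constant series gives an all-zero image

series = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
image = distance_matrix_image(series)     # single-channel (64, 64) array
```

The result is symmetric with a zero diagonal, so it can be passed to a vision model as a single-channel image without further thresholding.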

2. Construction, Normalization, and Invertibility

Spatial and Amplitude Scaling:

Most approaches begin by rescaling the temporal and amplitude axes to fit a prescribed image size:

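Linear scaling: the time index and the amplitude are each rescaled linearly to the pixel column and row ranges of the target image (Rodrigues et al., 2021).
Quartile-based normalization: values are centered and scaled using quartile statistics of the series rather than its extremes, for robust, outlier-resistant scaling (Roschmann et al., 2025).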
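The two scaling options can be sketched as follows. The exact formulas are not recoverable from the text, so both functions are assumptions: `minmax_to_rows` uses ordinary min-max scaling onto row indices (row 0 at the top of the image), and `robust_scale` uses the common median and interquartile-range form of quartile-based scaling. The function names are hypothetical.

```python
import numpy as np

def minmax_to_rows(x, height):
    """Map amplitudes linearly to integer row indices 0..height-1 (row 0 = top).

    Assumption: plain min-max scaling of the amplitude axis.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    unit = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return (height - 1) - np.round(unit * (height - 1)).astype(int)

def robust_scale(x):
    """Center on the median and divide by the interquartile range.

    Assumption: median/IQR is the quartile statistic used.
    """
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    return (x - med) / iqr if iqr > 0 else x - med

rows = minmax_to_rows([0.0, 5.0, 10.0], height=11)
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

In the second call the outlier 100 does not affect the scale of the remaining values, which is the practical difference from min-max scaling.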

Image Formation and Anti-aliasing:

Pixelization leverages standard line-segment rasterization (e.g., Bresenham), or for higher dimensional objects (as in spectrograms or recurrence plots) direct mapping of analytic transforms or pairwise distances into pixel values (Rodrigues et al., 2021, Zeng et al., 2024, Wenninger et al., 2019).
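A minimal rasterization sketch, assuming one image column per sample and min-max amplitude scaling: instead of a full Bresenham implementation, each column is filled between the rows of the previous and current sample, which keeps the plotted polyline connected. The function name `lineplot_image` and the default height are illustrative choices.

```python
import numpy as np

def lineplot_image(x, height=32):
    """Rasterize a series into a binary (height, len(x)) image.

    Simplification (assumption): one column per sample, with a vertical
    run in each column instead of true Bresenham line drawing.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    unit = (x - x.min()) / span if span > 0 else np.full_like(x, 0.5)
    rows = (height - 1) - np.round(unit * (height - 1)).astype(int)
    img = np.zeros((height, len(x)), dtype=np.uint8)
    img[rows[0], 0] = 1
    for t in range(1, len(x)):
        lo, hi = sorted((rows[t - 1], rows[t]))
        img[lo:hi + 1, t] = 1      # fill between consecutive samples
    return img

img = lineplot_image([0.0, 1.0, 0.0, 1.0], height=4)
```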

Normalization Preprocessing:

“Inversion” of pixel intensities (background-to-black) can accelerate CNN convergence and focus attention on the plotted trajectory (Rodrigues et al., 2021).
Sample-wise normalization (e.g., standardizing each image by its own mean and standard deviation) is used for standardizing input to downstream vision models.
For GAN integration or symmetric transforms, pixel values are linearly mapped to a bounded range such as $[-1, 1]$ or $[0, 1]$, according to the downstream architecture’s expected domain (Hellermann et al., 2021).
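The three preprocessing steps can be sketched as small array operations. The input range $[0, 1]$, the zero-mean and unit-variance form of sample-wise normalization, and the target range $[-1, 1]$ are assumptions chosen for illustration; the helper names are hypothetical.

```python
import numpy as np

def invert(img):
    """Swap foreground and background; assumes intensities in [0, 1]."""
    return 1.0 - img

def standardize(img, eps=1e-8):
    """Sample-wise normalization to zero mean and (approximately) unit variance."""
    return (img - img.mean()) / (img.std() + eps)

def to_symmetric(img):
    """Linear map from [0, 1] to [-1, 1], e.g. for a tanh-output generator."""
    return 2.0 * img - 1.0

img = np.array([[0.0, 1.0], [0.5, 0.5]])
inverted = invert(img)
z = standardize(img)
sym = to_symmetric(img)
```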

Invertibility:

Certain representations, notably the XIRP, are exactly invertible: the original time series is reconstructed by cumulatively compounding the one-step returns stored on the first superdiagonal, given an initial value (Hellermann et al., 2021). Others (recurrence plots, spectrograms) are invertible only under additional constraints, or only up to a loss of phase or amplitude information.
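The invertibility argument can be checked with a short sketch. It assumes the log-return variant with the index convention $R_{i,j} = \log(x_j / x_i)$, so that the first superdiagonal holds the one-step log-returns; the function names are hypothetical and the convention may differ from the cited paper.

```python
import numpy as np

def xirp(x):
    """R[i, j] = log(x[j] / x[i]) for a strictly positive series x (assumed convention)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx[None, :] - logx[:, None]

def invert_xirp(r, x0):
    """Rebuild the series from the first superdiagonal and an initial value x0."""
    steps = np.diagonal(r, offset=1)                      # one-step log-returns
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    return x0 * np.exp(cumulative)

prices = np.array([100.0, 102.0, 99.0, 105.0])
image = xirp(prices)
restored = invert_xirp(image, prices[0])
```

Because only ratios enter the image, multiplying the whole series by a constant leaves `image` unchanged, which is the scale invariance noted above; the initial value is the one piece of information that has to be supplied separately.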

3. Model Architectures and Algorithms for Pixel Time Series

Pixel-to-CNN and ViT Pipelines

Shallow CNNs: Five convolution+pooling blocks, each doubling the number of feature maps, feed into a multi-layer perceptron classifier. No batch normalization or shortcuts are required to reach state-of-the-art accuracy on lineplot pixel inputs (Rodrigues et al., 2021).
Deep ResNet Transfer Learning: Recurrence-plot images are input to ResNet-50/152. Networks are pretrained on ImageNet and fully fine-tuned to the pixel images; single-channel grayscale input is preferred over RGB or false-color (Wenninger et al., 2019).
Vision Transformers: Spectrogram and “time–value” images are tiled into fixed-size non-overlapping patches, embedded, and passed through a transformer encoder. Intermediate layers with highest intrinsic dimensionality yield the best classification features (Zeng et al., 2024, Roschmann et al., 2025).
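The patch-tiling step that precedes a ViT encoder can be sketched with array reshaping alone. This covers only the tiling of a single-channel image into flattened, non-overlapping patches; the linear embedding and the transformer itself are omitted, and the function name `patchify` is an illustrative choice.

```python
import numpy as np

def patchify(img, patch):
    """Split an (H, W) image into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch), row-major over tiles.
    """
    h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)

img = np.arange(16).reshape(4, 4)
tokens = patchify(img, 2)   # four tokens of length 4
```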

Generative and Compact Structures

WGAN-GP on XIRP Images: GANs operate on return-plot pixel images. Invertibility allows stochastic generation of time series by sampling the first value and sequentially reconstructing the remaining values from the first superdiagonal (Hellermann et al., 2021).
Compact k³-tree for Raster Series: Spatial–temporal–value cubes are linearized in Morton order and compressed by a k³-tree, supporting sublinear-size storage and efficient value, window, or range queries (Cruces et al., 2019).
Pixel Preemption Filtering: AR-PPF processes a time series by binning into a predefined pixel grid. At most one sample per pixel is retained, guaranteeing that no point is more than one pixel removed from the high-resolution data (Kim et al., 2024).
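The one-sample-per-pixel idea can be illustrated with a deliberately simplified filter. This is not AR-PPF itself: it is a first-come sketch, assuming a fixed `width` by `height` grid over the data range, in which the first sample to land in a grid cell is kept and later samples in the same cell are dropped. The function name is hypothetical.

```python
def one_sample_per_pixel(times, values, width, height):
    """Keep the first sample that falls into each cell of a width x height grid.

    Simplified stand-in (assumption), not the published AR-PPF algorithm.
    """
    t0, t1 = min(times), max(times)
    v0, v1 = min(values), max(values)

    def cell(value, lo, hi, n):
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n), n - 1)

    seen, kept = set(), []
    for t, v in zip(times, values):
        key = (cell(t, t0, t1, width), cell(v, v0, v1, height))
        if key not in seen:          # this pixel is still free
            seen.add(key)
            kept.append((t, v))
    return kept

times = list(range(1000))
values = [t % 10 for t in times]
kept = one_sample_per_pixel(times, values, width=20, height=5)
```

The number of retained samples can never exceed `width * height`, regardless of the length of the input, and every dropped sample shares a grid cell with a retained one.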

Interpretability and Prototype Learning

Pixel-wise Time Series Prototypes: For satellite data, each pixel’s time–spectral trajectory is vectorized, then compared to class prototypes using distance metrics, optionally allowing channel-bias offsets and thin-plate spline time warping for invariance to calibration and phenological shifts (Vincent et al., 2023).
SOM-VAE: Each frame is encoded and then quantized to a discrete codebook on a 2D grid, with additional self-organizing map and Markov transition structure, supporting interpretable, low-dimensional discretization of pixel time series (e.g., video frames, medical images) (Fortuin et al., 2018).
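The prototype comparison with channel-bias offsets can be sketched as a nearest-prototype rule. The sketch assumes a squared Euclidean distance and a constant offset per channel, for which the optimal offset is the per-channel mean of the difference; the thin-plate spline time warping is omitted, and both function names are hypothetical.

```python
import numpy as np

def offset_invariant_distance(x, proto):
    """Squared distance between (T, C) arrays after removing the best
    constant per-channel offset (assumed form of the channel-bias term)."""
    diff = np.asarray(x, dtype=float) - np.asarray(proto, dtype=float)
    diff = diff - diff.mean(axis=0)      # optimal offset = per-channel mean gap
    return float((diff ** 2).sum())

def nearest_prototype(x, prototypes):
    """Index of the prototype with the smallest offset-invariant distance."""
    return int(np.argmin([offset_invariant_distance(x, p) for p in prototypes]))

t = np.linspace(0.0, 1.0, 20)
proto_a = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
proto_b = np.stack([t, t ** 2], axis=1)
pixel = proto_b + np.array([0.3, -0.2])  # same shape as proto_b, shifted per channel
label = nearest_prototype(pixel, [proto_a, proto_b])
```

Here `pixel` differs from `proto_b` only by a per-channel shift, so its offset-invariant distance to that prototype is zero up to rounding, which is the kind of calibration invariance described above.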

4. Empirical Results and Application Domains

Time Series Classification:

Naïve pixel representations, such as lineplot images or recurrence plots, passed through standard CNNs or ResNets, achieved competitive or superior accuracy to tailored RNN, CNN, or ensemble methods (e.g., ROCKET, HIVE-COTE, TS-CHIEF) on multiple UCR and real-world datasets (Rodrigues et al., 2021, Wenninger et al., 2019).
Frozen ViTs pretrained on ImageNet, applied to pixel-represented time series, surpass specialist time series models (e.g., Moment, Mantis) on UCR/UEA, with further gains from feature concatenation (Roschmann et al., 2025).

Time Series Forecasting:

Transformation into spectrogram–amplitude images and processing with ViT achieves state-of-the-art SMAPE, MASE, and sign accuracy across synthetic signals, climate, and S&P 500 datasets, outperforming DeepAR, ARIMA, and pure lineplot representations (Zeng et al., 2024).
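The spectrogram half of such an input image can be sketched with NumPy's FFT. This is a plain STFT log-power image under assumed settings (Hann window, frame length 64, hop 32); it does not reproduce the intensity strips or the specific configuration of the cited work, and the function name is hypothetical.

```python
import numpy as np

def stft_power_image(x, frame=64, hop=32):
    """Log-power STFT image: rows are frequency bins, columns are time frames.

    Assumptions: Hann window, no padding, log1p compression of the power.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.stack([x[s:s + frame] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log1p(power).T

n = np.arange(512)
tone = np.sin(2 * np.pi * 8 * n / 64)   # exactly 8 cycles per 64-sample frame
image = stft_power_image(tone)
```

For this pure tone the brightest row in every column is frequency bin 8, which gives a quick sanity check before feeding such images to a vision model.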

Clustering and Segmentation:

Pixel-wise deformable prototype models enable robust, interpretable per-pixel series clustering and segmentation in high-dimensional satellite imaging, outperforming LSTM-FCN and other RNN-based models, especially under domain shift and few-shot regimes (Vincent et al., 2023).

Visualization and Querying:

Interactive pixel visualizations offer unified grids showing raw data, activations, and attributions, facilitate expert pattern discovery, and scale to thousands of samples via clustering-based reordering (Schlegel et al., 2024).
Advanced pixel filtering enables subsecond visualization of ultra-long time series with visual fidelity guarantees and high feature-retention, vastly outperforming uniform or local-aggregation baselines (Kim et al., 2024).
Compact data structures reduce storage of raster time cubes by up to 5× via k³-tree compression with preserved query performance (Cruces et al., 2019).

Generative Modeling:

XIRP-driven image GANs produce scale-invariant, stably-trained, and invertible time series generations, surpassing RNN-GANs on similarity and forecast-transfer metrics in the financial domain (Hellermann et al., 2021).

5. Comparative Methodology Table

Representation	Core Principle	Key Application
Lineplot (pixel raster)	Direct time-amplitude plotting	Classification, explainability
Spectrogram/scalogram	Time–frequency map (via CWT/STFT)	Forecasting, ViT input
Recurrence Plot / Distance matrix	Pairwise value or similarity encoding	Classification, ResNet
XIRP (log-return plot)	Return/ratio encoding, invertible	GAN-based generation
Dense row-pixel grid	Concatenated vectors as row-pixel blocks	Model interpretation
Raster time series (stacked images)	Per-pixel time evolution, spatial stack	Remote sensing, compact storage
Pixel preemption filtering	One sample per display bucket	Fast visualization

6. Limitations, Design Trade-offs, and Future Directions

Information and Task Alignment:

Pixel representations inherently trade information content for compatibility with visual models; e.g., lineplot images lose high-frequency and sign information, spectrograms lose raw phase, recurrence plots can obscure temporal ordering, and preemption filtering may discard fine, temporally overlapping peaks. Exact invertibility is only guaranteed with representations like XIRP.

Scalability and Efficiency:

Filtering schemes such as AR-PPF bound the output cardinality by the display resolution (Kim et al., 2024); the compact k³-tree exploits spatio-temporal locality for sublinear per-pixel storage (Cruces et al., 2019). However, in very low-temporal-resolution or high-velocity data, the clustering benefit diminishes.
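The locality that such structures rely on comes from the Morton (Z-order) linearization mentioned in Section 3. A minimal sketch of 3D Morton encoding by bit interleaving follows; the bit order (x lowest, then y, then t) and the 10-bit coordinate width are assumptions, and the function name is hypothetical.

```python
def morton3(x, y, t, bits=10):
    """Interleave the bits of (x, y, t) into a single Z-order code.

    Assumed bit order: x in the lowest position, then y, then t.
    """
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((t >> b) & 1) << (3 * b + 2)
    return code

# The eight cells of a 2 x 2 x 2 block receive the consecutive codes 0..7.
block = sorted(morton3(x, y, t) for x in (0, 1) for y in (0, 1) for t in (0, 1))
```

Cells that are close in space and time tend to receive nearby codes, so uniform regions of the cube end up in contiguous runs that a hierarchical structure can collapse.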

Interpretability and Extendability:

Deformable prototypes, dense pixel visualizations, and SOM-VAE all target the interpretability-accuracy frontier, but often depend on appropriate selection of invariance classes (shift, scale, warping) and meaningful visualization designs (Vincent et al., 2023, Schlegel et al., 2024, Fortuin et al., 2018). Dense pixel grids presently scale only to univariate series or moderate latent/activation dimensions.

Directions for Further Research:

Comparative analysis of alternate visual encodings (e.g., derivatives, recurrence variants, color-coded multivariate stacks) for downstream classification and forecasting (Rodrigues et al., 2021).
Unified generative–discriminative pipelines using invertible pixel encodings for synthetic data augmentation and transfer learning (Hellermann et al., 2021).
Application of pixel-centric clustering and filtering to online/real-time analytics and adaptive visualization (Kim et al., 2024).
Harmonization of foundation vision models with time-series feature spaces via token-fusion and intrinsic-dimension maximization (Roschmann et al., 2025).

7. Concluding Synthesis

Pixel time series representations provide a formalism that bridges temporal data analysis and computer vision by mapping temporal sequences to image or structured pixel domains. They enable the direct application of deep vision architectures, compact structures, dense visualizations, and generative models, often matching or surpassing specialized sequence models in diverse domains including sensor data, remote sensing, finance, and explainable AI. The design space is characterized by trade-offs between information preservation, model compatibility, storage or computational efficiency, and interpretability. The cited empirical results indicate that even naive pixelization, when paired with modern visual pipelines, can reach performance previously associated with domain-specific time-series methods, motivating continued exploration of hybrid and pixel-centric models for time series data (Rodrigues et al., 2021, Roschmann et al., 2025, Wenninger et al., 2019, Zeng et al., 2024, Vincent et al., 2023, Hellermann et al., 2021, Schlegel et al., 2024, Kim et al., 2024, Fortuin et al., 2018, Cruces et al., 2019).