First & Second Derivative Heatmaps (FSDH)

Updated 18 January 2026
  • FSDH is a technique that computes first and second derivative heatmaps from 1D time series, capturing sharp transitions and turning points.
  • It normalizes and stacks discrete derivatives into a 2D tensor for convolutional networks, enabling precise extraction of local edge-aware features.
  • Integrating FSDH within Times2D allows hybrid modeling that combines local transient patterns with periodic components to boost forecasting accuracy.

First and Second Derivative Heatmaps (FSDH) constitute a critical module in the Times2D architecture for time series forecasting, offering a principled approach to extracting local difference statistics by mapping 1D time series into structured 2D representations. These heatmaps provide fine-grained insight into non-stationarity by highlighting both sharp changes and turning points, fundamentally enabling 2D convolutional architectures to capture local, transient, or regime-shifting phenomena that purely frequency-based or spectral approaches might miss (Nematirad et al., 31 Mar 2025).

1. Mathematical Foundations

The FSDH module operates on a batch of multivariate time series $X_{1D} \in \mathbb{R}^{B \times S \times N}$, where $B$ denotes the batch size, $S$ the temporal length, and $N$ the number of variable channels. The process involves two discrete derivative computations:

  • First derivative $D_1(t)$ (first difference, with a zero prepended at $t = 0$):

$$D_1(t) = \begin{cases} \mathbf{0}, & t = 0 \\ X_{1D}(t) - X_{1D}(t-1), & t = 1, \dots, S-1 \end{cases}$$

producing $D_1 \in \mathbb{R}^{B \times S \times N}$.

  • Second derivative $D_2(t)$ (the same difference applied to $D_1$):

$$D_2(t) = \begin{cases} \mathbf{0}, & t = 0 \\ D_1(t) - D_1(t-1), & t = 1, \dots, S-1 \end{cases}$$

with $D_2 \in \mathbb{R}^{B \times S \times N}$.
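
The two padded differences can be sketched in NumPy as follows; the helper name `first_second_derivatives` is ours, and the zero row is prepended at $t = 0$ to keep the output length equal to $S$, matching the padding used in the pseudocode later in this section:

```python
import numpy as np

def first_second_derivatives(x: np.ndarray):
    """Zero-padded first and second discrete differences.

    x has shape [B, S, N]; both outputs keep the same shape, with a
    zero row prepended at t = 0.
    """
    B, _, N = x.shape
    zeros = np.zeros((B, 1, N))
    d1 = np.concatenate([zeros, np.diff(x, axis=1)], axis=1)
    d2 = np.concatenate([zeros, np.diff(d1, axis=1)], axis=1)
    return d1, d2

x = np.cumsum(np.random.randn(2, 8, 3), axis=1)  # toy random-walk series
d1, d2 = first_second_derivatives(x)
assert d1.shape == d2.shape == x.shape
```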

Because abrupt transitions produce values on very different scales, the derivatives are typically normalized, either by the scale-invariant form

$$\widetilde D_i(t) = \frac{D_i(t)}{\max_{t'} |D_i(t')| + \varepsilon}$$

or by zero-mean/unit-variance normalization

$$\widetilde D_i(t) = \frac{D_i(t) - \mu_i}{\sigma_i}, \quad \mu_i = \frac{1}{S} \sum_t D_i(t), \quad \sigma_i^2 = \frac{1}{S}\sum_t (D_i(t) - \mu_i)^2,$$

allowing consistent convolutional feature extraction across series of diverse scales.
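
Both normalizations reduce to a per-series reduction along the time axis. A minimal NumPy sketch (the function names are ours; the $\varepsilon$ in the z-score denominator is our addition for numerical safety, not part of the formula above):

```python
import numpy as np

def scale_invariant(d: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Divide by the per-series maximum absolute value along time (axis 1).
    return d / (np.abs(d).max(axis=1, keepdims=True) + eps)

def zscore(d: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Zero-mean / unit-variance along the time axis.
    mu = d.mean(axis=1, keepdims=True)
    sigma = d.std(axis=1, keepdims=True)
    return (d - mu) / (sigma + eps)

d = np.random.randn(2, 16, 3)
assert np.abs(scale_invariant(d)).max() <= 1.0
```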

2. Heatmap Construction and 2D Representation

The normalized first and second derivatives $\widetilde D_1$ and $\widetilde D_2$ are stacked along a new axis to create a tensor $H_{2D} \in \mathbb{R}^{B \times 2 \times S \times N}$:

$$H_{2D}(t, d) = \begin{cases} \widetilde D_1(t), & d = 1 \\ \widetilde D_2(t), & d = 2 \end{cases}$$

Within CNN conventions, $H_{2D}$ adopts the layout $[B, C_{\mathrm{in}} = 2, H = S, W = N]$, treating the two derivative orders as distinct channels. This 2D encoding enables subsequent convolutional processing to leverage local patterns jointly across the time ($S$) and feature ($N$) dimensions.
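
The stacking step itself is a single tensor operation; here is a short NumPy illustration of the channel-first layout (the stand-in arrays are random placeholders for the normalized derivatives):

```python
import numpy as np

B, S, N = 2, 16, 3
d1_norm = np.random.randn(B, S, N)   # stand-in for normalized first derivative
d2_norm = np.random.randn(B, S, N)   # stand-in for normalized second derivative

# Stack along a new channel axis: [B, 2, S, N], i.e. the [B, C_in, H, W] CNN layout.
H2d = np.stack([d1_norm, d2_norm], axis=1)
assert H2d.shape == (B, 2, S, N)
```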

The typical FSDH transformation and convolutional encoding process is as follows:

| Stage | Input Shape | Output Shape |
|---|---|---|
| Compute $D_1$, $D_2$ | $[B, S, N]$ | $[B, S, N]$ each |
| Normalize/stack | $[B, S, N]$ | $[B, 2, S, N]$ |
| Conv2D layers ($\sim$2-3) | $[B, 2, S, N]$ | $[B, C', S, N]$ |
| Weighted sum/linear projection | $[B, C', S, N]$ | $[B, S, N]$ |
| Slicing/final head | $[B, S, N]$ | $[B, P, N]$ |

For convolution, standard hyperparameters are: kernel size $3 \times 3$, stride 1, padding 1, typically 2-3 layers, with ReLU activation and BatchNorm2d. The final head consists of a $1 \times 1$ convolution or a linear projection, optionally selecting the last $P$ time steps for step-wise forecasting.
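
As an illustration of these shapes, a minimal PyTorch sketch with the hyperparameters listed above ($3 \times 3$ kernels, stride 1, padding 1, ReLU + BatchNorm2d); the channel width of 32, the toy sizes, and the squeeze-based head are our choices, not prescribed values:

```python
import torch
import torch.nn as nn

B, S, N, P = 2, 32, 3, 8   # toy sizes; P = forecast horizon

# Illustrative 2-layer convolutional stack over the two derivative channels.
conv = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
)
head = nn.Conv2d(32, 1, kernel_size=1)   # 1x1 projection back to one channel

H = torch.randn(B, 2, S, N)              # stacked derivative heatmaps
feat = conv(H)                           # [B, 32, S, N]
out = head(feat).squeeze(1)              # [B, S, N]
forecast = out[:, -P:, :]                # keep last P steps -> [B, P, N]
assert forecast.shape == (B, P, N)
```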

3. Interpreting Derivative Heatmaps

The first and second derivative heatmaps are explicitly interpretable:

  • First derivative heatmap ($D_1$): encodes sharp rises and falls; regions where $|D_1(t)|$ is large denote abrupt local increases or decreases, characterizing edges and rapid regime changes.
  • Second derivative heatmap ($D_2$): encodes turning points; sign changes in $D_2(t)$ localize inflection points, while large-magnitude values of $D_2(t)$ mark sharp corners such as abrupt local maxima or minima in the temporal trajectory.
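
On a smooth toy signal, the two interpretations can be checked directly: sign changes in $D_1$ fall at local extrema, and sign changes in $D_2$ fall at inflection points. A NumPy sketch (single-channel, batch-free for brevity):

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 400)
x = np.sin(t)                                   # smooth series with known extrema

d1 = np.concatenate([[0.0], np.diff(x)])        # zero-padded first difference
d2 = np.concatenate([[0.0], np.diff(d1)])       # zero-padded second difference

# Sign changes in D1 mark local maxima/minima; sign changes in D2 mark
# inflection points, where the curvature of the series flips.
extrema = np.nonzero(np.sign(d1[2:]) * np.sign(d1[1:-1]) < 0)[0] + 1
inflections = np.nonzero(np.sign(d2[3:]) * np.sign(d2[2:-1]) < 0)[0] + 2

# sin on [0, 4*pi] has 4 interior extrema and 3 interior inflection points.
assert len(extrema) == 4 and len(inflections) == 3
```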

This local structure captures features—such as spikes, drops, and onset of transitions—often indistinguishable in frequency-decomposed or smoothed representations. FSDH thereby yields a two-channel 2D image from which convolutional operations can extract composite phenomena such as "edges" and "corners" in the time-value axis—a property known to be useful in standard computer vision convolutional processing (Nematirad et al., 31 Mar 2025).

4. Integration within Times2D Forecasting Pipeline

In the Times2D pipeline, FSDH and the Periodic Decomposition Block (PDB) operate in parallel, each generating a 1D summary tensor for the forecasting horizon:

  • FSDH produces $\widehat X^{\mathrm{FSDH}}_{1D} \in \mathbb{R}^{B \times P \times N}$ by projecting the feature maps via a weighted sum or linear operation.
  • PDB yields an analogous tensor $\widehat X^{\mathrm{PDB}}_{1D}$ containing frequency-domain multi-period decomposed features.

These are combined in the Aggregation Forecasting Block (AFB) either by direct element-wise summation,

$$\widehat X_{1D} = \widehat X^{\mathrm{PDB}}_{1D} + \widehat X^{\mathrm{FSDH}}_{1D},$$

or, optionally, via gated/attention-based fusion with a learned scalar $\alpha \in (0,1)$:

$$\widehat X_{1D} = \alpha\,\widehat X_{1D}^{\mathrm{PDB}} + (1-\alpha)\,\widehat X_{1D}^{\mathrm{FSDH}}.$$

This mechanism can be viewed as a specialized residual fusion, maintaining the complementary statistical properties of both decomposition branches.
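
Both fusion variants are element-wise operations on the two branch forecasts. A minimal NumPy sketch (the branch outputs are random stand-ins; parameterising $\alpha$ through a sigmoid is our choice for keeping it in $(0,1)$ during training):

```python
import numpy as np

B, P, N = 2, 8, 3
x_pdb = np.random.randn(B, P, N)    # stand-in PDB branch forecast
x_fsdh = np.random.randn(B, P, N)   # stand-in FSDH branch forecast

# Plain residual fusion: element-wise sum of the two branch forecasts.
x_sum = x_pdb + x_fsdh

# Gated fusion: a sigmoid of a raw learnable scalar keeps alpha in (0, 1).
raw_alpha = 0.3
alpha = 1.0 / (1.0 + np.exp(-raw_alpha))
x_gated = alpha * x_pdb + (1 - alpha) * x_fsdh
assert x_gated.shape == (B, P, N)
```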

5. Implementation Specifics and Hyperparameter Regimes

Key hyperparameters and architectural conventions for FSDH in Times2D include:

  • Input shapes: $2 \times S \times N$ for the stacked derivatives; $S$ is typically in $[96, 1440]$.
  • Convolutional stack: 2 or 3 layers; kernel size $3 \times 3$; stride 1; padding to preserve dimensionality.
  • Channels: input 2; output per layer e.g. $32 \rightarrow 64 \rightarrow 32$.
  • Activation: ReLU, with BatchNorm2d.
  • Head: $1 \times 1$ convolution or linear projection $C' \times S \times N \to S \times N$; final slicing or linear layer to produce the last $P$ forecasted steps.
  • Optimization: all FSDH weights are trained end-to-end alongside the full Times2D model using either MSE or MAE loss on the final fused output.

The process is summarized in the following schematic pseudocode (PyTorch-style):

# X: [B, S, N]; conv_layers, weights (a learned [C'] vector), and
# linear_head are modules/parameters defined elsewhere in the model.
D1 = X[:, 1:, :] - X[:, :-1, :]                           # [B, S-1, N]
D1 = torch.cat([torch.zeros_like(X[:, :1, :]), D1], dim=1)  # prepend zeros -> [B, S, N]
D2 = D1[:, 1:, :] - D1[:, :-1, :]
D2 = torch.cat([torch.zeros_like(X[:, :1, :]), D2], dim=1)
D1 = D1 / (D1.abs().amax(dim=1, keepdim=True) + 1e-6)     # scale-invariant norm
D2 = D2 / (D2.abs().amax(dim=1, keepdim=True) + 1e-6)
H = torch.stack([D1, D2], dim=1)                          # [B, 2, S, N]
H_feat = conv_layers(H)                                   # [B, C', S, N]
X_fsdh = (H_feat * weights.view(1, -1, 1, 1)).sum(dim=1)  # weighted channel sum -> [B, S, N]
X_out = linear_head(X_fsdh[:, -P:, :])                    # [B, P, N]

6. Empirical Role and Significance

The FSDH module, by explicitly representing both sharp local changes and turning points, enhances the Times2D framework’s capacity to model highly non-stationary, irregular, or rapidly fluctuating real-world time series. When combined in a residual fusion with the globally-informative, periodic components extracted by the PDB, FSDH yields a hybrid representation that empirically demonstrates state-of-the-art forecasting accuracy for both short-term transitions and long-horizon prediction tasks. This suggests that FSDH provides an effective means to supplement classical spectral or frequency-based features with local, edge-aware statistics directly extracted from the raw temporal data (Nematirad et al., 31 Mar 2025).

References

  • Nematirad et al. (31 Mar 2025).
