Path Signatures Logistic Regression

Updated 12 July 2025

Path Signatures Logistic Regression (PSLR) is a semi-parametric framework that uses truncated path signatures to convert complex functional data into robust, finite-dimensional features.
It transforms time-augmented functional trajectories into nonlinear representations, bypassing the limitations of traditional basis expansions in handling irregular sampling.
Empirical studies, including Parkinson’s gait analysis and human activity recognition, report higher accuracy for PSLR than for basis-expansion baselines, and the method is supported by non-asymptotic theoretical guarantees.

Path Signatures Logistic Regression (PSLR) is a semi-parametric classification framework designed for vector-valued functional data with accompanying scalar covariates. It departs from traditional basis expansion techniques in functional logistic regression by utilizing truncated path signatures, a concept from rough path theory, to extract geometry-aware, finite-dimensional feature representations. This approach enables the effective modeling of complex, nonlinear dependencies and offers robustness to irregular sampling, thereby addressing substantive limitations of prior methods in modern functional data analysis contexts.

1. Motivation and Conceptual Overview

Classical functional logistic regression models rely on linear relationships and predetermined basis expansions (e.g., Fourier, B-splines) to represent functional predictors. These expansions often require regular sampling and are sensitive to the choice of basis, which makes them rigid and can degrade performance when data are incomplete, irregularly sampled, or exhibit substantial cross-channel dependencies. PSLR was developed to overcome these limitations by replacing the linear term on the functional predictor with a nonlinear transformation built from path signatures. By embedding each functional predictor as a “time-augmented” path, PSLR constructs a robust, geometry-informed set of features, removing the need for basis selection and naturally accommodating irregular sampling.

2. Statistical Framework and Model Specification

PSLR models the conditional probability of a binary outcome $y \in \{0,1\}$ given a vector-valued functional predictor $\bm{X}$ and a scalar covariate vector $\bm{z}$:

$\mathrm{Logit}\left(\mathbb{P}(y=1\,|\ \bm{X},\bm{z})\right) = F(\bm{X}) + \bm{z}^{\top} \bm{\gamma}$

Here, $F(\bm{X})$ is not approximated using a fixed basis expansion. Instead, the functional trajectory is augmented with the time variable $t$ to form $\widetilde{\bm{X}} = (\bm{X}, t)$, and its truncated signature $S_p(\widetilde{\bm{X}})$, comprising all iterated integrals up to order $p$, is used as the feature set:

$F(\bm{X}) \approx S_p(\widetilde{\bm{X}})^{\top} \bm{\beta}_p$

This yields a generalized linear model:

$\mathrm{Logit}\left(\mathbb{P}(y=1\,|\ \bm{X},\bm{z})\right) = \widetilde{\bm{S}}_p^{\top} \bm{\theta}_p$

where $\widetilde{\bm{S}}_p = (S_p(\widetilde{\bm{X}})^\top, \bm{z}^\top)^\top$ and $\bm{\theta}_p = (\bm{\beta}_p^\top, \bm{\gamma}^\top)^\top$. The functional term thus leverages a nonlinear, basis-free transformation, while the scalar covariates are incorporated linearly, rendering the model semi-parametric.
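Once the signature features are available, the model above is an ordinary logistic regression on the stacked vector $\widetilde{\bm{S}}_p$. The following is a minimal sketch of that step, not the authors' estimator: the helper names (`stack_features`, `fit_logistic`, `predict_proba`), the plain gradient-descent fit, and the toy numbers are all assumptions made for illustration, and no norm constraint on the parameters is imposed here.

```python
import math


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def stack_features(sig_features, z):
    """Form S~_p = (S_p(X~), z) by concatenation (hypothetical helper)."""
    return list(sig_features) + list(z)


def fit_logistic(rows, labels, lr=0.1, steps=2000):
    """Gradient descent on the average logistic loss; returns theta_p.

    No intercept is added, mirroring the displayed model equation.
    """
    n, m = len(rows), len(rows[0])
    theta = [0.0] * m
    for _ in range(steps):
        grad = [0.0] * m
        for x, y in zip(rows, labels):
            err = sigmoid(sum(t * v for t, v in zip(theta, x))) - y
            for k in range(m):
                grad[k] += err * x[k] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def predict_proba(theta, x):
    """Estimated P(y = 1 | X, z) for one stacked feature vector."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


# Toy inputs (made up): two signature features and one scalar covariate each.
sig_rows = [[1.0, 0.2], [0.8, -0.1], [-1.1, 0.3],
            [-0.7, -0.2], [0.5, 0.1], [-0.4, 0.0]]
z_rows = [[0.5], [-0.3], [0.4], [-0.6], [0.1], [0.2]]
labels = [1, 1, 0, 0, 1, 0]

rows = [stack_features(s, z) for s, z in zip(sig_rows, z_rows)]
theta = fit_logistic(rows, labels)
beta_p, gamma = theta[:2], theta[2:]  # signature part and covariate part
```

Splitting `theta` back into `beta_p` and `gamma` mirrors the partition $\bm{\theta}_p = (\bm{\beta}_p^\top, \bm{\gamma}^\top)^\top$ in the text.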

3. Feature Construction Using Path Signatures

The signature transform constitutes the cornerstone of PSLR's feature extraction toolkit. By time-augmenting each functional input trajectory, the method computes the set of iterated integrals:

$S^{(i_1, \dots, i_k)}(\widetilde{\bm{X}}) = \int_{0 < t_1 < \dots < t_k < T} d \widetilde{X}_{t_1}^{i_1} \cdots d \widetilde{X}_{t_k}^{i_k}$

These coefficients encapsulate the cumulative geometric and temporal structure of the path, including nonlinear and cross-channel interactions. Critically, path signatures are defined on the entire continuous path rather than on a particular observation grid, which gives them inherent robustness to missing data and non-uniform time grids. Functional data sampled irregularly, without a common time grid, can therefore be processed directly, while subject-specific timing patterns are preserved.
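To make the iterated integrals concrete, the sketch below computes the signature up to order 2 for a time-augmented path, assuming the observations are joined by straight line segments (a common choice, but an assumption here). The function names are hypothetical. Each segment is appended using Chen's identity, so no common time grid is needed.

```python
def time_augment(times, values):
    """Append the observation time as an extra channel: X~ = (X, t)."""
    return [list(v) + [t] for t, v in zip(times, values)]


def signature_depth2(path):
    """Order-1 and order-2 signature terms of the piecewise-linear path.

    s1[i]    = total increment of channel i
    s2[i][j] = iterated integral over channel i, then channel j
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # new s2 = old s2 + (old s1) x delta + (delta x delta) / 2
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            s1[i] += delta[i]
    return s1, s2


def flatten(s1, s2):
    """Feature vector: order-1 terms followed by order-2 terms, row by row."""
    return s1 + [x for row in s2 for x in row]


# The same one-channel path x(t) = |t - 1| on [0, 2], observed on two grids.
coarse_t, coarse_x = [0.0, 1.0, 2.0], [[1.0], [0.0], [1.0]]
fine_t = [0.0, 0.3, 1.0, 1.25, 1.9, 2.0]  # irregular spacing
fine_x = [[abs(t - 1.0)] for t in fine_t]

sig_coarse = flatten(*signature_depth2(time_augment(coarse_t, coarse_x)))
sig_fine = flatten(*signature_depth2(time_augment(fine_t, fine_x)))
```

In this toy case the two feature vectors agree up to floating-point rounding because both grids contain the kink at $t = 1$, so they describe the same piecewise-linear path. For a general curve, different grids give different piecewise-linear approximations and the features agree only approximately.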

4. Theoretical Guarantees and Model Selection

The development of PSLR is accompanied by rigorous theoretical analysis. Under mild boundedness conditions on $F$, the functional data $\bm{X}$, and the scalar covariates, there exist an optimal truncation order $p^*$ and a parameter vector $\bm{\theta}_{p^*}^*$ that achieve minimal risk within the model family (Theorem 1). Within an $\ell_1$-bounded parameter space, there is a minimal truncation order $p^*$ that suffices to attain this minimal risk (Theorem 2). A data-driven procedure for selecting $p^*$ is proposed, based on penalized empirical risk (see Eq. (3.8)–(3.9) of the original paper), with non-asymptotic excess risk bounds that guarantee finite-sample performance. The risk in question is the expected logistic loss at truncation order $p$:

$\mathcal{R}_{p}(\bm{\theta}_p) = \mathbb{E}\left[-y\, \widetilde{\bm{S}}_p^{\top} \bm{\theta}_p + \log\left(1+e^{\widetilde{\bm{S}}_p^{\top} \bm{\theta}_p}\right)\right]$

Under regularity conditions, the excess risk converges at rate $\mathcal{O}(n^{-1/2})$, with an exponential upper bound on the probability of selecting a suboptimal truncation order (Theorems 3 and 4). This principled model selection process ensures a favorable bias-variance trade-off, undergirded by strong non-asymptotic guarantees.
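The selection step can be sketched as choosing the truncation order that minimizes empirical risk plus a penalty that grows with the number of signature features. The exact penalty from the paper is not reproduced in this article, so the form used below (`scale * sqrt(dim) / n**rho`) is a placeholder assumption, as are the function names and the example risk values.

```python
import math


def empirical_risk(rows, labels, theta):
    """Sample average of the logistic loss -y*u + log(1 + e^u), u = S~_p' theta."""
    total = 0.0
    for x, y in zip(rows, labels):
        u = sum(t * v for t, v in zip(theta, x))
        # log(1 + e^u), computed without overflow for large |u|
        softplus = max(u, 0.0) + math.log1p(math.exp(-abs(u)))
        total += -y * u + softplus
    return total / len(rows)


def signature_dim(channels, p):
    """Number of iterated integrals of orders 1..p for a path with `channels` channels."""
    return sum(channels ** k for k in range(1, p + 1))


def select_order(risks, n, channels, scale=1.0, rho=0.4):
    """Return the order p minimizing empirical risk + penalty.

    ASSUMPTION: the penalty scale * sqrt(dim) / n**rho is illustrative only;
    it is not the penalty defined in the paper.
    `risks` maps each candidate p to its minimized empirical risk.
    """
    best_p, best_val = None, float("inf")
    for p in sorted(risks):  # ascending, so ties go to the smaller order
        penalty = scale * math.sqrt(signature_dim(channels, p)) / n ** rho
        if risks[p] + penalty < best_val:
            best_p, best_val = p, risks[p] + penalty
    return best_p


# Made-up minimized empirical risks for p = 1..4 (two channels plus time).
risks = {1: 0.62, 2: 0.41, 3: 0.405, 4: 0.40}
chosen = select_order(risks, n=500, channels=3, scale=0.05)
```

With these made-up numbers the raw risk keeps falling as $p$ grows, but the penalty outweighs the small gains beyond $p = 2$, which is the bias-variance trade-off the procedure is meant to balance.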

5. Empirical Performance and Comparative Evaluation

PSLR is empirically evaluated on both synthetic and real-world datasets. Simulation studies systematically vary the number of functional components ($d$), the number of scalar covariates ($q$), and the structure of temporal sampling (introducing both missing data and non-uniformity). The PSLR methodology, integrating signature-derived features and scalar covariates, consistently outperforms classical methods based on B-spline, Fourier, and FPCA expansions, as well as ablated models using either only signature features or only scalar effects.

PSLR is also applied to two real-world datasets. In Parkinson’s Disease gait analysis, vertical ground reaction force data from multiple foot sensors, combined with clinical and demographic variables, are modeled using PSLR. The method achieves higher accuracy and F1 scores than existing functional logistic regression methods, and the estimated feature coefficients can be interpreted in light of clinical knowledge (e.g., reduced variability in certain sensors corresponds to established Parkinsonian gait characteristics). In the MotionSense human activity recognition dataset, PSLR maintains robust and accurate classification under irregular sampling, capturing dynamic features from smartphone sensor data more effectively than traditional methods.

6. Practical Implementation and Interpretability

Several practical advantages arise from PSLR’s use of the signature transform:

Basis-free Representation: The method eliminates the need for ad hoc basis selection or knot placement, thus removing a major parameterization hurdle for practitioners.
Efficiency: The finite-dimensional signature provides an efficient and informative embedding of the function, even as the underlying functional data remain infinite-dimensional.
Sampling Robustness: PSLR naturally accommodates irregular, sparse, or missing data, as the computation of path signatures does not rely on a common or uniformly spaced time grid.
Model Selection: The optimal truncation order for the signature, $p$, is chosen using a fully data-driven procedure with theoretical guarantees for finite-sample performance.
Interpretability: Signature features summarize path characteristics such as displacement, variation, and inter-channel coupling, which can be related to known scientific or clinical phenomena in applied domains, including biomechanics and wearable sensor analysis.
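
As a rough guide to the size of the finite-dimensional embedding, the number of signature features can be counted directly. Assuming the constant zeroth-order term is excluded (the article does not say whether it is), a path with $d$ functional components plus time has $d+1$ channels, and there are $(d+1)^k$ iterated integrals of order $k$:

$\dim S_p(\widetilde{\bm{X}}) = \sum_{k=1}^{p} (d+1)^k = \frac{(d+1)^{p+1} - (d+1)}{d}$

For example, with $d = 2$ the counts for $p = 1, 2, 3, 4$ are $3$, $12$, $39$, and $120$. The feature vector is finite for any fixed $p$ but grows geometrically with the truncation order, which is why $p$ is selected rather than simply taken large.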

7. Outlook, Limitations, and Future Directions

PSLR represents an integration of rough path theory with functional data analysis, expanding the methodological toolset for practitioners working with complex, high-dimensional, and irregularly sampled datasets. The findings suggest several potential directions for further research: developing more computationally efficient algorithms for higher-order signatures, enhancing the interpretability of higher-order interaction terms, and extending the approach to accommodate alternative outcome types (e.g., multiclass or survival data) and non-Euclidean function spaces. The demonstrated theoretical and practical properties position PSLR as a foundational advancement in geometry-informed, robust functional classification techniques.
