Explain the effectiveness of the SimExp encoding on UCR datasets

Investigate and explain why the Similarity-preserving Expanded encoding (SimExp) achieves strong classification performance on UCR time-series datasets even without a cellular automaton reservoir, and characterize the mechanisms and dataset conditions under which SimExp is effective. Derive theoretical or empirical justifications for its superiority relative to using a linear SVM directly on the original floating-point series and clarify its relationship to methods capturing global structure such as dynamic time warping.

Background

The paper introduces SimExp, an expanded binary encoding that preserves the similarity of time-series values. An ablation shows that SimExp alone, with no cellular automaton (CA) reservoir, improves average accuracy over a linear SVM baseline across the UCR datasets and marginally outperforms a dynamic time warping baseline, whereas adding a CA reservoir does not help. The authors note that the reason for SimExp’s effectiveness remains unresolved and call for further research.
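
To make "expanded binary encoding that preserves similarity" concrete, here is a minimal sketch of one such encoding, a thermometer code. This is an assumption chosen for illustration: the summary above does not give SimExp's actual construction, and the names `thermometer_encode`, `hamming`, `n_levels`, `lo`, and `hi` are hypothetical. The sketch only shows the property in question, that nearby values map to bit vectors with a small Hamming distance.

```python
import numpy as np


def thermometer_encode(series, n_levels=8, lo=None, hi=None):
    """Hypothetical similarity-preserving binary expansion (thermometer code).

    This is NOT taken from the paper; it is a stand-in with the same stated
    property. Each value is quantized to one of n_levels levels and written
    as n_levels - 1 bits, with the first k bits set for level k.
    """
    x = np.asarray(series, dtype=float)
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on flat input
    # Map each value to an integer level in 0 .. n_levels - 1.
    levels = np.floor((x - lo) / span * n_levels)
    levels = np.clip(levels, 0, n_levels - 1).astype(int)
    # Bit j of a value is set when its level exceeds j.
    thresholds = np.arange(n_levels - 1)
    bits = (levels[:, None] > thresholds[None, :]).astype(np.uint8)
    # Concatenate the per-time-step codes into one long binary vector.
    return bits.ravel()


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return int(np.count_nonzero(a != b))


if __name__ == "__main__":
    a = thermometer_encode([0.0, 0.5], n_levels=8, lo=0.0, hi=1.0)
    b = thermometer_encode([1.0, 0.5], n_levels=8, lo=0.0, hi=1.0)
    print(a.shape)        # (14,): 2 time steps x 7 bits
    print(hamming(a, b))  # 7: levels differ by 7 at step 0 and by 0 at step 1
```

Under this particular code, the Hamming distance between two encoded series equals the L1 distance between their quantized levels, and a linear model over the bits can assign a separate weight to each quantization threshold at each time step. Whether either property accounts for the reported gains is exactly the open question and would need to be tested against the paper's actual encoding.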

References

Why it is so effective is a question that remains open for further research.

On when is Reservoir Computing with Cellular Automata Beneficial? (2407.09501 - Glover et al., 13 Jun 2024) in Section 5.3 Deception of good encoding