Long-term series forecasting with Query Selector -- efficient model of sparse attention (2107.08687v2)

Published 19 Jul 2021 in cs.LG

Abstract: Various modifications of the Transformer were recently used to solve the time-series forecasting problem. We propose Query Selector - an efficient, deterministic algorithm for building a sparse attention matrix. Experiments show it achieves state-of-the-art results on the ETT, Helpdesk and BPI'12 datasets.

Introduction

Time series forecasting (TSF) is a critical component of many sectors, including finance and healthcare. It involves predicting future values based on past observations. Traditional statistical models, and more recently ML and deep learning (DL) methods, have been applied to address this challenge. DL models, particularly Transformer models originating from NLP, have shown promise in TSF. However, they are computationally intensive due to the attention mechanism, raising concerns about their efficiency when handling long sequences.

Related Work

Many methods focus on reducing the complexity introduced by attention matrices in Transformers. Strategies include falling back on older neural network architectures such as RNNs, CNNs, and LSTMs, or improving the computational efficiency of the Transformer itself. Recent developments aim to mitigate the performance costs by introducing sparsity in the attention matrix. Notable sparsity-based approaches include the Longformer, Reformer, and Informer models, with the latter providing state-of-the-art results.

Background and Methodology

The paper presents a deterministic approach, called Query Selector, for constructing sparse attention matrices that enhance computational efficiency while maintaining accuracy in TSF models. The key principle behind Query Selector is the selective evaluation of queries that contribute most significantly to the attention matrix, informed by insights similar to the dropout technique in ML. A hyperparameter, designated as the factor f, dictates the extent of sparsity. The algorithm chooses a subset of queries, computes a modified key matrix with mean values, and calculates the sparse attention matrix through a softmax operation.
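As a concrete illustration, below is a minimal PyTorch sketch of a Query-Selector-style sparse attention step, reconstructed from the description above. The function name query_selector_attention, the scoring of queries against the mean key vector, and the mean-value fallback for non-selected positions are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def query_selector_attention(Q, K, V, factor=4):
    """Illustrative sketch of a Query-Selector-style sparse attention step.

    Q, K, V: tensors of shape (batch, length, d_model).
    factor:  sparsity factor f; roughly length / factor queries are kept.

    Note: this is a reconstruction from the summary above, not the
    authors' reference code.
    """
    B, L, d = Q.shape
    n_selected = max(1, L // factor)

    # Score each query by its interaction with the mean key vector.
    k_mean = K.mean(dim=1, keepdim=True)                 # (B, 1, d)
    query_scores = (Q * k_mean).sum(dim=-1)              # (B, L)

    # Keep only the most significant queries (top l = L / f).
    top_idx = query_scores.topk(n_selected, dim=-1).indices          # (B, n_selected)
    Q_sel = torch.gather(Q, 1, top_idx.unsqueeze(-1).expand(-1, -1, d))

    # Full softmax attention is computed only for the selected queries.
    attn = F.softmax(Q_sel @ K.transpose(-2, -1) / d ** 0.5, dim=-1)
    out_sel = attn @ V                                   # (B, n_selected, d)

    # Assumed fallback: non-selected positions receive the mean of the
    # values, so the output keeps the original sequence length.
    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), out_sel)
    return out
```

Under this reading, the dominant matrix product shrinks from L x L to (L/f) x L, which is where the computational saving controlled by the factor f comes from.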

Experiments and Results

The methodology was evaluated on the Electricity Transformer Temperature (ETT) dataset, which features multivariate TSF tasks. Query Selector's performance was assessed against the state-of-the-art Informer and a full-attention Transformer model. Results indicate that Query Selector can achieve lower prediction errors across different configurations and even surpass the reference models in some scenarios. The approach's effectiveness was further demonstrated in business process forecasting, suggesting promising avenues for deployment in various TSF applications.

In summary, the proposed Query Selector algorithm represents a significant contribution to the ongoing effort to streamline computations in time series forecasting. By reducing the computational load without compromising accuracy, this work facilitates the implementation of DL approaches to TSF in practical, real-world scenarios where long sequences are common.

Authors (4)
  1. Jacek Klimek (1 paper)
  2. Jakub Klimek (1 paper)
  3. Witold Kraskiewicz (2 papers)
  4. Mateusz Topolewski (4 papers)
Citations (6)