
Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study (2410.14107v3)

Published 18 Oct 2024 in cs.LG

Abstract: This study investigates the application of Transfer Learning (TL) on Transformer architectures to enhance building energy consumption forecasting. Transformers are a relatively new deep learning architecture, which has served as the foundation for groundbreaking technologies such as ChatGPT. While TL has been studied in the past, prior studies considered either one data-centric TL strategy or used older deep learning models such as Recurrent Neural Networks or Convolutional Neural Networks. Here, we carry out an extensive empirical study on six different data-centric TL strategies and analyse their performance under varying feature spaces. In addition to the vanilla Transformer architecture, we also experiment with Informer and PatchTST, specifically designed for time series forecasting. We use 16 datasets from the Building Data Genome Project 2 to create building energy consumption forecasting models. Experimental results reveal that while TL is generally beneficial, especially when the target domain has no data, careful selection of the exact TL strategy should be made to gain the maximum benefit. This decision largely depends on the feature space properties such as the recorded weather features. We also note that PatchTST outperforms the other two Transformer variants (vanilla Transformer and Informer). Our findings advance building energy consumption forecasting using advanced approaches like TL and Transformer architectures.

Summary

  • The paper shows that applying six data-centric transfer learning strategies to Transformer models can substantially improve forecasting accuracy, with MAE reductions of up to 15%.
  • The paper compares the vanilla Transformer with two variants designed for time series forecasting, Informer and PatchTST, and finds that PatchTST consistently performs best.
  • The paper highlights that selecting the appropriate transfer learning approach and feature space is critical for robust energy consumption predictions.

Transfer Learning on Transformers for Building Energy Consumption Forecasting: A Comparative Study

This paper investigates the integration of Transfer Learning (TL) with Transformer architectures to enhance building energy consumption forecasting. It advances the understanding of how TL strategies can be applied to Transformers, a deep learning architecture known for its strength in handling sequential data.

Key Findings and Methodology

The paper provides an extensive empirical evaluation of six TL strategies across three Transformer models: the vanilla Transformer, Informer, and PatchTST. The research uses 16 datasets from the Building Data Genome Project 2, covering diverse energy consumption profiles across multiple geographies and climatic conditions. The primary focus is how these TL strategies affect forecasting accuracy under different feature spaces and dataset characteristics.
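
To make the setup concrete, here is a minimal sketch of how hourly meter readings can be windowed into supervised forecasting samples. The function name and lookback length are illustrative assumptions, not the paper's exact preprocessing; only the 24-hour and 96-hour horizons come from the paper.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 168, horizon: int = 24):
    """Slice an hourly load series into (input, target) forecasting pairs."""
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        X.append(series[start : start + lookback])
        y.append(series[start + lookback : start + lookback + horizon])
    return np.stack(X), np.stack(y)

# Example: two weeks of hourly readings, forecasting 24 hours ahead.
hourly_load = np.random.rand(24 * 14)
X, y = make_windows(hourly_load, lookback=168, horizon=24)
print(X.shape, y.shape)  # (145, 168) (145, 24)
```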

Integration of Transformer Variants:

  • The experiments include advanced Transformer variants—Informer and PatchTST—specifically tailored for time series forecasting. The analysis demonstrates that while Transformers generally improve forecasting performance, the choice of TL strategy and Transformer variant profoundly influences the outcomes.
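
PatchTST's central idea is to segment each input series into overlapping patches and embed each patch as one token, shortening the attention sequence. Below is a minimal sketch of that patching step, assuming PyTorch; the class name, patch length, and stride are illustrative choices, not the authors' code.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """PatchTST-style patching: split a series into overlapping patches
    and project each patch to a d_model-dimensional token."""

    def __init__(self, patch_len: int = 16, stride: int = 8, d_model: int = 128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) univariate series
        patches = x.unfold(-1, self.patch_len, self.stride)
        # patches: (batch, num_patches, patch_len) -> token embeddings
        return self.proj(patches)

tokens = PatchEmbedding()(torch.randn(32, 168))
print(tokens.shape)  # torch.Size([32, 20, 128])
```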

Data-Centric TL Strategies:

  • The paper explores six out of eight potential TL strategies, evaluating their application to the task of building energy consumption prediction. Results indicate that TL is most beneficial when the target domain lacks sufficient data, although careful consideration of the feature space (e.g., weather features) is crucial for maximizing benefits.
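
The six strategies are not enumerated in this summary, but data-centric TL strategies generally differ only in which buildings' data feed pre-training and whether target data is used for fine-tuning. A hypothetical illustration of how such strategies can be specified:

```python
# Hypothetical strategy table; building IDs and strategy names are
# invented for illustration and do not reproduce the paper's six strategies.
source_buildings = ["bldg_A", "bldg_B", "bldg_C"]  # buildings with history
target_building = "bldg_T"                         # little or no data

strategies = {
    "target_only": {"pretrain_on": [], "fine_tune_on": [target_building]},
    "zero_shot":   {"pretrain_on": source_buildings, "fine_tune_on": []},
    "pretrain_ft": {"pretrain_on": source_buildings,
                    "fine_tune_on": [target_building]},
}
```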

Performance Metrics:

  • The authors use Mean Absolute Error (MAE) and Mean Squared Error (MSE) to evaluate model performance across different forecasting horizons (24-hour and 96-hour). Their findings reveal that PatchTST consistently outperforms other Transformer variants, demonstrating its effectiveness in capturing complex temporal dependencies inherent in energy consumption data.
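
For reference, both metrics are straightforward to compute per horizon. The sketch below uses made-up arrays; only the 24-hour and 96-hour horizons come from the paper.

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))

# Evaluate forecasts at both horizons used in the paper (random data here).
for horizon in (24, 96):
    y_true = np.random.rand(100, horizon)  # made-up ground truth
    y_pred = np.random.rand(100, horizon)  # made-up forecasts
    print(horizon, mae(y_true, y_pred), mse(y_true, y_pred))
```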

Numerical Results and Insights

The research demonstrates substantial improvements in forecasting accuracy over models trained without transfer. For example, Strategy 8, fine-tuning a model pre-trained on all datasets, generally yields the best performance, with MAE reductions of up to 15% in some cases. This reinforces the potential of TL strategies to improve precision when target-domain data is limited.
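
A minimal sketch of that pre-train-then-fine-tune recipe follows; the loop structure, learning rates, epoch counts, and loader names are assumptions for illustration, not the paper's training code. L1Loss is used so training minimises the same MAE criterion used for evaluation.

```python
import copy
import torch
import torch.nn as nn

def train(model: nn.Module, loader, lr: float, epochs: int) -> nn.Module:
    """Generic training loop minimising MAE (L1) over a data loader."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for X, y in loader:
            opt.zero_grad()
            loss_fn(model(X), y).backward()
            opt.step()
    return model

def pretrain_then_finetune(model, all_buildings_loader, target_loader):
    # Pre-train on pooled data from all buildings, then fine-tune a copy
    # on the target building at a lower learning rate (assumed values).
    pretrained = train(model, all_buildings_loader, lr=1e-4, epochs=10)
    return train(copy.deepcopy(pretrained), target_loader, lr=1e-5, epochs=3)
```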

Implications and Future Directions

The results suggest practical pathways for integrating AI-driven tools in building management systems, which could significantly enhance energy efficiency and sustainability. However, the paper also highlights challenges, such as the computational intensity associated with training large Transformer models, which may limit their real-time applicability. Addressing these computational demands should be a priority for future research, possibly through algorithmic optimizations or more efficient computing strategies.

Regional and Computational Limitations:

  • The paper is primarily confined to datasets from North American and European regions. Expanding this scope to include more diverse geographic datasets will be crucial for validating the generalizability of these findings. Additionally, leveraging foundation models like TimeGPT could open new avenues for TL experimentation, although such efforts will require significant computational resources.

Conclusion

This research contributes substantially to the field by showcasing the effectiveness of TL strategies on Transformer architectures in building energy forecasting. It sets the stage for future explorations into scalable and efficient AI solutions for energy management, aligning with global sustainability objectives. The careful examination of different TL strategies serves as a valuable guide for researchers aiming to apply Transformer models in similar domains. The work underscores the importance of selecting appropriate TL strategies based on the specific characteristics of available datasets, paving the way for more accurate and adaptable energy forecasting models.
