
Automated Website Fingerprinting through Deep Learning (1708.06376v2)

Published 21 Aug 2017 in cs.CR and cs.LG

Abstract: Several studies have shown that the network traffic that is generated by a visit to a website over Tor reveals information specific to the website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this meta-data to reveal which website Tor users are visiting. The success of such attacks heavily depends on the particular set of traffic features that are used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We collect a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever used for website fingerprinting, and find that the performance achieved by our deep learning approaches is comparable to known methods which include various research efforts spanning over multiple years. The obtained success rate exceeds 96% for a closed world of 100 websites and 94% for our biggest closed world of 900 classes. In our open world evaluation, the most performant deep learning model is 2% more accurate than the state-of-the-art attack. Furthermore, we show that the implicit features automatically learned by our approach are far more resilient to dynamic changes of web content over time. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting.

Authors (5)
  1. Vera Rimmer (5 papers)
  2. Davy Preuveneers (5 papers)
  3. Marc Juarez (12 papers)
  4. Tom Van Goethem (4 papers)
  5. Wouter Joosen (12 papers)
Citations (280)

Summary

  • The paper introduces a novel deep learning model that automates feature extraction for website fingerprinting on Tor networks.
  • The authors systematically evaluate SDAE, CNN, and LSTM architectures on a comprehensive dataset of over three million traces.
  • Results demonstrate that the deep learning models match or outperform traditional methods while remaining robust to temporal changes in website content.

An Expert Overview of "Automated Website Fingerprinting through Deep Learning"

The paper "Automated Website Fingerprinting through Deep Learning" by Rimmer et al. addresses the significant challenge of website fingerprinting (WF) attacks on The Onion Router (Tor) using deep learning methodologies. The research contributes to the privacy and security domain by demonstrating that deep neural networks can automate the deanonymization of Tor traffic, a longstanding concern because a local network observer can use such attacks to link users to the websites they visit.

Summary of Contributions

The authors propose an innovative WF attack model that leverages deep learning (DL) techniques to automate feature extraction from network traffic data, showcasing its adaptability and robustness compared to traditional machine learning-based WF attacks. The main contributions are as follows:

  1. Systematic Exploration of DL Algorithms:
    • The paper systematically evaluates feedforward, convolutional, and recurrent architectures.
    • They employ three DL models: Stacked Denoising Autoencoder (SDAE), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), demonstrating the efficacy of automated feature learning (a minimal illustrative sketch follows this list).
  2. Comprehensive Dataset Utilization:
    • A dataset comprising over three million traces—the largest of its kind used for WF—is employed.
    • This expansive data allows a more thorough evaluation of the DL-based approaches, showcasing their performance and resilience.
  3. Comparative Evaluation with State-of-the-art:
    • The DL approaches are methodically compared with existing WF techniques such as Wang’s k-NN, Panchenko’s CUMUL, and Hayes's k-Fingerprinting.
    • Findings reveal that DL models either match or surpass these attacks, with accuracy improvements of up to 2% in some scenarios.
  4. Robustness to Temporal Changes:
    • A key insight is that DL-based models display superior resilience to the dynamic content changes of websites over time, a common challenge in maintaining WF efficacy.
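
To make the first contribution concrete, below is a minimal sketch of what a DL-based fingerprinting classifier can look like. It is not the authors' exact architecture: the input representation (fixed-length sequences of packet directions, +1 for outgoing and -1 for incoming, a common encoding in the WF literature), the Keras API usage, and every hyperparameter are illustrative assumptions rather than values taken from the paper.

```python
# Sketch of a CNN-based website fingerprinting classifier.
# Assumptions (not from the paper): direction-sequence input encoding,
# trace length, layer sizes, and training configuration.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TRACE_LEN = 3000   # assumed fixed input length; shorter traces are padded
NUM_SITES = 100    # closed-world size, e.g. 100 monitored websites

def pad_trace(directions, length=TRACE_LEN):
    """Truncate or zero-pad a sequence of +1/-1 packet directions."""
    trace = np.zeros(length, dtype=np.float32)
    directions = np.asarray(directions[:length], dtype=np.float32)
    trace[:len(directions)] = directions
    return trace

def build_cnn(trace_len=TRACE_LEN, num_sites=NUM_SITES):
    """1D CNN that learns traffic features directly from raw directions."""
    model = keras.Sequential([
        layers.Input(shape=(trace_len, 1)),
        layers.Conv1D(32, kernel_size=8, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.Conv1D(64, kernel_size=8, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_sites, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would follow the usual Keras pattern, e.g. `model.fit(X, y, epochs=30, validation_split=0.1)` with `X` of shape `(n_traces, TRACE_LEN, 1)` and integer site labels `y`. The key point the sketch illustrates is that the convolutional layers learn discriminative traffic features directly from raw packet sequences, with no hand-engineered features.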

Implications and Future Directions

The paper's implications touch both practical and theoretical facets of cybersecurity. Practically, this work suggests that adversaries now possess automated tools capable of sophisticated WF attacks on anonymized traffic, necessitating more robust defense mechanisms. Theoretically, it opens up discussion of a potential arms race between WF attacks employing machine learning and countermeasures that must effectively obfuscate or disguise traffic characteristics.

Future advancements could pivot toward:

  • Developing stronger countermeasures against DL-based WF attacks that target the learned models themselves rather than fixed, hand-crafted traffic features.
  • Exploring adversarial approaches, in which deliberately perturbed traffic confuses deep learning models, to guide the design of defense strategies that preserve anonymity in communication over Tor (a toy sketch of this idea follows).
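
As one concrete instance of the adversarial direction above, the following toy sketch probes a trained fingerprinting classifier with an FGSM-style perturbation (the fast gradient sign method of Goodfellow et al., named here as a swapped-in illustration, not a technique from the paper). The `model` is assumed to be a differentiable classifier such as the CNN sketched earlier; a real defense would additionally have to ensure the perturbed trace corresponds to realizable network traffic (e.g., only dummy packets can be injected), a constraint this illustration ignores.

```python
# Toy FGSM-style probe of a trained fingerprinting model.
# Assumptions: `model` is a Keras classifier over direction sequences,
# as in the CNN sketch above; epsilon is an illustrative placeholder.
import numpy as np
import tensorflow as tf

def fgsm_perturb(model, trace, label, epsilon=0.1):
    """Return a gradient-sign perturbation of one input trace."""
    x = tf.convert_to_tensor(np.reshape(trace, (1, -1, 1)), dtype=tf.float32)
    y = tf.convert_to_tensor([label], dtype=tf.int64)
    with tf.GradientTape() as tape:
        tape.watch(x)
        pred = model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, pred)
    grad = tape.gradient(loss, x)
    # Step in the direction that increases the classifier's loss.
    return (x + epsilon * tf.sign(grad)).numpy().squeeze()
```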

Conclusion

Rimmer et al. significantly advance the understanding of how DL methodologies transform the landscape of WF attacks on Tor, offering a compelling narrative about the double-edged nature of AI technologies in cybersecurity. While they enhance attack capabilities by enabling adaptive and resilient WF methods, they also set the stage for innovative defensive developments in the ongoing pursuit of online privacy.