k-fingerprinting: a Robust Scalable Website Fingerprinting Technique (1509.00789v3)

Published 2 Sep 2015 in cs.CR

Abstract: Website fingerprinting enables an attacker to infer which web page a client is browsing through encrypted or anonymized network connections. We present a new website fingerprinting technique based on random decision forests and evaluate performance over standard web pages as well as Tor hidden services, on a larger scale than previous works. Our technique, k-fingerprinting, performs better than current state-of-the-art attacks even against website fingerprinting defenses, and we show that it is possible to launch a website fingerprinting attack in the face of a large amount of noisy data. We can correctly determine which of 30 monitored hidden services a client is visiting with 85% true positive rate (TPR), a false positive rate (FPR) as low as 0.02%, from a world size of 100,000 unmonitored web pages. We further show that error rates vary widely between web resources, and thus some patterns of use will be predictably more vulnerable to attack than others.

Overview of "k-fingerprinting: a Robust Scalable Website Fingerprinting Technique"

This paper by Jamie Hayes and George Danezis addresses website fingerprinting, an attack that lets an adversary identify which web page a client is visiting despite encryption or the use of an anonymity network such as Tor. The problem is framed as a classification task: the adversary trains a model to recognize the traffic patterns that correspond to specific web pages.
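
To make the classification framing concrete, the following is a minimal sketch of how one page load might be reduced to a feature vector before training. The trace representation (timestamp plus a signed direction) and the feature names are assumptions chosen for illustration; they are not the paper's actual feature set.

```python
from typing import Dict, List, Tuple

# A trace is a list of (timestamp_seconds, direction) pairs.
# direction is +1 for outgoing (client to server) and -1 for incoming.
# This representation is an assumption made for illustration.
Packet = Tuple[float, int]


def extract_features(trace: List[Packet]) -> Dict[str, float]:
    """Summarize one page load as a few coarse traffic features."""
    incoming = sum(1 for _, direction in trace if direction < 0)
    outgoing = sum(1 for _, direction in trace if direction > 0)
    total = incoming + outgoing
    duration = trace[-1][0] - trace[0][0] if trace else 0.0
    return {
        "n_incoming": float(incoming),
        "n_outgoing": float(outgoing),
        "n_total": float(total),
        "frac_incoming": incoming / total if total else 0.0,
        "duration": duration,
    }


# Hypothetical five-packet page load.
example_trace = [(0.00, 1), (0.05, -1), (0.06, -1), (0.20, 1), (0.25, -1)]
features = extract_features(example_trace)
print(features)
```

A classifier would then be trained on many such vectors, each labeled with the page that produced it.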

Methodology and Innovations

The authors introduce "k-fingerprinting," an attack built on random decision forests. The forest is used to derive a fingerprint for each page load, which then supports identification in both closed-world settings (the client visits only pages the attacker has trained on) and open-world settings (the client may also visit pages the attacker has never seen). Three features stand out:

Robustness and Speed: The attack is an ensemble method. It combines the output of many decision trees, which makes classification fast and, according to the paper, more accurate than earlier attacks.
Feature Importance Analysis: The paper evaluates features used in this and prior work and analyzes how informative each one is. It finds that simple features, such as the number of incoming packets, are more useful than more complex ones such as packet timing.
Adaptability to Modern Web Usage Patterns: The open-world evaluation uses a large dataset, with a world size of up to 100,000 unmonitored pages, which approximates real-world conditions more closely than smaller studies. This supports the claim that the method scales to varied browsing patterns and large world sizes.
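
The summary above does not spell out how a forest yields a fingerprint. As the technique is usually described, each trained tree maps a trace to one leaf, the vector of leaf indices is the fingerprint, and a new trace is compared with training fingerprints by Hamming distance over its k nearest neighbours. The sketch below illustrates only that comparison step. The leaf vectors, labels, and the rule of requiring all k neighbours to agree are stated here as assumptions for illustration, not as the authors' exact implementation.

```python
from typing import List, Optional, Sequence, Tuple

# One leaf index per tree in the forest (hypothetical values below).
Fingerprint = Sequence[int]


def hamming(a: Fingerprint, b: Fingerprint) -> int:
    """Count the trees in which two traces land in different leaves."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have one entry per tree")
    return sum(x != y for x, y in zip(a, b))


def classify(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, str]],
    k: int,
) -> Optional[str]:
    """Return a label only if all k nearest fingerprints share it.

    Returning None stands for "treat as unmonitored". Ties in distance
    are broken by training order, because sorted() is stable.
    """
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    labels = {label for _, label in nearest}
    return labels.pop() if len(labels) == 1 else None


# Hypothetical fingerprints from a four-tree forest.
training = [
    ((3, 7, 1, 4), "site_a"),
    ((3, 7, 2, 4), "site_a"),
    ((3, 6, 1, 4), "site_a"),
    ((9, 0, 5, 2), "site_b"),
    ((9, 0, 5, 3), "site_b"),
    ((8, 0, 5, 2), "site_b"),
]

confident = classify((3, 7, 1, 5), training, k=3)   # neighbours all agree
ambiguous = classify((3, 7, 5, 2), training, k=3)   # neighbours disagree
print(confident, ambiguous)
```

Requiring unanimous agreement trades some true positives for fewer false positives, and raising k makes the rule stricter.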

Results

In the paper's evaluation, k-fingerprinting achieves a true positive rate (TPR) of 85% with a false positive rate (FPR) as low as 0.02% when identifying which of 30 monitored hidden services a client is visiting, against a world size of 100,000 unmonitored pages. The paper reports this as an improvement over existing techniques.
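
As a worked illustration of what these rates mean, the sketch below computes TPR and FPR from confusion-matrix counts. The counts are hypothetical, chosen only so that the rates match the reported 85% and 0.02%; they are not the paper's raw data, and the precision figure depends entirely on the assumed mix of monitored and unmonitored visits.

```python
from typing import Tuple


def rates(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of monitored visits that are detected
    fpr = fp / (fp + tn)  # share of unmonitored visits wrongly flagged
    return tpr, fpr


# Hypothetical counts, chosen only to reproduce the reported rates.
tp, fn = 850, 150        # 1,000 visits to monitored pages (assumed)
fp, tn = 20, 99_980      # 100,000 visits to unmonitored pages (assumed)

tpr, fpr = rates(tp, fn, fp, tn)
precision = tp / (tp + fp)  # share of alarms that are correct

print(f"TPR={tpr:.2%} FPR={fpr:.2%} precision={precision:.1%}")
```

Under these assumptions, an FPR of 0.02% over 100,000 unmonitored visits corresponds to about 20 false alarms. If monitored visits were much rarer than assumed here, the same rates would give lower precision.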

The paper also reports that k-fingerprinting outperforms alternative attacks against established defenses, including BuFLO and Decoy Pages. The attack remains comparatively effective even when traffic is morphed or padded, which points to weaknesses in current defense mechanisms.
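
To give a sense of what a padding defense does, the sketch below models a simplified, single-direction constant-rate scheme in the spirit of BuFLO: fixed-size packets sent at a fixed interval for at least a minimum duration. The function names and all parameter values are hypothetical and are not the configurations evaluated in the paper.

```python
import math
from typing import List, Tuple


def constant_rate_schedule(
    payload_bytes: int,
    packet_size: int,
    interval_ms: int,
    min_duration_ms: int,
) -> List[Tuple[int, int]]:
    """Return (send_time_ms, size) pairs for a padded transfer."""
    needed = math.ceil(payload_bytes / packet_size)       # carries the data
    minimum = math.ceil(min_duration_ms / interval_ms)    # padding floor
    count = max(needed, minimum)
    return [(i * interval_ms, packet_size) for i in range(count)]


def overhead(payload_bytes: int, schedule: List[Tuple[int, int]]) -> float:
    """Extra bytes sent, as a multiple of the real payload."""
    sent = sum(size for _, size in schedule)
    return (sent - payload_bytes) / payload_bytes


# Two small pages of different size look identical on the wire.
small = constant_rate_schedule(40_000, 1500, 20, 10_000)
medium = constant_rate_schedule(90_000, 1500, 20, 10_000)

# A page that outlasts the minimum duration still leaks its length.
large = constant_rate_schedule(1_500_000, 1500, 20, 10_000)

print(len(small), len(medium), len(large), overhead(40_000, small))
```

The sketch shows the trade-off such defenses face: pages shorter than the minimum duration become indistinguishable at a high bandwidth cost, while longer pages still reveal coarse features such as total transfer length.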

The paper also shows that the attack keeps high accuracy when trained on a limited dataset. This lowers the setup cost for a potential adversary and indicates that the attack does not depend on large amounts of training data.

Implications and Future Work

The paper shows that encrypted and anonymized web communications remain vulnerable to traffic analysis and that stronger defenses are needed. Because existing defenses are shown to be inadequate under certain conditions, the research points to a need for new, adaptive defenses, particularly ones that rely less on traffic shaping and more on genuine obfuscation.

Looking ahead, the work motivates machine learning methods that hold up in adversarial settings. Scalable ways to protect metadata, which is vital to user privacy, also remain an open need. Future work may also examine the temporal and qualitative aspects of web traffic as dynamic attributes that are themselves subject to fingerprinting.

In conclusion, the authors present a robust and scalable website fingerprinting attack that achieves high accuracy at modest computational cost. The paper highlights open challenges in the field and lays groundwork for future work on both offensive and defensive web traffic analysis.

Authors (2)

Jamie Hayes (47 papers)
George Danezis (35 papers)

Citations (361)
