A First Look at Related Website Sets (2408.07495v1)

Published 14 Aug 2024 in cs.NI

Abstract: We present the first measurement of the user-effect and privacy impact of "Related Website Sets," a recent proposal to reduce browser privacy protections between two sites if those sites are related to each other. An assumption (both explicitly and implicitly) underpinning the Related Website Sets proposal is that users can accurately determine if two sites are related via the same entity. In this work, we probe this assumption via measurements and a user study of 30 participants, to assess the ability of Web users to determine if two sites are (according to the Related Website Sets feature) related to each other. We find that this is largely not the case. Our findings indicate that 42 (36.8%) of the user determinations in our study are incorrect in privacy-harming ways, where users think that sites are not related, but would be treated as related (and so due less privacy protections) by the Related Website Sets feature. Additionally, 22 (73.3%) of participants made at least one incorrect evaluation during the study. We also characterise the Related Website Sets list, its composition over time, and its governance.

Summary

The paper's main contribution is a user study of whether people can tell when two sites are related: 36.8% of user determinations were incorrect in privacy-harming ways.
It analyzes the RWS list composition and governance, showing a 58.8% pull request rejection rate and a median processing time of 5 days.
An analysis of HTML and domain similarity finds a median joint HTML similarity score of just 0.04, casting doubt on the reliability of automated relatedness detection.

The paper "A First Look at Related Website Sets," authored by Stephen McQuistin, Peter Snyder, Hamed Haddadi, and Gareth Tyson, presents an in-depth evaluation of the Google-proposed "Related Website Sets" (RWS) feature. This feature aims to create exceptions to third-party storage partitioning in browsers, based on the relatedness of websites, to enhance web usability without unduly sacrificing privacy protections.
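To make the mechanism concrete, the sketch below models a set as a primary site plus associated, service, and ccTLD-variant subsets, and answers the question the browser has to answer: are these two sites in the same set? The class and function names are hypothetical, and treating any two members of a set as "related" is a deliberate simplification; the actual feature applies different rules to different subsets rather than a single switch.

```python
from dataclasses import dataclass, field


@dataclass
class RelatedWebsiteSet:
    """Hypothetical, simplified model of one entry in the RWS list."""
    primary: str
    associated_sites: list[str] = field(default_factory=list)
    service_sites: list[str] = field(default_factory=list)
    # Maps a set member to its country-code TLD variants.
    cctlds: dict[str, list[str]] = field(default_factory=dict)

    def members(self) -> set[str]:
        """All sites belonging to this set, including ccTLD variants."""
        sites = {self.primary, *self.associated_sites, *self.service_sites}
        for variants in self.cctlds.values():
            sites.update(variants)
        return sites


def build_index(sets: list[RelatedWebsiteSet]) -> dict[str, str]:
    """Map every member site to the primary of the set it belongs to."""
    index: dict[str, str] = {}
    for rws in sets:
        for site in rws.members():
            index[site] = rws.primary
    return index


def are_related(index: dict[str, str], site_a: str, site_b: str) -> bool:
    """Simplification: two sites are 'related' if they share a set primary."""
    primary_a = index.get(site_a)
    return primary_a is not None and primary_a == index.get(site_b)


# Usage with hypothetical sites: "newsdaily.example" shares no name with its primary.
index = build_index([
    RelatedWebsiteSet(
        primary="https://shop.example",
        associated_sites=["https://newsdaily.example"],
        service_sites=["https://static-assets.example"],
        cctlds={"https://shop.example": ["https://shop.example.de"]},
    )
])
print(are_related(index, "https://shop.example", "https://newsdaily.example"))  # True
```

The example pair is exactly the case the paper probes: the browser would treat the two sites as related, but nothing in the names tells a user so.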


The paper rigorously assesses the assumptions underlying RWS through empirical measurements and a user study. Fundamental to the RWS proposal is the presumption that users can accurately determine whether websites are affiliated, and hence can anticipate when cross-site data sharing between them will occur.

Key contributions of the paper include:

User Study on Website Relatedness: A study involving 30 participants tests whether users can accurately determine if two websites are related. In 42 (36.8%) of the user determinations, participants judged sites to be unrelated that RWS would treat as related, suggesting significant privacy risks if RWS is deployed without further adjustments.
Evaluation of RWS List Composition and Governance: The authors analyze how the current RWS list has developed and how it is administered, examining how pull requests to the list are reviewed and accepted, and how sets are divided into subsets (associated sites, service sites, and ccTLD variants).
HTML and Domain Similarity Analysis: The paper assesses the structural and content similarity between primary sites and their associated or service sites, showing that common branding elements and domain name similarities are often pivotal in user evaluations of relatedness, yet are insufficiently reliable metrics for determining relatedness programmatically.
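Why domain-name similarity is a weak signal can be seen with a naive score. The sketch below compares the label to the left of the top-level domain using Python's `difflib`; the function names are hypothetical, `difflib` is a stand-in rather than the metric the paper uses, and the single-label-TLD assumption is wrong for suffixes such as `co.uk` (a real tool would consult the Public Suffix List).

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def naive_registrable_label(url: str) -> str:
    """Return the hostname label just left of the last dot.

    Naive assumption: the TLD is a single label. Multi-label public
    suffixes such as co.uk need the Public Suffix List instead.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def domain_name_similarity(url_a: str, url_b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means the labels are identical."""
    a = naive_registrable_label(url_a)
    b = naive_registrable_label(url_b)
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sites: a set member whose name shares almost nothing with the primary.
print(domain_name_similarity("https://shop.example", "https://newsdaily.example"))
```

Any threshold on such a score would miss related sites with unrelated names, which mirrors the paper's conclusion that name similarity cannot stand in for set membership.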

Findings and Implications

User Study on Relatedness

The user study reveals a significant gap between the expectations of the Related Website Sets maintainers and user perceptions. In 36.8% of user determinations, participants judged sites to be unrelated that RWS would treat as related, and 22 of the 30 participants (73.3%) made at least one incorrect evaluation. Furthermore, participants took longer to decide when judging sites to be unrelated, indicating uncertainty and a lack of clear indicators of affiliation.
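The two headline figures come from classifying each judgement against the RWS list. A minimal sketch, assuming a hypothetical record per judgement (the field names and toy data are illustrative, not the study's dataset):

```python
from dataclasses import dataclass


@dataclass
class Determination:
    """One participant judgement about a pair of sites (hypothetical schema)."""
    participant: str
    user_says_related: bool
    rws_related: bool  # ground truth: does the RWS list put both sites in one set?


def summarise(determinations: list[Determination]) -> dict[str, float]:
    if not determinations:
        raise ValueError("no determinations to summarise")
    # Privacy-harming error: the user thinks the sites are unrelated, but RWS
    # would treat them as related and relax protections between them.
    harmful = sum(1 for d in determinations if d.rws_related and not d.user_says_related)
    participants = {d.participant for d in determinations}
    # Any mismatch with the list counts as an incorrect evaluation.
    with_error = {d.participant for d in determinations if d.user_says_related != d.rws_related}
    return {
        "harmful_rate": harmful / len(determinations),
        "participants_with_error_rate": len(with_error) / len(participants),
    }


# Toy data only: two participants, four judgements.
toy = [
    Determination("p1", user_says_related=False, rws_related=True),   # harmful
    Determination("p1", user_says_related=True, rws_related=True),
    Determination("p2", user_says_related=True, rws_related=True),
    Determination("p2", user_says_related=False, rws_related=False),
]
print(summarise(toy))
```

Applied to the paper's participant count, 22 of 30 participants with at least one error gives 22 / 30 ≈ 0.733, the 73.3% reported above.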

The implications are significant: if users cannot reliably identify related sites, RWS could inadvertently reduce privacy protections in situations where users do not expect it. This mismatch could lead to increased tracking and privacy risks.

Composition and Governance of RWS List

Turning to the governance of the RWS list, the paper finds that a substantial share of pull requests do not meet the submission criteria: 58.8% are rejected. Although submissions pass through automated validation, acceptance still relies heavily on manual checks, and successful pull requests take a median of 5 days to process. This points to potential bottlenecks and a need for better documentation and tooling to streamline the submission process.
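Both governance figures can be derived from pull-request records. The sketch below assumes a hypothetical, already-fetched record format (retrieving the records, for example from a code-hosting API, is out of scope here) and uses toy dates rather than the paper's data; open pull requests are excluded because their outcome is unknown.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical record of a submission to the RWS list repository."""
    number: int
    opened: datetime
    closed: Optional[datetime]  # None while the pull request is still open
    merged: bool


def governance_stats(prs: list[PullRequest]) -> tuple[float, float]:
    """Return (rejection rate among closed PRs, median days to merge)."""
    closed = [pr for pr in prs if pr.closed is not None]
    rejected = [pr for pr in closed if not pr.merged]
    merge_days = [
        (pr.closed - pr.opened).total_seconds() / 86400
        for pr in closed
        if pr.merged
    ]
    return len(rejected) / len(closed), median(merge_days)


# Toy data: three merged, two rejected, one still open.
toy_prs = [
    PullRequest(1, datetime(2024, 1, 1), datetime(2024, 1, 3), merged=True),
    PullRequest(2, datetime(2024, 1, 1), datetime(2024, 1, 6), merged=True),
    PullRequest(3, datetime(2024, 1, 1), datetime(2024, 1, 9), merged=True),
    PullRequest(4, datetime(2024, 1, 2), datetime(2024, 1, 4), merged=False),
    PullRequest(5, datetime(2024, 1, 2), datetime(2024, 1, 2), merged=False),
    PullRequest(6, datetime(2024, 1, 5), None, merged=False),
]
print(governance_stats(toy_prs))
```

Measuring processing time only over merged pull requests matches the framing above, where the 5-day median refers to successful submissions.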

HTML and Domain Similarity

The structural analysis using HTML similarity metrics indicates that many associated and service sites are not closely related to their set primaries in terms of content structure and style. The median joint HTML similarity score of 0.04 suggests that automated approaches to determining relatedness based on page content or domain structure are not robust.
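A joint score of this kind typically blends a structural comparison with a style comparison. The sketch below is a hypothetical stand-in, not the paper's exact metric: structure is the `difflib` ratio of the two pages' start-tag sequences, style is the Jaccard similarity of their CSS class names, and the equal weighting `k = 0.5` is an assumption.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _FeatureExtractor(HTMLParser):
    """Collects the sequence of start tags and the set of CSS class names."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def _features(html: str) -> tuple[list[str], set[str]]:
    parser = _FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.classes


def structural_similarity(a: str, b: str) -> float:
    """difflib ratio of the two pages' start-tag sequences."""
    return SequenceMatcher(None, _features(a)[0], _features(b)[0]).ratio()


def style_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the CSS class names used on each page."""
    ca, cb = _features(a)[1], _features(b)[1]
    union = ca | cb
    return len(ca & cb) / len(union) if union else 1.0


def joint_similarity(a: str, b: str, k: float = 0.5) -> float:
    """Weighted mix: k * structural + (1 - k) * style (k is an assumption)."""
    return k * structural_similarity(a, b) + (1 - k) * style_similarity(a, b)


# Hypothetical pages for a primary and a set member with unrelated templates.
page_primary = '<html><body><div class="hero"><p class="lead">Hi</p></div></body></html>'
page_member = '<html><body><table><tr><td class="cell">x</td></tr></table></body></html>'
print(joint_similarity(page_primary, page_member))
```

On any scale like this one, a median around 0.04 means a typical set member shares almost no template structure or styling with its primary.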

Broader Implications

These findings suggest that the RWS proposal, while aiming to balance usability and privacy, may need significant adjustments to mitigate privacy risks.

Enhancing User Awareness: The paper suggests exploring methods to explicitly indicate relatedness to users, perhaps through browser UI elements. This could help align user expectations with the operational policies dictated by RWS.
Refining Criteria for Relatedness: Further refinement in defining and validating the criteria for relatedness could help mitigate some of the privacy risks. This may include more stringent checks beyond domain name similarity and HTML structure.
Scalability and Manual Governance: As the adoption of RWS grows, the scalability of manual validation processes will become increasingly important. Automating additional aspects of the validation and introducing more rigorous pre-submission checks could be beneficial.
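One way to read the call for more rigorous pre-submission checks is a local validator that submitters run before opening a pull request. The sketch below is hypothetical: the field names (`primary`, `associatedSites`, `serviceSites`, `rationaleBySite`) are assumed to mirror the list's JSON entries, and the three checks are illustrative rather than the repository's actual validation rules.

```python
from collections import Counter
from urllib.parse import urlsplit


def presubmission_errors(sets: list[dict]) -> list[str]:
    """Return human-readable problems found in a candidate list of sets."""
    errors: list[str] = []
    seen: Counter = Counter()
    for entry in sets:
        primary = entry.get("primary", "")
        members = [primary, *entry.get("associatedSites", []), *entry.get("serviceSites", [])]
        rationale = entry.get("rationaleBySite", {})
        for site in members:
            # Illustrative check 1: every member must be an https origin.
            if urlsplit(site).scheme != "https":
                errors.append(f"{site!r}: not an https origin")
            seen[site] += 1
        # Illustrative check 2: every non-primary member needs a stated rationale.
        for site in members[1:]:
            if site not in rationale:
                errors.append(f"{site!r}: missing rationale")
    # Illustrative check 3: a site may belong to at most one set.
    for site, count in seen.items():
        if count > 1:
            errors.append(f"{site!r}: appears {count} times across the list")
    return errors


# Hypothetical candidate list with three problems.
candidate = [
    {"primary": "https://shop.example", "associatedSites": ["http://blog.example"]},
    {"primary": "https://shop.example"},
]
for problem in presubmission_errors(candidate):
    print(problem)
```

Checks like these catch mechanical mistakes cheaply, leaving manual review to focus on whether the claimed affiliation is genuine.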

Future Work in AI and Web Privacy

The intersection of AI and web privacy continues to provide fertile ground for research and development. Future work could explore advanced machine learning models to evaluate the likelihood of domains being related based on user interaction patterns and other contextual clues. Another promising avenue is measuring how effectively UI-based indicators convey relatedness to users.

In conclusion, "A First Look at Related Website Sets" provides a comprehensive first examination of the implications of Google's RWS proposal, highlighting critical areas where the current assumptions may not hold. The paper underscores the importance of aligning privacy measures with actual user perceptions and expectations to strengthen web privacy protections.