
SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores (2403.10408v1)

Published 15 Mar 2024 in cs.CR, cs.CY, cs.IR, cs.LG, and cs.SI

Abstract: We present SocialGenPod, a decentralised and privacy-friendly way of deploying generative AI Web applications. Unlike centralised Web and data architectures that keep user data tied to application and service providers, we show how one can use Solid -- a decentralised Web specification -- to decouple user data from generative AI applications. We demonstrate SocialGenPod using a prototype that allows users to converse with different LLMs, optionally leveraging Retrieval Augmented Generation to generate answers grounded in private documents stored in any Solid Pod that the user is allowed to access, directly or indirectly. SocialGenPod makes use of Solid access control mechanisms to give users full control of determining who has access to data stored in their Pods. SocialGenPod keeps all user data (chat history, app configuration, personal documents, etc) securely in the user's personal Pod; separate from specific model or application providers. Besides better privacy controls, this approach also enables portability across different services and applications. Finally, we discuss challenges, posed by the large compute requirements of state-of-the-art models, that future research in this area should address. Our prototype is open-source and available at: https://github.com/Vidminas/socialgenpod/.


Summary

  • The paper presents a prototype that uses decentralised personal data stores to give users privacy and control over their data while still supporting generative AI applications.
  • It employs a modular, service-oriented architecture integrating a web application, retrieval services, and AI models, and so avoids the privacy, control, and lock-in problems of centralised data storage.
  • The study highlights technical hurdles and outlines future research directions to balance computational efficiency, personalization, and privacy in AI services.

SocialGenPod: A Decentralised Approach for Privacy-Friendly Generative AI on the Social Web

Introduction to SocialGenPod

The paper presents SocialGenPod, a prototype that deploys generative AI web applications in a privacy-friendly, decentralised manner using Solid, a specification for the decentralised Web. In conventional AI-driven applications, user data is held centrally by the application or service provider, which raises concerns about privacy, user control, and vendor lock-in. SocialGenPod addresses these concerns by keeping user data (chat history, app configuration, personal documents) in Solid Pods, personal online data stores under the user's control, and by letting AI applications work with that data only through Solid's access control mechanisms. Besides introducing the concept and architecture, the paper discusses open technical challenges and directions for future research.

Decentralised Personal Data Stores

Central to SocialGenPod is the use of decentralised personal data stores, which separate user data from the AI models and applications that consume it. This separation addresses several issues inherent to centralised data management:

  • Privacy and User Control: Users decide who has access to the data stored in their Pods, using Solid access control mechanisms. This contrasts with centralised architectures, where user data stays tied to the application or service provider.
  • Data Portability: By decoupling data from applications, users can easily transfer their information between different services, mitigating vendor lock-in and fostering an ecosystem of interoperable services.
  • Challenges in Implementation: State-of-the-art generative models require substantial compute, which is difficult to reconcile with keeping data in user-controlled Pods. The prototype works around some of these difficulties with pragmatic choices, and the paper leaves the rest as open problems for future research.
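
The separation of data from applications can be illustrated with a minimal sketch. It is not the Solid API: real Pods are accessed over HTTP and governed by Solid's own access control mechanisms. The `ToyPod` class, its methods, and the agent names are hypothetical and exist only to show the idea that the owner grants and revokes read access per resource, and that any authorised application sees the same data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPod:
    """In-memory stand-in for a personal data store. Hypothetical; not the Solid API."""

    owner: str
    resources: dict = field(default_factory=dict)  # path -> content
    readers: dict = field(default_factory=dict)    # path -> set of agents allowed to read

    def write(self, agent: str, path: str, content: str) -> None:
        # Simplification for this sketch: only the Pod owner may write.
        if agent != self.owner:
            raise PermissionError(f"{agent} may not write {path}")
        self.resources[path] = content

    def grant_read(self, path: str, agent: str) -> None:
        # Record that `agent` may read `path`.
        self.readers.setdefault(path, set()).add(agent)

    def revoke_read(self, path: str, agent: str) -> None:
        # Remove a previously granted read permission, if any.
        self.readers.get(path, set()).discard(agent)

    def read(self, agent: str, path: str) -> str:
        allowed = agent == self.owner or agent in self.readers.get(path, set())
        if not allowed:
            raise PermissionError(f"{agent} may not read {path}")
        return self.resources[path]


def can_read(pod: ToyPod, agent: str, path: str) -> bool:
    """Return True if `agent` can currently read an existing `path` from `pod`."""
    try:
        pod.read(agent, path)
        return True
    except PermissionError:
        return False


if __name__ == "__main__":
    pod = ToyPod(owner="alice")
    pod.write("alice", "chats/history.json", '{"messages": []}')
    pod.grant_read("chats/history.json", "chat-app-a")
    print(can_read(pod, "chat-app-a", "chats/history.json"))
    print(can_read(pod, "chat-app-b", "chats/history.json"))
```

Because the data lives in the Pod rather than in either application, switching from one chat application to another in this sketch only requires a new grant, not a data export.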

Technical Implementation

SocialGenPod's implementation showcases a modular architecture comprising a web application, a retrieval service, and generative AI models, all interacting in a privacy-friendly ecosystem. Key features include:

  • Solid-Based User Authentication and Data Management: The user’s identity is authenticated via Solid-OIDC, allowing the web application to interact with the user’s Solid Pod to retrieve and store data securely.
  • Modular and Interchangeable Components: By adopting a service-oriented architecture, SocialGenPod enables users to select or substitute different retrieval and AI model services according to their preferences.
  • Challenges in Decentralised Retrieval and AI Services: The paper discusses the hurdles in integrating decentralised retrieval services and AI models, focusing on the need to balance privacy with computational efficiency. The proposed solutions are pragmatic, yet underscore the necessity for further innovation in this space.
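
The modular design described above can be sketched as a pair of interfaces with swappable implementations. This is an illustration under stated assumptions, not the prototype's code: `Retriever`, `ChatModel`, `KeywordRetriever`, `EchoModel`, `build_prompt`, and `answer` are hypothetical names, the retriever is a naive word-overlap ranker, and the "model" simply echoes the question instead of calling an LLM. The documents are passed in as a plain dictionary, standing in for files the user is allowed to read from a Pod.

```python
from typing import Protocol


class Retriever(Protocol):
    """Interface for any retrieval service (hypothetical)."""

    def retrieve(self, query: str, documents: dict, k: int) -> list: ...


class ChatModel(Protocol):
    """Interface for any generative model service (hypothetical)."""

    def generate(self, prompt: str) -> str: ...


class KeywordRetriever:
    """Ranks documents by the number of distinct words shared with the query."""

    def retrieve(self, query: str, documents: dict, k: int = 2) -> list:
        words = set(query.lower().split())
        scored = []
        for path, text in documents.items():
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, path))
        # Highest overlap first; ties are broken by path for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [documents[path] for _, path in scored[:k]]


class EchoModel:
    """Stand-in for an LLM service: echoes the last line of the prompt."""

    def generate(self, prompt: str) -> str:
        return f"[echo] {prompt.splitlines()[-1]}"


def build_prompt(question: str, passages: list) -> str:
    # Retrieval Augmented Generation: prepend retrieved passages as context.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\nQuestion: {question}"


def answer(question: str, documents: dict, retriever: Retriever, model: ChatModel) -> str:
    # The application depends only on the two interfaces, so either
    # service can be replaced without touching this function.
    passages = retriever.retrieve(question, documents, k=2)
    return model.generate(build_prompt(question, passages))


if __name__ == "__main__":
    docs = {
        "notes/solid.txt": "solid pods store personal data",
        "notes/fruit.txt": "bananas are yellow",
    }
    print(answer("where is personal data stored", docs, KeywordRetriever(), EchoModel()))
```

Swapping in a different retrieval or model service in this sketch means passing a different object to `answer`; the user's documents and the application code stay the same.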

Future Directions and Open Challenges

The exploration of SocialGenPod points to several promising avenues for research and development:

  • Enhanced Personalisation through Decentralised Models: Future developments could include more sophisticated personalisation features, leveraging private data for fine-tuning models in a decentralised manner.
  • Improvements in Privacy and Security for Document Retrieval: Addressing the privacy implications of document retrieval—especially in scenarios where data must be temporarily copied for processing—is paramount for advancing the privacy guarantees of decentralised applications.
  • Exploration of Private Inference Techniques: Enhancing privacy protections when interacting with external model providers through techniques such as private inference represents a crucial area of future research.

Conclusion

SocialGenPod is an early step towards a privacy-friendly, decentralised framework for generative AI applications on the social web. The prototype addresses privacy, user control, and data portability by keeping user data in Solid Pods, while the compute demands of state-of-the-art models, privacy-preserving document retrieval, and private inference remain open problems. Because the prototype is open-source, it gives future work on combining decentralised data management with generative AI a concrete starting point.