Institutional Platform for Secure Self-Service Large Language Model Exploration (2402.00913v2)

Published 1 Feb 2024 in cs.CR, cs.AI, and cs.CL

Abstract: This paper introduces a user-friendly platform developed by the University of Kentucky Center for Applied AI, designed to make large, customized LLMs more accessible. By capitalizing on recent advancements in multi-LoRA inference, the system efficiently accommodates custom adapters for a diverse range of users and projects. The paper outlines the system's architecture and key features, encompassing dataset curation, model training, secure inference, and text-based feature extraction. We illustrate the establishment of a tenant-aware computational network using agent-based methods, securely utilizing islands of isolated resources as a unified system. The platform strives to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication. This contribution aligns with the overarching goal of enabling simplified access to cutting-edge AI models and technology in support of scientific discovery.

Summary

  • The paper introduces a secure platform that enables self-service exploration of large language models by integrating multi-LoRA inference with custom adapters.
  • It details robust dataset curation and efficient model training methodologies that tailor AI models to specific research and project needs.
  • It emphasizes secure inference through process and data isolation, end-to-end encryption, and tenant-aware computational networking to protect sensitive data.

Institutional Platform for Secure Self-Service LLM Exploration

Introduction to the Platform

The University of Kentucky's Center for Applied AI has developed a platform designed to make large, customized LLMs more accessible. The initiative leverages recent advances in multi-LoRA inference to serve a wide array of custom adapters on shared base models, tailored to the diverse requirements of users and projects. The platform's architecture rests on four main pillars: dataset curation, model training, secure inference, and text-based feature extraction. Together, these components provide a user-friendly environment for working with sophisticated AI models, emphasizing ease of access without compromising security or functionality.
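
To make the multi-LoRA serving model concrete, the sketch below shows how a client might request generation from a shared base model while selecting a project-specific adapter at request time. This is a minimal illustration, not the platform's published API: the endpoint URL, payload schema, response field, and adapter identifier are hypothetical placeholders.

```python
import requests

# Hypothetical endpoint for a multi-LoRA inference service; the platform's real
# URL, authentication scheme, and payload schema are not specified in the paper.
INFERENCE_URL = "https://llm.example.edu/generate"

def generate(prompt: str, adapter_id: str, token: str) -> str:
    """Send a prompt to the shared base model, applying the caller's
    project-specific LoRA adapter selected per request."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_id,   # which tenant/project adapter to apply
            "max_new_tokens": 256,
        },
    }
    headers = {"Authorization": f"Bearer {token}"}  # role-based access token
    resp = requests.post(INFERENCE_URL, json=payload, headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]  # assumed response field

if __name__ == "__main__":
    print(generate("Summarize the project charter.",
                   adapter_id="lab42-notes-v1", token="example-token"))
```

Because LoRA adapters are small relative to the base model, many of them can remain resident on the same servers, so each project gets customized behavior without dedicating GPUs to a full fine-tuned copy of the model.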

System Architecture and Key Features

Dataset Curation and Model Training

The platform provides mechanisms for dataset curation and model training that let users refine and tailor datasets to their specific project needs. This customization is critical for producing LLMs that are not only accurate but also relevant to the contexts in which they will be applied. Training is designed to be efficient and scalable, producing lightweight per-project adapters rather than full model copies and accommodating a variety of user-defined parameters.
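
The paper does not publish its training code; as an illustration of the kind of adapter training such a pipeline automates, the following is a minimal LoRA fine-tuning sketch using the Hugging Face transformers, peft, and datasets libraries. The base model name, dataset file, and hyperparameters are placeholders chosen for the example, not the platform's defaults.

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # some base models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the frozen base model with a small LoRA adapter; only adapter weights train.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_cfg)

# Placeholder for a user-curated, project-specific dataset.
dataset = load_dataset("json", data_files="curated_project_data.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapter-out",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapter-out")  # stores only the adapter weights, not the base model
```

Because only the adapter weights are saved, the resulting artifact is a small fraction of the base model's size, which is what makes hosting many project adapters on shared hardware practical.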

Secure Inference and Text-based Feature Extraction

Security is a paramount concern in the operation of the platform, especially during the inference phase. The system ensures process and data isolation, end-to-end encryption, and role-based resource authentication. These measures provide a secure environment for conducting LLM operations while safeguarding sensitive information. Furthermore, the platform incorporates advanced text-based feature extraction capabilities, enhancing the utility and applicability of the generated models across different domains.
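
The paper describes role-based resource authentication only at the architectural level; the sketch below is a schematic illustration, under assumed role names and capabilities, of how a gateway might check both tenant membership and role permissions before an inference or training request touches a tenant's resources.

```python
from dataclasses import dataclass

# Hypothetical role-to-capability mapping; the platform's actual roles,
# resources, and policy store are not described at this level of detail.
ROLE_PERMISSIONS = {
    "project-admin": {"train", "infer", "manage-datasets"},
    "project-member": {"infer"},
    "auditor": set(),
}

@dataclass
class Principal:
    user_id: str
    tenant_id: str
    role: str

def authorize(principal: Principal, action: str, resource_tenant: str) -> bool:
    """Allow an action only if the caller belongs to the resource's tenant
    and the caller's role grants that capability."""
    if principal.tenant_id != resource_tenant:   # tenant isolation check
        return False
    return action in ROLE_PERMISSIONS.get(principal.role, set())

# Example: a project member may run inference on their own tenant's adapter...
assert authorize(Principal("alice", "lab42", "project-member"), "infer", "lab42")
# ...but may not train, and may not touch another tenant's resources.
assert not authorize(Principal("alice", "lab42", "project-member"), "train", "lab42")
assert not authorize(Principal("alice", "lab42", "project-member"), "infer", "lab7")
```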

Secure Computational Network

A novel aspect of the platform is its tenant-aware computational network, which uses agent-based methods to securely integrate islands of isolated resources into a unified system. This approach optimizes resource utilization while reinforcing the security framework by maintaining strict isolation between the computational environments of different tenants. The configuration allows LLM services to be delivered securely and efficiently, meeting the growing demand for AI-driven solutions in research and industry.
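
The agent-based network is likewise described at a high level. As a rough illustration of the idea, the sketch below models agents that advertise isolated compute resources under a tenant label, so a scheduler can place work only on resources belonging to the requesting tenant; all names and capacities are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceAgent:
    """An agent advertising an isolated compute resource (e.g., a GPU island)."""
    agent_id: str
    tenant_id: str
    free_gpu_memory_gb: float

@dataclass
class TenantAwareScheduler:
    agents: list[ResourceAgent] = field(default_factory=list)

    def register(self, agent: ResourceAgent) -> None:
        self.agents.append(agent)

    def place(self, tenant_id: str, needed_gb: float) -> ResourceAgent | None:
        """Route work only to agents registered under the requesting tenant."""
        candidates = [a for a in self.agents
                      if a.tenant_id == tenant_id
                      and a.free_gpu_memory_gb >= needed_gb]
        return max(candidates, key=lambda a: a.free_gpu_memory_gb, default=None)

# Example: two tenants share the physical network but never see each other's agents.
sched = TenantAwareScheduler()
sched.register(ResourceAgent("node-a", "lab42", 40.0))
sched.register(ResourceAgent("node-b", "lab7", 80.0))
assert sched.place("lab42", 24.0).agent_id == "node-a"
assert sched.place("lab42", 64.0) is None   # lab7's larger node is not visible to lab42
```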

Implications and Future Directions

The development of a secure, self-service platform for exploring LLMs represents a significant contribution to the field of AI. By simplifying access to state-of-the-art AI technologies, the platform has the potential to accelerate scientific discovery and innovation. The emphasis on security and customization addresses some of the key challenges faced by users of LLMs, namely data privacy and model relevance.

Looking ahead, the platform opens up new avenues for research and development in AI. The scalable and secure architecture lays the foundation for future enhancements, such as the integration of more advanced machine learning algorithms and the expansion of support for different types of AI models. Additionally, the focus on user-friendly access points to advanced AI technologies could serve as a model for the development of similar platforms across various domains, further democratizing access to AI tools and resources.

In conclusion, the Institutional Platform for Secure Self-Service LLM Exploration by the University of Kentucky presents a comprehensive solution to the challenges of accessing, customizing, and securely utilizing LLMs. As the field of AI continues to evolve, such platforms will play a crucial role in bridging the gap between sophisticated AI technologies and the diverse needs of their users.
