MELTing point: Mobile Evaluation of Language Transformers (2403.12844v4)

Published 19 Mar 2024 in cs.LG

Abstract: Transformers have revolutionized the machine learning landscape, gradually making their way into everyday tasks and equipping our computers with "sparks of intelligence". However, their runtime requirements have prevented them from being broadly deployed on mobile. As personal devices become increasingly powerful and prompt privacy becomes an ever more pressing issue, we explore the current state of mobile execution of LLMs. To achieve this, we have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device, supporting different models, devices and frameworks, including Android, iOS and Nvidia Jetson devices. We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance, tracing their memory and energy requirements along the way. Our analysis is the first systematic study of on-device LLM execution, quantifying performance, energy efficiency and accuracy across various state-of-the-art models, and it showcases the state of on-device intelligence in the era of hyperscale models. Results highlight the performance heterogeneity across targets and corroborate that LLM inference is largely memory-bound. Quantization drastically reduces memory requirements and renders execution viable, but at a non-negligible accuracy cost. Drawing from its energy footprint and thermal behavior, we find that the continuous execution of LLMs remains elusive, as both factors negatively affect user experience. Last, our experience shows that the ecosystem is still in its infancy, and algorithmic as well as hardware breakthroughs can significantly shift the execution cost. We expect NPU acceleration and framework-hardware co-design to be the biggest bets towards efficient standalone execution, with the alternative of offloading tailored towards edge deployments.

Analyzing On-Device Execution of LLMs: A Technical Deep Dive

The research paper "MELTing Point: Mobile Evaluation of Language Transformers" presents a systematic investigation into the execution of LLMs on mobile devices. It examines the feasibility and performance of running LLMs at the consumer edge, with a primary focus on mobile systems. As edge devices gain computational capability, private, efficient, and localized execution of LLMs becomes increasingly plausible. This analysis outlines the paper's findings, which rest on a rigorous methodology and are substantiated by a purpose-built infrastructure termed MELT.

The authors begin by contextualizing the need for running LLMs on-device, emphasizing privacy, decentralization, and the democratization of machine intelligence. The paper is organized around core research questions: the feasibility of on-device deployment, inference performance across heterogeneous consumer devices, and the bottlenecks impeding such deployments. Additionally, it investigates the trade-offs incurred by quantization, a technique commonly employed to reduce model size and memory footprint, albeit at a potential accuracy cost.

Methodology and Infrastructure

MELT, the bespoke infrastructure introduced in the paper, serves as the cornerstone of this research. It provides an integrated pipeline for downloading, converting, deploying, and benchmarking LLMs across a range of targets, including Android, iOS, and Nvidia Jetson platforms, using various execution frameworks. The authors constructed a device farm encompassing high-end and mid-tier devices, accompanied by an elaborate energy-monitoring setup. This infrastructure enabled them to systematically trace performance, energy consumption, and thermal behavior across devices; a simplified sketch of such a headless benchmarking loop appears below.
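
To give a concrete flavor of this kind of automation (MELT's actual tooling is not reproduced here), the following Python sketch shows a minimal headless Android benchmarking loop. The on-device binary name `llm_bench`, its flags, the model file names, and the output format are hypothetical placeholders; only standard `adb` commands are assumed.

```python
# Minimal sketch of a headless Android benchmarking loop in the spirit of MELT.
# The binary "llm_bench", its flags, and the model names are hypothetical;
# MELT's actual tooling differs. Only standard adb commands are assumed.
import json
import subprocess

DEVICE_SERIAL = "emulator-5554"                  # hypothetical device id
MODELS = ["llama-2-7b-q4", "mistral-7b-q4"]      # hypothetical converted models

def adb(*args: str) -> str:
    """Run an adb command against the target device and return its stdout."""
    result = subprocess.run(["adb", "-s", DEVICE_SERIAL, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

def run_benchmark(model: str, prompt_file: str) -> dict:
    # Push the prompt set, then invoke the (hypothetical) on-device benchmark binary.
    adb("push", prompt_file, "/data/local/tmp/prompts.json")
    raw = adb("shell", "/data/local/tmp/llm_bench",
              "--model", f"/data/local/tmp/{model}.bin",
              "--prompts", "/data/local/tmp/prompts.json",
              "--json")
    return json.loads(raw)   # e.g. per-prompt prefill/decode timings

if __name__ == "__main__":
    for model in MODELS:
        metrics = run_benchmark(model, "prompts.json")
        print(model, metrics)
```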

The paper employs a wide array of LLMs, sourced and configured to support different quantization schemes and frameworks, namely MLC-LLM and llama.cpp. The evaluation primarily focuses on conversational agents, leveraging a dataset of multi-turn prompts. Through MELT, the paper automates interaction, monitors power consumption, and records inference performance metrics, offering a granular view of every aspect of on-device LLM execution; the per-prompt summaries such tracing yields resemble the sketch below.
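
As an illustration of how per-prompt metrics can be derived from such traces, the sketch below computes prefill throughput, decode throughput, and energy per token; the field names and the simple rectangle-rule energy integration are our own assumptions, not MELT's code.

```python
# Illustrative per-prompt metric summary; field names and the rectangle-rule
# energy integration are assumptions for the sketch, not MELT's implementation.
from dataclasses import dataclass

@dataclass
class PromptTrace:
    prompt_tokens: int            # tokens consumed during prefill
    generated_tokens: int         # tokens produced during decode
    prefill_s: float              # wall-clock prefill time (seconds)
    decode_s: float               # wall-clock decode time (seconds)
    power_samples_w: list[float]  # instantaneous power draw (watts)
    sample_period_s: float        # power sampling period (seconds)

def summarize(trace: PromptTrace) -> dict:
    # Rectangle-rule integration of the power trace gives total energy in joules.
    energy_j = sum(trace.power_samples_w) * trace.sample_period_s
    total_tokens = trace.prompt_tokens + trace.generated_tokens
    return {
        "prefill_tok_per_s": trace.prompt_tokens / trace.prefill_s,
        "decode_tok_per_s": trace.generated_tokens / trace.decode_s,
        "energy_per_token_j": energy_j / total_tokens,
    }

# Example: 64-token prompt, 128 generated tokens, ~5 W drawn over 12.8 s of sampling.
print(summarize(PromptTrace(64, 128, 0.8, 12.0, [5.0] * 1280, 0.01)))
```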

Key Findings

  1. Performance and Throughput: The paper highlights significant heterogeneity in LLM performance across devices, contingent primarily on model size, framework, and device tier. Notably, prefill throughput is consistently higher than generation throughput, attributable to the compute-bound nature of prefill versus the memory-bound nature of autoregressive decoding.
  2. Energy Efficiency: The research outlines the pronounced energy demands of LLM inference; although quantization reduces memory requirements (at some accuracy cost), the high power draw during inference still poses challenges for sustained on-device execution and significantly impacts user experience.
  3. Quantization Impacts: A notable insight is the precision-versus-accuracy trade-off inherent in quantization. While quantization renders LLMs deployable on resource-constrained devices, it often causes noticeable accuracy degradation, particularly in models quantized below 4-bit precision.
  4. Memory and Computational Bottlenecks: The paper confirms that LLM inference remains predominantly memory-bound; memory bandwidth becomes the critical bottleneck during the decode operation of the generation phase (a back-of-envelope sketch follows this list).
  5. Quality of Experience: The paper emphasizes that on-device deployment of LLMs can adversely affect user experience, noting device responsiveness issues during model loading and execution.
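
To make the memory-bound argument of findings 3 and 4 concrete, the following back-of-envelope sketch (our illustration, not code from the paper; the 7B model size, bit-widths, and 50 GB/s bandwidth are placeholder assumptions) estimates a quantized model's weight footprint and the resulting ceiling on decode throughput, given that each generated token must stream roughly the full weight set from memory.

```python
# Back-of-envelope estimate of quantized-model memory footprint and the
# bandwidth-limited ceiling on decode throughput. All figures are illustrative
# assumptions; real throughput is further reduced by KV-cache traffic,
# activation movement, and thermal throttling.
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, ignoring KV cache and activations."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def decode_ceiling_tok_per_s(footprint_gb: float, mem_bandwidth_gb_s: float) -> float:
    """Upper bound: each decoded token reads (roughly) every weight once."""
    return mem_bandwidth_gb_s / footprint_gb

fp16 = weight_footprint_gb(7, 16)   # ~14 GB: impractical on most phones
q4 = weight_footprint_gb(7, 4)      # ~3.5 GB: fits in high-end phone RAM
print(f"7B fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
print(f"decode ceiling at 50 GB/s: {decode_ceiling_tok_per_s(q4, 50):.0f} tok/s")
```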

Future Implications and Research Directions

In view of these findings, the paper speculates on potential shifts in AI and edge computing. It anticipates that future advances will come from algorithmic innovations and hardware evolution, such as more capable neural processing units (NPUs) and hardware-software co-design, aimed at optimizing these memory-intensive workloads. The sustainability concerns discussed also point to the need for cloud-edge hybrid deployments that balance resource efficiency between device and cloud.

Finally, the paper posits the broadened utility of extended on-device capabilities, opening avenues for personalized, multimodal, and context-aware intelligent assistants. This could change how users interact with digital systems, shifting from conventional multi-step processes to natural language-driven workflows backed by robust on-device AI.

The paper contributes valuable benchmarks and insights to the field of mobile AI, setting a foundational platform for the continued exploration and adaptation of LLMs at the edge. As hardware progresses and novel algorithmic techniques emerge, this research paves the path toward efficient, privacy-centric, on-device AI solutions.

Authors (4)
  1. Stefanos Laskaridis (20 papers)
  2. Lorenzo Minto (2 papers)
  3. Hamed Haddadi (131 papers)
  4. Kleomenis Katevas (20 papers)
Citations (10)