EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model (2506.09061v2)

Published 6 Jun 2025 in cs.DC, cs.AI, and cs.PF

Abstract: This paper introduces EdgeProfiler, a fast profiling framework designed for evaluating lightweight LLMs on edge systems. While LLMs offer remarkable capabilities in natural language understanding and generation, their high computational, memory, and power requirements often confine them to cloud environments. EdgeProfiler addresses these challenges by providing a systematic methodology for assessing LLM performance in resource-constrained edge settings. The framework profiles compact LLMs, including TinyLLaMA, Gemma3-1B, Llama3.2-1B, and DeepSeek-r1-1.5B, using aggressive quantization techniques and strict memory constraints. Analytical modeling is used to estimate latency, FLOPs, and energy consumption. The profiling reveals that 4-bit quantization reduces model memory usage by approximately 60-70%, while maintaining accuracy within 2-5% of full-precision baselines. Inference speeds are observed to improve by 2-3x compared to FP16 baselines across various edge devices. Power modeling estimates a 35-50% reduction in energy consumption for INT4 configurations, enabling practical deployment on hardware such as Raspberry Pi 4/5 and Jetson Orin Nano Super. Our findings emphasize the importance of efficient profiling tailored to lightweight LLMs in edge environments, balancing accuracy, energy efficiency, and computational feasibility.
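
The abstract states that latency, FLOPs, and energy are estimated analytically but does not give the formulas. Purely as an illustration of what such an analytical estimate can look like, the sketch below uses a common roofline-style approximation for token-by-token decoding: each generated token costs roughly 2 FLOPs per parameter and requires streaming every weight from memory once, and energy is taken as power times time. The `Device` class, all function names, the 2-FLOPs-per-parameter rule, and every hardware number are assumptions for illustration, not EdgeProfiler's actual model or measured device specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical edge device; every number used with it is a placeholder."""
    name: str
    peak_flops: float     # sustained compute throughput, FLOP/s
    mem_bandwidth: float  # sustained DRAM bandwidth, bytes/s
    power_watts: float    # assumed constant board power while decoding


def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Raw weight storage only; ignores quantization scales, KV cache, activations."""
    return n_params * bits_per_weight / 8


def decode_flops_per_token(n_params: float) -> float:
    """Common dense-transformer approximation: ~2 FLOPs per parameter per token."""
    return 2 * n_params


def decode_latency_s(n_params: float, bits_per_weight: int, dev: Device) -> float:
    """Roofline-style lower bound for one decode step.

    Generating a token needs the matmul FLOPs and a full pass over the weights
    in memory; whichever takes longer bounds the step time.
    """
    compute_s = decode_flops_per_token(n_params) / dev.peak_flops
    memory_s = weight_bytes(n_params, bits_per_weight) / dev.mem_bandwidth
    return max(compute_s, memory_s)


def energy_per_token_j(n_params: float, bits_per_weight: int, dev: Device) -> float:
    """Energy = power x time, assuming power does not depend on precision."""
    return dev.power_watts * decode_latency_s(n_params, bits_per_weight, dev)


if __name__ == "__main__":
    # Illustrative inputs: ~1.1B parameters (TinyLlama-sized) on a made-up board.
    board = Device("hypothetical-edge-board", peak_flops=50e9,
                   mem_bandwidth=10e9, power_watts=6.0)
    n = 1.1e9
    fp16, int4 = weight_bytes(n, 16), weight_bytes(n, 4)
    print(f"weight memory: {fp16 / 1e9:.2f} GB -> {int4 / 1e9:.2f} GB "
          f"({1 - int4 / fp16:.0%} smaller)")
    for bits in (16, 4):
        lat = decode_latency_s(n, bits, board)
        print(f"{bits:>2}-bit: {lat * 1e3:.0f} ms/token, {1 / lat:.1f} tokens/s, "
              f"{energy_per_token_j(n, bits, board):.2f} J/token")
```

With these placeholder numbers both precisions are memory-bound, so going from 16 to 4 bits per weight shrinks the raw weight footprint by 75% and the latency bound by 4x. Because the sketch ignores dequantization cost, KV-cache traffic, quantization metadata, and precision-dependent power draw, it should not be expected to reproduce the 60-70% memory, 2-3x speed, or 35-50% energy figures reported above.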

Authors (4)
  1. Alyssa Pinnock
  2. Shakya Jayakody
  3. Kawsher A Roxy
  4. Md Rubel Ahmed