Developing Energy-Efficient Machine Learning Algorithms for Edge Devices
Keywords:
Edge computing, Energy-efficient machine learning, Federated learning, Hardware-aware optimization, Knowledge distillation, Low-power AI, Model compression, Neural architecture search, Quantization, Real-time inference
Abstract
The proliferation of edge devices such as IoT sensors, smartphones, and embedded systems necessitates energy-efficient machine learning (ML) algorithms. Traditional ML models are often computationally intensive and power-hungry, making them unsuitable for resource-constrained environments. This paper evaluates and integrates existing techniques for developing energy-efficient ML algorithms tailored to edge computing; the key methods examined are model compression, quantization, knowledge distillation, and hardware-aware optimization. We further analyze the trade-offs between accuracy and energy consumption, implementation challenges, and future research directions. Through experimental analysis and real-world case studies, we empirically validate the effectiveness of these strategies, demonstrating significant improvements in energy efficiency without compromising model performance.
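To make one of these techniques concrete, the sketch below applies symmetric 8-bit post-training quantization to a weight matrix in plain NumPy. It is a minimal illustration under simplifying assumptions, not the pipeline used in our experiments: the quantize_int8 and dequantize helpers are hypothetical names introduced here, and real deployments would typically use per-channel scales and calibrated activation ranges.

import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map the largest weight
    # magnitude to 127 and round everything else onto int8.
    # The small epsilon guards against an all-zero tensor.
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 tensor for accuracy checks.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; the reconstruction error
# below is the accuracy cost paid for that compression.
print("max abs quantization error:", float(np.abs(w - w_hat).max()))

On many edge platforms memory traffic dominates inference energy, so the 4x reduction in weight storage typically yields energy savings beyond the cheaper integer arithmetic itself, which is the accuracy-versus-energy trade-off this paper quantifies.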