Edge AI and TinyML: Powering the Next Generation of IoT
Keywords:
Edge AI, Embedded intelligence, Federated learning, Hardware acceleration, Internet of Things, Model compression, TinyML

Abstract
The rapid expansion of the Internet of Things (IoT) has led to an unprecedented increase in data generation at the network edge, exposing critical limitations of traditional cloud-centric computing models. High latency, excessive bandwidth usage, energy inefficiency, and growing privacy concerns make centralized processing unsuitable for many real-time and safety-critical IoT applications. To address these challenges, intelligence is increasingly being shifted closer to data sources through edge artificial intelligence (Edge AI) and tiny machine learning (TinyML). This study presents a comprehensive review of these emerging paradigms and their role in enabling scalable, efficient, and privacy-preserving intelligent IoT systems. Edge AI enables real-time inference directly on edge devices, reducing dependence on remote servers and allowing faster decision-making. TinyML extends this concept further by enabling machine learning models to run on highly resource-constrained hardware such as microcontrollers and sensors, often operating with kilobytes of memory and milliwatt-level power budgets. The study discusses key model optimization techniques, including quantization, pruning, and knowledge distillation, that make it feasible to deploy deep learning models on such constrained platforms with minimal accuracy degradation. In addition, the study examines the importance of hardware-software co-design in Edge AI systems. Specialized hardware accelerators, neural processing units, and optimized system-on-chip architectures are reviewed alongside lightweight software frameworks such as TensorFlow Lite for Microcontrollers, STM32Cube.AI, and Edge Impulse. Benchmark analyses are used to highlight trade-offs between inference latency, memory footprint, and energy consumption across different deployment platforms.
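To make the quantization technique named above concrete, the following is a minimal, framework-free sketch of uniform post-training quantization: float weights are mapped to signed 8-bit integers with a single per-tensor scale factor, which is the core idea behind the int8 deployment paths in tools such as TensorFlow Lite for Microcontrollers. All function and variable names here are illustrative, not taken from any specific framework.

```python
# Minimal illustration of uniform symmetric post-training quantization
# (int8), one of the model-compression techniques surveyed above.

def quantize(weights, num_bits=8):
    """Map float weights to signed integers with one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize(weights)
approx = dequantize(q, scale)
# Worst-case rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

Storing `q` as int8 shrinks the weight tensor to a quarter of its float32 size while the reconstruction error stays below half a quantization step, which is why accuracy degradation is typically small for well-conditioned layers.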