Computationally Efficient Deep Learning for Real-time Drone Detection in Images: A Review of Models and Deployment Challenges
Keywords: Drone detection, Environmental robustness, Sim2Real gap, Small object detection, System latency

Abstract
Background: The rapid expansion of unmanned aerial vehicles (UAVs) for surveillance, agriculture, and logistics demands object detection that remains robust across the Sim2Real gap. Models trained in simulation suffer up to 40.5% mAP degradation in real environments due to domain shift, lighting variation, and scale imbalance. Conventional pipelines with multi-sensor latencies exceeding 1000 ms incur a 52.5% increase in false negatives for tiny objects smaller than 32 pixels, as demonstrated on VisDrone.
Purpose: This study analyzes the Sim2Real gap in drone detection across 50 architectures, examining performance distributions, stress factors, scale sensitivity, and latency behavior to establish a resilience taxonomy.
Methods: Fifty models, including YOLO variants and Faster R-CNN, were evaluated on 50,000 VisDrone-augmented images across four environments. Statistical analyses included regression (r = -0.949), t-tests (p < 0.001), ANOVA (p = 0.044), heatmaps, and Monte Carlo latency simulations (n = 2000). Mitigation strategies involved feature pyramid fusion, attention modules, and pipeline parallelization with bootstrap validation.
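The Monte Carlo latency simulation mentioned above can be sketched as follows. This is a minimal illustrative sketch: the component breakdown (inference, communication, synchronization) follows the latency factors named in the Findings, but the distributions and parameter values are assumptions for demonstration, not the study's measured values.

```python
import random
import statistics

def simulate_latency(n=2000, seed=0):
    """Monte Carlo simulation of end-to-end detection pipeline latency (ms).

    Draws n samples of total latency as the sum of three components.
    All distribution choices and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        inference = max(0.0, rng.gauss(60, 10))    # on-board model inference
        communication = rng.expovariate(1 / 150)   # network transfer (heavy-tailed)
        synchronization = rng.uniform(20, 200)     # multi-sensor sync wait
        samples.append(inference + communication + synchronization)
    samples.sort()
    return {
        "median": statistics.median(samples),
        "p95": samples[int(0.95 * n)],
    }
```

A run of `simulate_latency()` yields a median and 95th-percentile latency estimate; comparing such distributions before and after pipeline parallelization (e.g., overlapping communication with inference) is one way the reported latency improvements could be quantified.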
Findings: A 0.405 mAP drop was observed under Sim2Real transfer, driven mainly by weather effects and domain shift. Tiny objects lost 52.5% accuracy. Latency ranged from 178 to 1097 ms, dominated by communication and synchronization overheads. Parallelization improved return on investment by 45%.
Novelty and Conclusion: A unified four-part taxonomy links domain shift, stressors, scale, and latency, reducing the performance gap by 62%. Multi-scale adaptive ensembles restored 35% of lost fidelity. Edge-hybrid systems operating under 250 ms and attention-enhanced feature pyramid networks (FPNs) are recommended for resilient UAV deployment.