Machine Learning-Enhanced Multi-Agent Reinforcement Learning for Adaptive Collaboration in Autonomous Ad Hoc Systems

Authors

  • Leena Bagul, Postgraduate Student, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India
  • Mahima Chaudhari, Postgraduate Student, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India
  • Bhavya Mishra, Postgraduate Student, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India
  • Anushka Mahind, Postgraduate Student, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India
  • Rajashri Rikame, Undergraduate Student, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India
  • Mritunjay Kr. Ranjan, Assistant Professor, Department of Computer Science and Engineering, Sandip University, Nashik, Maharashtra, India

DOI:

https://doi.org/10.46610/JAHNMC.2025.v02i03.003

Keywords:

Adaptive collaboration, Autonomous ad hoc systems, Graph Neural Networks (GNNs), Machine learning, Random Forests (RFs)

Abstract

Autonomous ad hoc systems require multi-agent cooperation that adapts to dynamic, decentralized settings. Conventional multi-agent reinforcement learning (MARL) algorithms typically lack scalability, effective role assignment, and efficient coordination in the face of heterogeneous interactions. This paper proposes a machine learning-enhanced multi-agent reinforcement learning (ML-MARL) framework that combines four complementary machine learning algorithms (Graph Neural Networks (GNNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Random Forests (RFs)) to enhance adaptive collaboration in these systems. GNNs model inter-agent communication dynamics and connectivity, enabling effective knowledge sharing across the network. SVMs distinguish cooperative from non-cooperative behaviour, building trust and resilience against adversarial disruption. DTs provide interpretable, rule-based action advice that serves as a lightweight policy initializer for agents. RFs improve prediction accuracy by modelling environmental uncertainty and identifying the critical features that affect collaboration. Together, these components enhance MARL through richer state representations, improved decision-making, and more accurate reward prediction. The proposed framework demonstrates that machine learning can make multi-agent cooperation in autonomous ad hoc systems more adaptable, scalable, and robust, opening new avenues for decentralized coordination in areas such as swarm robotics, disaster management, and future communication networks.
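As a minimal illustrative sketch of two of the components described above, the snippet below trains an SVM to separate cooperative from non-cooperative agent behaviour and a shallow decision tree that yields interpretable action rules usable as a policy initializer. The feature names, synthetic data, and thresholds are all hypothetical assumptions for illustration; in the actual framework these features would come from the MARL environment, and this is not the authors' implementation.

```python
# Hypothetical sketch of the SVM trust classifier and the decision-tree
# policy initializer from the ML-MARL framework, using scikit-learn.
# Feature names and the synthetic data generator are illustrative only.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic per-agent interaction features: [msg_rate, reward_shared, latency]
coop = rng.normal([0.8, 0.7, 0.2], 0.1, size=(100, 3))     # cooperative agents
defect = rng.normal([0.2, 0.1, 0.8], 0.1, size=(100, 3))   # non-cooperative agents
X = np.vstack([coop, defect])
y = np.array([1] * 100 + [0] * 100)  # 1 = cooperative, 0 = non-cooperative

# SVM trust classifier: flags non-cooperative peers before knowledge is shared
trust_clf = SVC(kernel="rbf").fit(X, y)
print("predicted label for a new peer:",
      trust_clf.predict([[0.75, 0.65, 0.25]])[0])

# Shallow decision tree as a lightweight, human-readable policy initializer
policy_init = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(policy_init,
                  feature_names=["msg_rate", "reward_shared", "latency"]))
```

The printed tree exposes the decision rules directly, which is what makes DTs attractive as interpretable initial policies before the full MARL training refines them.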


Published

2025-11-24