Exploring Reinforcement Learning for Optimizing Generative AI Systems
DOI: https://doi.org/10.46610/JoANNLS.2025.v02i01.003

Keywords: Deep Q-Networks (DQN), Generative AI, GANs (Generative Adversarial Networks), Proximal Policy Optimization (PPO), Reinforcement Learning (RL), VAEs (Variational Autoencoders)

Abstract
This paper explores the optimization of generative AI models using Reinforcement Learning (RL) strategies. Generative models such as GANs and VAEs have seen remarkable progress, but challenges like unstable training, mode collapse, and limited diversity persist. RL offers a promising solution by introducing structured decision-making and long-term reward optimization into the generative process. We propose a framework in which RL algorithms, such as Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN), guide the training of generative models by optimizing a reward function tailored to improve output quality and diversity. Experimental results on standard datasets demonstrate that RL-enhanced generative models outperform traditional models in stability, convergence speed, and sample diversity. We also discuss challenges like reward design and computational complexity, and propose directions for future research, such as hybrid approaches and multi-agent RL for collaborative generation. This work highlights the potential of RL to push generative AI beyond its current limitations.
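To make the framework concrete, the sketch below shows the core idea of reward-guided generator training with a policy-gradient method (REINFORCE, a simpler relative of the PPO objective discussed above). The toy "generator" is a categorical distribution over tokens, and the reward function favoring one "high-quality" token is purely an illustrative assumption, not the paper's actual setup.

```python
import numpy as np

def softmax(logits):
    """Convert unnormalized logits to a probability distribution."""
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reward(sample, target=3):
    """Toy reward: 1.0 for the assumed 'high-quality' token, else 0.0."""
    return 1.0 if sample == target else 0.0

def train(n_tokens=8, steps=500, lr=0.5, seed=0):
    """Reward-guided training loop: sample, score, policy-gradient update."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_tokens)        # the generator's trainable parameters
    for _ in range(steps):
        probs = softmax(logits)
        s = rng.choice(n_tokens, p=probs)   # generator emits a sample
        r = reward(s)                       # reward model scores it
        # REINFORCE gradient of log-prob: one-hot(sample) - probs
        grad = -probs
        grad[s] += 1.0
        logits += lr * r * grad             # reinforce rewarded samples
    return softmax(logits)

probs = train()
print(probs)  # probability mass should concentrate on the rewarded token
```

In a real PPO setup the update would additionally clip the probability ratio between the new and old policies, which is what stabilizes training; the plain REINFORCE step here keeps the reward-optimization loop visible in a few lines.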