Developing Cooperative Behavior in Iterated Prisoner’s Dilemma through Q-learning

Authors

  • Sudip Bhattacharya
  • Amitesh Patra
  • Kushal Punem

Keywords

Game theory, Gamma diminishing, Prisoner’s dilemma, Q-learning, Reinforcement learning

Abstract

The Prisoner’s Dilemma is a classic game in game theory that serves as a model for analyzing strategic decision-making in social interactions. In this game, two players each choose between cooperating and defecting, and the combination of their choices determines the rewards they receive. Numerous strategies have been developed that either act to their own benefit or defend against the opponent, but because their decision rules are static, they perform well against only a limited set of opposing strategies. Thus, there remains a need for adaptable approaches capable of learning and adjusting to changing circumstances. This research aims to develop an intelligent agent using reinforcement learning, specifically Q-learning, to engage effectively in the Prisoner’s Dilemma with diverse opponents. By modelling states as combinations of player actions and storing action values in a Q-table, the agent refines its actions over time using the Bellman equation. Our agent demonstrated varying average cooperation rates: 10.68% against always defect, 20.26% against always cooperate, and a significantly higher 95.1% against Tit-for-tat, highlighting the agent’s adaptive nature and proficiency in responding to different opponent strategies. These findings underscore the potential of reinforcement learning in optimizing decision-making in complex real-world scenarios, especially those involving strategic interactions and social dilemmas.
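The approach summarized in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation. The payoff values (5, 3, 1, 0), the hyperparameters `alpha`, `gamma` and `epsilon`, the epsilon-greedy exploration, the starting state, and all function names are assumptions made for illustration. The sketch also keeps the discount factor fixed, so it does not attempt the gamma-diminishing scheme named in the keywords.

```python
import random

C, D = 0, 1  # action indices: cooperate, defect

# Assumed payoffs (not taken from the paper): PAYOFF[(mine, theirs)] -> my reward
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(agent_prev):
    """Cooperate first, then copy the agent's previous action."""
    return C if agent_prev is None else agent_prev


def always_defect(agent_prev):
    return D


def always_cooperate(agent_prev):
    return C


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One Bellman-style Q-learning update, applied in place."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(opponent, rounds=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Play `rounds` rounds; return the Q-table and the agent's cooperation rate."""
    rng = random.Random(seed)
    # A state is the pair (agent's last action, opponent's last action).
    q = {(a, b): [0.0, 0.0] for a in (C, D) for b in (C, D)}
    state = (C, C)      # assumed starting state
    agent_prev = None   # the agent has not moved yet
    cooperations = 0
    for _ in range(rounds):
        # Epsilon-greedy action selection (an assumed exploration scheme).
        if rng.random() < epsilon:
            action = rng.choice((C, D))
        else:
            action = C if q[state][C] >= q[state][D] else D
        opp_action = opponent(agent_prev)
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        q_update(q, state, action, reward, next_state, alpha, gamma)
        state, agent_prev = next_state, action
        cooperations += action == C
    return q, cooperations / rounds


if __name__ == "__main__":
    for name, opp in [("always defect", always_defect),
                      ("always cooperate", always_cooperate),
                      ("tit-for-tat", tit_for_tat)]:
        _, rate = train(opp)
        print(f"{name}: cooperation rate {rate:.2%}")
```

Cooperation rates from this sketch depend on the assumed payoffs, hyperparameters and seed, so they should not be expected to match the figures reported in the abstract.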

Published

2025-06-02

How to Cite

Bhattacharya, S., Patra, A., & Punem, K. (2025). Developing Cooperative Behavior in Iterated Prisoner’s Dilemma through Q-learning. Journal of Data Engineering and Knowledge Discovery, 2(2), 12–20. Retrieved from https://matjournals.net/engineering/index.php/JoDEKD/article/view/1972