Proximal Policy Optimisation-driven Stock Trading Using Simple Moving Average Crossover and Higher High-Higher Low Stock Price Structure
Abstract
Because financial markets are volatile and hard to predict, fixed rule-based trading approaches are often ineffective when conditions change constantly. This project presents an automated stock trading system that combines proximal policy optimisation (PPO), a reinforcement learning method, with two technical signals: a simple moving average (SMA) crossover, which identifies trend direction, and a higher high-higher low (HH-HL) stock price structure, which confirms trends with greater certainty. The SMA crossover also flags trends that may reverse, while the HH-HL structure confirms bullish price continuation by detecting market microstructure patterns in which momentum is sustained. The PPO agent is trained in a customised trading environment that models the realities of trade execution: transaction costs, position limits, and risk exposure. Market states are represented by normalised price trends, crossover dynamics, and structural price-action features. The agent's reward function is designed to maximise risk-adjusted returns while minimising drawdowns and overtrading. Experiments on a large, multi-year sample of historical equity data showed that the proposed method yielded higher cumulative returns, a higher Sharpe ratio, and lower drawdowns than conventional SMA strategies and baseline reinforcement learning models. The results further suggest that combining policy-gradient optimisation with trend-aware and structure-aware features can improve trading stability and adaptability across market regimes.
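To make the two technical signals concrete, the following is a minimal Python sketch of an SMA crossover detector with an HH-HL confirmation step. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the window lengths (2 and 4), the bar-by-bar definition of HH-HL (each of the last `lookback` bars has a higher high and a higher low than the bar before it), and the toy price series are all hypothetical choices, since the abstract does not specify them.

```python
from typing import List, Optional


def sma(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i < window - 1:
            out.append(None)
        else:
            out.append(sum(prices[i - window + 1 : i + 1]) / window)
    return out


def crossover_signal(prices: List[float], fast: int, slow: int) -> List[int]:
    """+1 where the fast SMA crosses above the slow SMA, -1 where it
    crosses below, 0 otherwise."""
    f, s = sma(prices, fast), sma(prices, slow)
    signals = [0] * len(prices)
    for i in range(1, len(prices)):
        if f[i - 1] is None or s[i - 1] is None or f[i] is None or s[i] is None:
            continue  # not enough history for both averages yet
        prev_diff = f[i - 1] - s[i - 1]
        diff = f[i] - s[i]
        if prev_diff <= 0 < diff:
            signals[i] = 1
        elif prev_diff >= 0 > diff:
            signals[i] = -1
    return signals


def hh_hl_confirmed(highs: List[float], lows: List[float],
                    i: int, lookback: int = 2) -> bool:
    """Assumed HH-HL rule: each of the last `lookback` bars ending at bar i
    has a higher high and a higher low than the bar before it."""
    if i < lookback:
        return False
    return all(
        highs[j] > highs[j - 1] and lows[j] > lows[j - 1]
        for j in range(i - lookback + 1, i + 1)
    )


# Toy data (hypothetical): a decline followed by a recovery.
closes = [10, 9, 8, 7, 6, 7, 9, 11, 13, 15]
highs = [c + 0.5 for c in closes]
lows = [c - 0.5 for c in closes]

signals = crossover_signal(closes, fast=2, slow=4)
# Keep only bullish crossovers that the HH-HL structure confirms.
entries = [i for i, sig in enumerate(signals)
           if sig == 1 and hh_hl_confirmed(highs, lows, i)]
print(entries)  # [6]
```

In a reinforcement learning setting, values such as `signals[i]` and the HH-HL flag would more plausibly be supplied to the agent as state features than used as hard entry rules; how the proposed system encodes them is not described in the abstract.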
References
S. Adhikary and A. Kadia, “Algorithmic trading with a combination of advanced technical indicators: An automation,” International Journal on Science and Technology (IJSAT), vol. 16, no. 3, pp. 1–20, 2025.
R. Kumar, A. Kadia, S. Kumar, and A. Sharma, “Capture market trends through multi-indicator confirmations using reinforcement learning models,” International Journal on Science and Technology (IJSAT), vol. 17, no. 1, pp. 1–15, 2026.
A. Kadia, “Machine learning–based stock trading strategies using simple moving average with average traded volume crossover confirmation,” in Proceedings of the 2025 IEEE Silchar Subsection Conference (SILCON), Silchar, India, pp. 1–5, 2025.
D.-W. Jeong and Y. H. Gu, “Pro Trader RL: Reinforcement learning framework for generating trading knowledge by mimicking the decision-making patterns of professional traders,” Expert Systems with Applications, vol. 254, p. 124465, 2024.
J. Chung, M. Kim, S. Min, H. Choi, S. Park, and J. Kim, “Correlation-assisted spatio-temporal reinforcement learning for stock revenue maximization,” Expert Systems with Applications, vol. 289, p. 128361, 2025.
D. Ma and D. Yuan, “Enhanced stock price forecasting through a regularized ensemble framework with graph convolutional networks,” Expert Systems with Applications, vol. 250, p. 123948, 2024.
X. Wang and L. Liu, “Risk-sensitive deep reinforcement learning for portfolio optimization,” Journal of Risk and Financial Management, vol. 18, no. 7, p. 347, 2025.
A. Jha, S. Maheshwari, P. Dutta, and U. Dubey, “Optimizing financial modeling with machine learning: Integrating particle swarm optimization for enhanced predictive analytics,” Journal of Business Analytics, vol. 8, no. 3, pp. 196–215, 2025.
A. J. M. Casares, “Enhancing algorithmic trading with wavelet-based deep reinforcement learning: A multi-indicator approach,” Neural Computing and Applications, vol. 37, pp. 25339–25385, 2025.
R. Pandit et al., “Stock market intraday trading using reinforcement learning,” in Multi-disciplinary Trends in Artificial Intelligence, Lecture Notes in Computer Science, 2023, pp. 380–389.
F. C. L. Paiva et al., “Intelligent trading systems: A sentiment-aware reinforcement learning approach,” in Proceedings of the Second ACM International Conference on AI in Finance (ICAIF ’21), New York, NY, USA, 2021, Article no. 40, pp. 1–9.
J. Karaila, K. Baltakys, H. Hansen, A. Goel, and J. Kanniainen, “Network analysis of aggregated money flows in stock markets,” Quantitative Finance, vol. 24, no. 10, pp. 1423–1443, 2024.
J. B. Chakole, M. S. Kolhe, G. D. Mahapurush, A. Yadav, and M. P. Kurhekar, “A Q-learning agent for automated trading in equity stock markets,” Expert Systems with Applications, vol. 163, p. 113761, Jan. 2021.
N. Tyagi, “Deep reinforcement learning for algorithmic trading strategies,” International Journal of Research and Applied Innovations (IJRAI), vol. 7, no. 2, Mar.–Apr. 2024.
R. E. Prasetyo, S. Sumanto, I. Chaidir, and A. Supriyatna, “Reinforcement learning for bitcoin trading: A comparative study of PPO and DQN,” Journal Mandiri IT, vol. 14, no. 2, pp. 159–169, Oct. 2025.
Yahoo Finance, “Reliance Industries Limited (RELIANCE.NS) historical data,” Yahoo Finance, 2026.