Governance and Evaluation Framework for Agentic AI Systems in Enterprise Operations

Sandeep Mahajan

Abstract

Enterprises are increasingly deploying agentic artificial intelligence (AI) systems (autonomous agents capable of perceiving, reasoning, and acting independently), which introduces governance and evaluation challenges that existing frameworks do not address. Prevailing AI governance models primarily target static or semi-autonomous systems and overlook the dynamic, self-directed behaviors of agentic AI, which complicate accountability, ethical compliance, and organizational integration. This study develops a preliminary, expert-informed governance and evaluation framework specifically for agentic AI in enterprise contexts. Using a multi-method qualitative design, the research integrates a systematic literature review with semi-structured interviews of domain experts in AI governance, ethics, and enterprise deployment. The resulting framework synthesizes technical, ethical, and organizational dimensions, incorporating multi-faceted evaluation metrics, trajectory-based assessments, and adaptable human oversight aligned with operational risk. Expert input highlights the critical role of organizational readiness, role clarity, and collaborative leadership in effective implementation. This research advances a practical, actionable governance paradigm that supports responsible deployment and continuous evaluation of agentic AI systems, bridging theoretical understanding with enterprise practice and helping organizations use autonomous AI technologies sustainably and ethically.
