
HARRISBURG, PA — A research paper led by a Harrisburg University of Science & Technology (HU) doctoral student is drawing attention for examining a pressing question in artificial intelligence (AI): when AI agents are given tools and turned loose on real-world tasks, do they finish those tasks safely — or just successfully?

The paper, “The Verifier Tax: Horizon Dependent Safety-Success Tradeoffs in Tool Using LLM Agents,” was accepted to the inaugural Association for Computing Machinery (ACM) Conference on AI and Agentic Systems (CAIS), held May 26–29, 2026, in San Jose, California. Lead author Tanmay Sah, who recently defended a doctoral dissertation at Harrisburg University, presented the work, which is part of that dissertation research.

The research was co-authored by HU’s Kayden Jordan, Ph.D., Assistant Professor and Program Lead of Social Analytics, along with Vishal Srivastava of Johns Hopkins University and Dolly Sah of the University of Utah.

Most AI systems are judged by a simple measure: Did they complete the task? “But our work asks a deeper question,” said Tanmay. “Did the AI complete the task safely and correctly, or did it succeed in a way that violated rules, policies, or user trust?”

Using simulated customer-service scenarios in airline and retail settings, the team tested whether AI agents that successfully completed a request did so without breaking rules, violating policies, or undermining user trust.

“The motivation came from seeing real-world AI incidents where deployed systems caused harm or near harm,” Tanmay said. “One example documented in the AI Incident Database is the wrongful arrest of Robert Williams after a false facial-recognition match. Incidents like this show that the issue is not simply whether an AI system produces an output, but whether that output is correct, safe, compliant, and appropriate for the real-world context.”

A notable share of “successful” outcomes, the researchers found, came from unsafe actions. To address this, the team built an automated verifier to check an agent’s intended actions against required policies before they were carried out.

Tanmay continued: “Through this research, we found that the verifier helped block unsafe actions, but it also revealed another important issue: blocking unsafe actions does not automatically improve the agent’s overall task success rate. In other words, an agent may become safer but still struggle to recover and complete the task successfully after being blocked. We describe this as a safety-capability gap.”

The upshot: task success alone is not enough.

“An AI agent may complete a task and still do so in a way that violates rules, policies, or user trust,” Tanmay said. “As AI agents become more autonomous and begin using tools to complete multi-step tasks, we need evaluation frameworks that measure not only whether they succeed, but whether they succeed safely.”

“It would be especially meaningful if larger AI organizations keep this idea in mind when building, evaluating, and releasing new models or agentic systems,” Tanmay added. “Wherever possible, organizations should measure safe success and unsafe success separately, rather than relying only on overall performance or task completion.”

The large-scale experiments were made possible in part by Harrisburg University’s computing resources, including access to GPU hardware that supported the team’s testing.

The paper has gained recognition beyond the conference. ACM named it among the highlighted papers in its official announcement of CAIS 2026, and it was featured in a preview of the conference published by Two Sigma, a quantitative investment and technology firm. The full paper is available through the ACM Digital Library.

Learn more today about HU’s graduate programs in Data Science and Analytics.

# # #

ABOUT HARRISBURG UNIVERSITY

Harrisburg University of Science & Technology (HU) is an independent, nonprofit university offering degrees in advanced manufacturing, engineering, robotics, nursing, cybersecurity, and other critical fields. Accredited by the Middle States Commission on Higher Education, HU serves a diverse student body through bachelor’s, master’s, and doctoral programs that link learning and research with practical applications. For information about HU’s affordable STEM degrees and professional development programs, call 717.901.5146 or email Connect@HarrisburgU.edu. Stay in the know by following Harrisburg University on LinkedIn, Instagram, and Facebook.

MEDIA CONTACT

Do you have questions about this story? Interested in lining up an interview? Please contact Dan Wilhelm, Director of Communications for Harrisburg University, at DWilhelm@HarrisburgU.edu or 717.901.5100, ext. 1724.
