AI Rivals Human Experts in School Threat Assessment

2025-10-03 · David Riedman, PhD · 3 minute read
Artificial Intelligence
School Safety
Decision Science

Last week marked a significant milestone with the successful defense of my PhD dissertation, which explores a critical intersection of artificial intelligence and school safety. This post offers a brief overview of my research into how Large Language Models (LLMs) like ChatGPT can be used to assess threats of violence.

A Groundbreaking PhD Study

My dissertation, titled LLMs Versus Human Experts: Mixed Methods Analysis Measuring Variance in School Shooting Threat Assessments, delves into a pressing issue. With approximately 100,000 threats made to U.S. schools annually, the need for accurate and consistent assessment is paramount.

The study's abstract outlines the core research: we measured the unwanted variability, or "noise," in expert judgments. This was achieved by testing six leading-edge LLMs on fictitious but highly realistic school shooting scenarios. These scenarios were carefully crafted based on an analysis of 1,000 real threats made in the United States.

Pitting AI Against Humans: The Research Findings

Drawing from decision science, the dissertation measures the accuracy and consistency of these AI models against a prior study of 245 human law enforcement officers who rated the same six threat scenarios. The quantitative results were striking: the LLMs produced severity ratings that were, on average, within one point of the ratings from human experts. Critically, there were no statistically significant differences between the two groups.
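To make that kind of comparison concrete, here is a minimal Python sketch of the analysis described above: compute the gap between the groups' mean severity ratings and run a two-sample test for a statistically significant difference. The ratings below are invented placeholders, not the dissertation's data.

```python
import numpy as np
from scipy import stats

# Invented 1-5 severity ratings for a single scenario (placeholders, not study data).
human_ratings = np.array([3, 4, 3, 5, 4, 3, 4, 2, 3, 4])  # a handful of officer ratings
llm_ratings = np.array([4, 3, 4, 4, 3, 4])                # one rating from each of six LLMs

# How far apart are the group averages? (The study reported gaps within one point.)
mean_gap = abs(human_ratings.mean() - llm_ratings.mean())

# Welch's t-test: is the difference between the groups statistically significant?
t_stat, p_value = stats.ttest_ind(human_ratings, llm_ratings, equal_var=False)

print(f"Mean rating gap: {mean_gap:.2f} points")
print(f"Welch's t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```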

This supports the primary hypothesis that LLMs can effectively approximate the accuracy of human threat assessments. Furthermore, the aggregate scores from the LLMs displayed significantly lower variance than the human ratings. This demonstrates both a "wisdom-of-crowds" effect and a reduction in what Daniel Kahneman calls judgment "noise."
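The variance-reduction effect is easy to see with synthetic numbers. The sketch below (illustrative only, not study data) shows how averaging six independent ratings produces an aggregate score with markedly less spread than individual ratings, which is the wisdom-of-crowds mechanism at work.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic severity ratings on a 1-5 scale (illustrative only, not study data).
human_ratings = rng.normal(loc=3.5, scale=1.0, size=245).clip(1, 5)     # 245 individual officers
llm_ratings = rng.normal(loc=3.5, scale=1.0, size=(200, 6)).clip(1, 5)  # 200 rounds of 6 LLMs

# Averaging six independent ratings shrinks the spread of the aggregate score,
# so the pooled LLM rating is less "noisy" than any single rater.
aggregate_llm_scores = llm_ratings.mean(axis=1)

print(f"Variance of individual human ratings: {human_ratings.var():.2f}")
print(f"Variance of aggregated LLM scores:    {aggregate_llm_scores.var():.2f}")
```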

[Figure: Chart comparing LLM and human expert threat ratings]

Why AI Was More Consistent

The qualitative analysis of the narrative explanations provided by both humans and AI revealed why this difference in consistency exists. The LLMs consistently focused on specific, objective aspects of the threats presented in the scenarios. In contrast, human experts were often influenced by subjective factors such as their own lived experiences, strict adherence to (or deviation from) formal procedures, and personal assumptions they brought to the assessment.

The Future of School Safety: Human-AI Collaboration

These findings suggest a powerful new direction for school safety. LLMs can enhance the reliability of school-based threat assessments, particularly when used as part of a human-LLM hybrid team. The AI can provide a stable, data-driven baseline, which a human expert can then augment with context and nuance.
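The dissertation does not prescribe a particular implementation, but a hybrid workflow could look roughly like the sketch below, in which an LLM is prompted for a structured baseline rating that a human assessor then reviews. The model name, rating scale, and prompt wording are all assumptions made for illustration.

```python
from openai import OpenAI  # assumes the openai package is installed and an API key is configured

client = OpenAI()

scenario = (
    "A student posted online: 'don't come to school tomorrow.' "
    "No weapon was mentioned and the account has no prior incidents."
)

# Ask the model for a stable, structured baseline that a human expert then augments.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; any capable chat model could be substituted
    temperature=0,   # deterministic output keeps the baseline consistent across queries
    messages=[
        {
            "role": "system",
            "content": (
                "You assist a school threat assessment team. Rate the threat severity "
                "from 1 (low concern) to 5 (imminent danger) and give a one-sentence rationale."
            ),
        },
        {"role": "user", "content": scenario},
    ],
)

print(response.choices[0].message.content)  # reviewed and contextualized by the human assessor
```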

In under-resourced or rural schools that may lack trained human experts, an LLM could even serve as the sole assessor, ensuring that no threat goes unevaluated. This study contributes to the fields of behavioral economics, decision science, and violence prevention, offering a new framework for comparing the assessment abilities of LLMs to those of humans.

For more on the concept of judgment noise, check out this Freakonomics interview with Daniel Kahneman.

About the Author

David Riedman, PhD is the creator of the K-12 School Shooting Database, Chief Data Officer at a global risk management firm, and a tenure-track professor. Listen to his weekly podcast—Back to School Shootings—or his recent interviews on Freakonomics Radio and in the New England Journal of Medicine.
