Civic Tech Innovation

How good are we at detecting machine-generated text?

Detectors trained on ChatGPT struggle to detect outputs from other LLMs.

Machine-generated text has been fooling readers since the release of GPT-2 in 2019. In the years since, large language models (LLMs) have become increasingly adept at generating stories, news articles, student essays, and more, to the point that people often cannot tell when they are reading content produced by an algorithm.

While LLMs are being used to save time and enhance creativity in writing, the same capabilities invite misuse, and the harmful consequences are already evident across information spaces. The difficulty of detecting machine-generated text only amplifies these dangers.

To address this challenge, academics and companies alike are turning to machine learning models of their own: detectors that pick up on subtle patterns in word choice and grammar that escape human intuition, with the aim of flagging AI-generated text more reliably.

Many commercial detectors now claim to detect machine-generated text with up to 99% accuracy, but are these claims realistic? Chris Callison-Burch, Professor of Computer and Information Science, and Liam Dugan, a doctoral student in Callison-Burch’s research group, explored this question in a recent paper presented at the 62nd Annual Meeting of the Association for Computational Linguistics.

“At the same time as detection technology improves, so does the technology designed to evade it,” notes Callison-Burch. “It’s an arms race, and while creating robust detectors is a worthy goal, the current models have significant limitations and vulnerabilities.”

To examine these limitations and chart a course for developing stronger detectors, the research team created the Robust AI Detector (RAID), a dataset comprising over 10 million documents, including recipes, news articles, blog posts, and more—both AI-generated and human-authored. RAID is the first standardized benchmark for testing the effectiveness of current and future detectors. The team also developed a leaderboard that publicly ranks the performance of all detectors evaluated with RAID, ensuring an unbiased comparison.
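To make the benchmarking idea concrete, here is a minimal, hypothetical sketch of the kind of evaluation RAID enables: documents labeled by source, generator, and domain, a detector scored against them, and accuracy sliced per generator and per domain. The column names, toy rows, and the `score_detector` stub are illustrative assumptions, not the actual RAID data format or API.

```python
# Sketch of a RAID-style evaluation loop (illustrative assumptions only;
# see the publicly released RAID package for the real dataset and tooling).
import pandas as pd

# Toy stand-in for the benchmark: each row is a document with its true label,
# the model (or "human") that produced it, and its domain.
docs = pd.DataFrame([
    {"text": "Preheat the oven to 180C ...", "label": "machine", "generator": "chatgpt", "domain": "recipes"},
    {"text": "The council voted 5-2 ...",     "label": "human",   "generator": "human",   "domain": "news"},
    {"text": "Whisk the eggs until ...",      "label": "machine", "generator": "llama",   "domain": "recipes"},
    {"text": "Officials confirmed today ...", "label": "machine", "generator": "chatgpt", "domain": "news"},
])

def score_detector(text: str) -> str:
    """Placeholder detector returning 'machine' or 'human'.
    A real detector would be a trained classifier."""
    return "machine" if len(text) % 2 == 0 else "human"  # dummy logic

docs["prediction"] = docs["text"].apply(score_detector)
docs["correct"] = docs["prediction"] == docs["label"]

# Break accuracy down by generator and by domain -- the kind of slicing that
# exposes detectors that only work on the model or genre they were trained on.
print(docs.groupby("generator")["correct"].mean())
print(docs.groupby("domain")["correct"].mean())
```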

“The concept of a leaderboard has driven success in many areas of machine learning, such as computer vision,” says Dugan. “The RAID benchmark introduces the first leaderboard for robust detection of AI-generated text. We hope this will encourage transparency and high-quality research in this rapidly evolving field.”

Dugan notes that the paper is already making an impact on companies developing detection tools.

“Soon after our paper was released as a preprint and the RAID dataset became available, it was downloaded many times, and we were contacted by Originality.ai, a leading company in AI text detection,” he says. “They featured our work in a blog post, ranked their detector on our leaderboard, and are using RAID to uncover hidden vulnerabilities and improve their tool. It’s gratifying to see the community engaging with our work to advance AI-detection technology.”

But do the current detectors live up to their claims? RAID reveals that many fall short.

“Detectors trained on ChatGPT struggle to detect outputs from other LLMs like Llama, and vice versa,” Callison-Burch explains. “Similarly, detectors tuned for news articles often fail when analyzing machine-generated recipes or creative writing. What we found is that many detectors only perform well within specific use cases or on text similar to what they were trained on.”

While detectors can identify AI-generated text that is unaltered, they often fail when the text is edited or disguised, revealing a significant weakness in current detection technologies.

These faulty detectors pose a serious risk. “If schools rely on narrowly trained detectors to catch students using ChatGPT for assignments, they might falsely accuse students of cheating, or overlook others using different LLMs,” Callison-Burch warns.

Moreover, the team found that simple adversarial attacks—like replacing letters with look-alike symbols—can easily bypass detectors, allowing AI-generated text to slip through undetected.
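As a concrete illustration of how trivial such an attack can be, the following sketch swaps a handful of Latin letters for visually similar Cyrillic characters. The specific mapping is an assumption chosen for illustration, not the exact attack evaluated in the paper.

```python
# Homoglyph-style adversarial edit: the text looks unchanged to a reader,
# but the underlying characters no longer match what a detector expects.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
}

def homoglyph_attack(text: str) -> str:
    """Replace selected letters with look-alike characters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "The committee concluded that the evidence was inconclusive."
disguised = homoglyph_attack(original)

print(original)
print(disguised)              # visually near-identical
print(original == disguised)  # False: the underlying bytes differ
```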

  • Press release – University of Pennsylvania School of Engineering and Applied Science