Major AI Conference Flooded With Peer Reviews Written Fully By AI

An AI-detection tool developed by Pangram Labs found that peer reviewers are increasingly using chatbots to draft responses to authors. Credit: breakermaximus/iStock via Getty

What can researchers do if they suspect that their manuscripts have been peer reviewed using artificial intelligence (AI)? Dozens of academics have raised concerns on social media about manuscripts and peer reviews submitted to the organizers of next year’s International Conference on Learning Representations (ICLR), an annual gathering of specialists in machine learning. Among other things, they flagged hallucinated citations and suspiciously long and vague feedback on their work.

Graham Neubig, an AI researcher at Carnegie Mellon University in Pittsburgh, Pennsylvania, was one of those who received peer reviews that seemed to have been produced using large language models (LLMs). The reports, he says, were “very verbose with lots of bullet points” and requested analyses that were not “the standard statistical analyses that reviewers ask for in typical AI or machine-learning papers.”

But Neubig needed help proving that the reports were AI-generated. So, he posted on X (formerly Twitter) and offered a reward for anyone who could scan all the conference submissions and their peer reviews for AI-generated text. The next day, he got a response from Max Spero, chief executive of Pangram Labs in New York City, which develops tools to detect AI-generated text.

Pangram screened all 19,490 studies and 75,800 peer reviews submitted for ICLR 2026, which will take place in Rio de Janeiro, Brazil, in April. Neubig and more than 11,000 other AI researchers will be attending.

Pangram’s analysis revealed that around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use. The findings were posted online by Pangram Labs.

“People were suspicious, but they didn’t have any concrete proof,” says Spero. “Over the course of 12 hours, we wrote some code to parse out all of the text content from these paper submissions,” he adds.

The conference organizers say they will now use automated tools to assess whether submissions and peer reviews breached the conference’s policies on AI use. This is the first time that the conference has faced this issue at scale, says Bharath Hariharan, a computer scientist at Cornell University in Ithaca, New York, and senior programme chair for ICLR 2026. “After we go through all this process … that will give us a better notion of trust.”

AI-written peer review

The Pangram team used one of its own tools, which predicts whether text is generated or edited by LLMs. Pangram’s analysis flagged 15,899 peer reviews that were fully AI-generated. But it also identified many manuscripts that had been submitted to the conference with suspected cases of AI-generated text: 199 manuscripts (1%) were found to be fully AI-generated; 61% of submissions were mostly human-written; but 9% contained more than 50% AI-generated text.
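The percentages quoted here follow from the raw counts reported in the article. As a minimal sketch of that arithmetic (the constant and function names are illustrative and are not part of Pangram's tooling):

```python
# Counts as reported in the article; the names below are illustrative only.
TOTAL_REVIEWS = 75_800        # peer reviews screened
TOTAL_MANUSCRIPTS = 19_490    # manuscripts screened
FULLY_AI_REVIEWS = 15_899     # reviews flagged as fully AI-generated
FULLY_AI_MANUSCRIPTS = 199    # manuscripts flagged as fully AI-generated


def share_percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


review_share = share_percent(FULLY_AI_REVIEWS, TOTAL_REVIEWS)
manuscript_share = share_percent(FULLY_AI_MANUSCRIPTS, TOTAL_MANUSCRIPTS)

print(f"Fully AI-generated reviews: {review_share:.1f}%")
print(f"Fully AI-generated manuscripts: {manuscript_share:.1f}%")
```

Rounded to whole numbers, the two shares come out at about 21% and 1%, matching the figures in the article.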

Pangram described the model in a preprint¹, which it submitted to ICLR 2026. Of the four peer reviews received for the manuscript, one was flagged as fully AI-generated and another as lightly AI-edited, the team’s analysis found.

For many researchers who received peer reviews for their submissions to ICLR, the Pangram analysis confirmed what they had suspected. Desmond Elliott, a computer scientist at the University of Copenhagen, says that one of three reviews he received seemed to have missed “the point of the paper”. His PhD student who led the work suspected that the review was generated by LLMs, because it mentioned numerical results from the manuscript that were incorrect and contained odd expressions.

When Pangram released its findings, Elliott adds, “the first thing I did was I typed in the title of our paper because I wanted to know whether my student’s gut instinct was correct”. The suspect peer review, which Pangram’s analysis flagged as fully AI-generated, gave the manuscript the lowest rating, leaving it “on the borderline between accept and reject”, says Elliott. “It’s deeply frustrating”.

Repercussions

Disclaimer: This news article has been republished exactly as it appeared on its original source, without any modification.
We do not take any responsibility for its content, which remains solely the responsibility of the original publisher.

Author: Miryam Naddaf
Published on: 2025-11-27 04:00:00
Source: www.nature.com

Author: uaetodaynews
Published on: 2025-11-27 14:19:00
Source: uaetodaynews.com