Generative AI has reached a point where even trained observers are struggling to spot its work. A new study from UNSW Sydney demonstrates just how easily humans can be fooled by AI-generated faces, with participants averaging only 11 out of 20 correct identifications—barely better than random guessing.

The test, developed by researchers at the University of New South Wales, presents 20 faces in a simple real-or-AI challenge. While some test-takers scored well above average, including one score of 14 out of 20, overall performance was dismal. The findings, published in the British Journal of Psychology, suggest that as AI models refine their outputs, the once-reliable visual flaws, like distorted teeth or mismatched ears, are disappearing.

What’s more, the study identified a stark gap between confidence and accuracy. Participants, regardless of skill level, often overestimated their ability to detect AI faces. Even 'super-recognizers'—individuals with exceptional face-processing abilities—only marginally outperformed the average, scoring just a few points higher.

The research team noted that AI faces tend to exhibit a 'hyper-average' appearance, lacking the subtle imperfections found in real human features. Since generative models prioritize statistically likely outputs, this uniformity becomes a dead giveaway. Yet, for most people, spotting it remains difficult.

The Limits of Old Tricks

Many still rely on outdated visual cues to identify AI-generated images: blurry backgrounds, unnatural accessories, or poorly rendered details. But as models advance, these flaws are being smoothed out. The UNSW study confirms what earlier research, like Microsoft's 2023 experiment, has shown: without context, humans fail to distinguish real from AI-generated content more than 30% of the time.

Why Context Still Matters

While the test strips away real-world context, the study's lead researcher, Dr. James D. Dunn, emphasizes that outside controlled environments, people have more tools at their disposal. For instance, an AI-generated profile might lack a posting history, publish identical content across accounts, or include suspicious links. These red flags, though not foolproof, can help reduce the risk of deception.

Yet the broader takeaway is clear: as generative AI improves, spotting its work without specialized tools will become increasingly difficult. The question isn't just whether humans can keep up; it's whether the systems themselves will retain any flaws that are easy to detect.