Recently, a viral photo of a young Elon Musk captured attention on social media. A user posted it with the caption, “Reportedly, Elon Musk is researching an anti-aging formula, but it has spiraled out of control.” At first glance the image looks credible, but closer inspection reveals that Musk's adult face has been grafted onto a child's body, a face swap made possible by today's AI tools.
With the rise of sophisticated AI models, the internet has become inundated with AI-generated content. From humorous memes of “Comrade Trump's retirement” to bizarre scenes of “Musk going into catering,” the variety is vast. Students increasingly use AI to write essays, and even established authors have joined in: Mo Yan admitted that ChatGPT helped write his award tribute for Yu Hua.
This prevalence raises a crucial question: how can we tell AI-generated content from human-created material? Instances of AI-enabled fraud, such as a widely reported case in which a victim was defrauded of 4.3 million yuan, highlight the risks. Several tools have emerged that claim to detect AI-generated content, but their reliability is questionable. To investigate, I put a number of them to the test, starting with image detection.
I assessed three popular tools, Umm-maybe, Illuminarty, and AI or Not, each claiming accuracy of roughly 95%. Illuminarty and Umm-maybe report probability estimates, while AI or Not gives a definitive yes-or-no verdict. Surprisingly, they disagreed on the viral Musk photo: Illuminarty and AI or Not identified it as AI-generated, yet Umm-maybe put the chance of human creation at 81%. The discrepancy was perplexing given the photo's obvious manipulation. A further test with a famous Audrey Hepburn screenshot drew an even vaguer verdict from Umm-maybe: a 50/50 split between human and AI creation.
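For readers who want to reproduce this kind of check, Umm-maybe is distributed as an open model, so it can be queried locally rather than through a website. Below is a minimal sketch assuming the umm-maybe/AI-image-detector checkpoint on the Hugging Face Hub and a hypothetical image filename; the exact label names come from the model card and may differ.

```python
# Minimal sketch: querying an open AI-image detector locally.
# Assumes the "umm-maybe/AI-image-detector" checkpoint on the Hugging Face Hub;
# Illuminarty and AI or Not are closed web services and are not reproduced here.
from transformers import pipeline

detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

# Classify a local copy of the test image ("musk_viral_photo.jpg" is hypothetical).
for result in detector("musk_viral_photo.jpg"):
    # Each entry carries a label (e.g. "human" vs. "artificial") and a probability.
    print(f"{result['label']}: {result['score']:.1%}")
```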
Of the ten images tested, eight were AI-generated and two were human-made. Setting aside the ambiguous verdicts, AI or Not and Umm-maybe each reached 67% accuracy, while Illuminarty managed only 50%. Clearly, these detectors fall well short of their advertised accuracy.
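The scoring itself is simple arithmetic: correct verdicts divided by the number of images that received a definite verdict. The sketch below uses hypothetical labels and predictions purely to make the exclusion of ambiguous results explicit; it is not the actual test data.

```python
# Hypothetical ground truth and verdicts, only to illustrate the scoring method;
# "ambiguous" marks near-50/50 outputs that are excluded from the denominator.
truth = ["ai"] * 8 + ["human"] * 2  # eight AI-generated images, two human-made

def accuracy(predictions, truth):
    """Accuracy over definite verdicts only; ambiguous calls are skipped."""
    scored = [(p, t) for p, t in zip(predictions, truth) if p != "ambiguous"]
    return sum(p == t for p, t in scored) / len(scored)

# Example: a tool that gets 6 of its 9 definite calls right scores ~67%.
verdicts = ["ai", "ai", "human", "ai", "ambiguous",
            "ai", "human", "ai", "human", "ai"]
print(f"{accuracy(verdicts, truth):.0%}")  # prints 67%
```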
Next, I turned to text detection with three well-known tools: GPTZero, Sapling, and Copyleaks. I had ChatGPT generate an advertisement for coconut water and ran the copy through each tool. Surprisingly, GPTZero and Sapling judged the text likely human-written; only Copyleaks flagged it as AI-generated. The inconsistency was striking. I also asked ChatGPT to imitate Lu Xun's style in a piece titled "Hot Pot Diary," and again the verdicts varied across the tools.
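The same kind of spot check can be run locally with an open classifier. The sketch below is a stand-in using the older RoBERTa-based GPT-2 output detector on the Hugging Face Hub; it is not the model behind GPTZero, Sapling, or Copyleaks, and the sample text is a hypothetical placeholder.

```python
# Minimal sketch: scoring text with an open AI-text classifier.
# "openai-community/roberta-base-openai-detector" is an older public detector,
# used here only as a stand-in for the commercial tools tested in the article.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

ad_copy = "Pure refreshment, straight from the coconut."  # hypothetical sample
result = detector(ad_copy)[0]

# For this checkpoint the labels are "Real" (human) and "Fake" (machine).
print(f"{result['label']}: {result['score']:.1%}")
```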
The takeaway is clear: image and text detectors are themselves AI models, trained to tell human output from machine output. Different training data yields different verdicts, and the rapid evolution of generative models keeps moving the target. GPTZero, for example, relies on metrics such as perplexity and burstiness. Perplexity measures how unpredictable a text is to a language model; human writing tends to score higher, while AI output, drawn from a model's own probability distribution, scores lower. Burstiness measures how much that unpredictability varies from sentence to sentence: humans mix long, complex sentences with short, simple ones, whereas AI text tends to be more uniform.
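As a rough illustration, both metrics can be approximated with an open model: perplexity as the exponential of a language model's average token loss, and burstiness as the spread of per-sentence perplexity. The sketch below assumes GPT-2 as the scoring model and made-up sample sentences; GPTZero's actual models and thresholds are not public.

```python
# Rough sketch of the perplexity/burstiness idea, not GPTZero's actual code.
# Perplexity is computed with GPT-2; burstiness is approximated here as the
# standard deviation of per-sentence perplexity.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: exp of the mean per-token loss."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

sentences = [  # hypothetical sample sentences
    "The coconut water was cold.",
    "Hydration has never tasted so effortlessly tropical.",
]
scores = [perplexity(s) for s in sentences]
mean = sum(scores) / len(scores)
burstiness = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
print(f"mean perplexity = {mean:.1f}, burstiness = {burstiness:.1f}")
```

Higher mean perplexity and higher burstiness both push a text toward a "human" verdict under this scheme, and a carefully prompted model can deliberately raise both, which is one reason such detectors are easy to fool.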
As AI advances and its outputs grow ever closer to human expression, telling the two apart only gets harder. The burden falls on researchers to improve detection methods before machine-generated material becomes truly indistinguishable from human work.
Ultimately, as AI technology progresses, our tools for identifying its output must evolve in step; in the end, it may take AI to catch AI.