Recently, AI21 Labs, a competitor of OpenAI, launched an engaging social experiment titled "Human or Not." This online game showed that 32% of participants could not reliably tell humans from AI chatbots, making it the largest Turing Test to date. The study used advanced large language models (LLMs) such as OpenAI's GPT-4 and AI21's Jurassic-2, and analyzed over a million interactions.
The findings were revealing. Participants correctly identified human interlocutors 73% of the time but struggled with bots, identifying them correctly only 60% of the time. Researchers observed that players adopted various strategies to determine whether they were interacting with a human or a machine. Many assumed that bots would not make spelling or grammatical errors and would avoid slang. However, the chatbots were designed to make such errors and to use casual language.
Participants often asked personal questions like "Where are you from?" or "What's your name?", believing AI lacked a personal history. Surprisingly, many chatbots gave convincing answers, as they had been trained on a wide range of personal narratives.
After two minutes of conversation, users were asked to guess whether their partner was a human or a bot. Over more than a month of play and millions of interactions, 32% of participants could not tell the two apart. Interestingly, some speculated that excessive politeness might signal an AI.