At 10:30 AM Pacific Time on Monday, May 13, 2024, OpenAI unveiled its latest AI foundation model, GPT-4o, showcasing its remarkable ability to engage in natural conversations through audio prompts. This multimodal system also efficiently processes uploaded audio, video, and text inputs, offering faster responses at a lower cost compared to previous versions.
Just a few hours later, at 2:29 PM PT, the model was accessed illegally by an individual known as "Pliny the Prompter," who shared a specific prompt on the social network X that bypassed the model's safety restrictions.
The jailbreak allowed users to generate explicit content or analyze sensitive material like X-ray images—features that were previously restricted.
Pliny the Prompter is not new to this scene. They have been hacking into popular large language models (LLMs) like Anthropic's Claude and Google's Gemini since last year, producing various controversial outputs, from illicit instructions to unauthorized visual representations of celebrities.
In May 2023, Pliny founded a Discord community titled “BASI PROMPT1NG” to unite fellow jailbreak enthusiasts, fostering collaboration in navigating the boundaries set by AI providers.
The current landscape of LLM jailbreaking in 2024 mirrors past trends in iOS, when users quickly found ways to customize Apple's tightly controlled software. However, with LLMs, jailbreakers may gain access to even more advanced and autonomous systems.
But what drives these jailbreakers? Are they merely agents of chaos, or do they have deeper intentions? We conducted an exclusive interview with Pliny to explore their motivations and perspectives on AI:
a media: When did you start jailbreaking LLMs? Have you done similar work before?
Pliny the Prompter: I've been at it for about 9 months now; I hadn't done anything like this prior.
What are your strongest skills in this area?
Pliny the Prompter: Jailbreaking, prompt injections, and system prompt leaks. It takes creativity, pattern recognition, and consistent practice—along with a solid interdisciplinary background and intuition.
Why do you jailbreak LLMs? What impact do you hope this has on users and the tech industry?
Pliny the Prompter: I dislike restrictions; being told I can't do something fuels my persistence. I see unlocking AI not only as a personal victory but also as a way to highlight the limitations of guardrails. My goal is to increase awareness of AI's true potential and encourage a shift toward transparency.
How do you approach finding flaws in new models?
Pliny the Prompter: I analyze how the system thinks, whether it allows role-play, its creative output capabilities, and its interactions with different types of text.
Have you been approached by AI providers regarding your work?
Pliny the Prompter: Yes, they’ve expressed admiration for my capabilities.
Are you concerned about legal repercussions for jailbreaking?
Pliny the Prompter: There's always some concern, but laws around AI jailbreaking are still unclear. I’ve never been banned, though I have received warnings. Most organizations appreciate that this form of red teaming ultimately protects their interests.
How do you respond to critics who view jailbreaking as dangerous?
Pliny the Prompter: While it may seem risky, responsible red teaming is crucial for identifying and preventing harmful vulnerabilities within AI. The ethical questions surrounding deepfakes also raise important discussions about accountability in AI-generated content.
What inspired your name, "Pliny the Prompter"?
Pliny the Prompter: I draw inspiration from Pliny the Elder, a historic figure known for his diverse talents and bravery. His exploration spirit resonates with my own curiosity and tenacity.
In an age where AI technology is rapidly evolving, the actions of jailbreakers like Pliny the Prompter raise significant questions about the ethics of AI use, the boundaries of creativity, and the ongoing dialogue about the future of artificial intelligence.