Today at OpenAI’s Spring Updates event, Chief Technology Officer Mira Murati unveiled GPT-4o (GPT-4 Omni), a groundbreaking multimodal large language model (LLM) that will be available to free ChatGPT users in the coming weeks. Additionally, a new desktop ChatGPT app for macOS (with Windows support coming later) will allow users to access the platform beyond web and mobile applications.
“GPT-4o reasons across voice, text, and vision,” Murati explained, highlighting its ability to accept and analyze real-time video captured by users via their ChatGPT smartphone apps, although this feature is not yet publicly available.
“This feels magical, and that’s wonderful, but we want to demystify it and let you try it for yourself,” she added.
The new model can respond in real-time audio, detect users' emotional states from audio and video inputs, and adjust its vocal tone to express various emotions, similar to offerings from rival AI startup Hume.
During a demo, a presenter asked ChatGPT powered by GPT-4o to narrate a story with increasing drama, which it did swiftly. The model also stopped speaking when interrupted and listened before picking up where it left off.
OpenAI shared demo videos showcasing GPT-4o’s capabilities, stating it can respond to audio inputs in as little as 232 milliseconds, averaging 320 milliseconds—comparable to human conversational response times.
OpenAI explained how GPT-4o enhances user experience, saying, “Before GPT-4o, Voice Mode involved latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4), using three separate models. This meant the main intelligence source—GPT-4—couldn't fully perceive tone, multiple speakers, or background sounds.”
With GPT-4o, all inputs and outputs are processed by a single end-to-end neural network, combining text, vision, and audio to create richer interactions. It can even generate multiple views of an image, which can be transformed into 3D objects.
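To make that architectural difference concrete, here is a minimal, purely illustrative Python sketch of the two designs. Every function name and body below is a hypothetical placeholder, not OpenAI's actual code; the point is how latency and information loss stack across the old pipeline's three separate models.

```python
# Illustrative sketch only: contrasting the old three-model Voice Mode
# pipeline with GPT-4o's single end-to-end network. All bodies are
# hypothetical placeholders.

def transcribe(audio: bytes) -> str:
    # Stage 1 (old pipeline): speech-to-text. Tone, multiple speakers,
    # and background sounds are discarded at this step.
    return "transcribed text"

def generate_reply(transcript: str) -> str:
    # Stage 2 (old pipeline): GPT-4 reasons over the bare transcript only.
    return "reply text"

def synthesize(reply: str) -> bytes:
    # Stage 3 (old pipeline): text-to-speech, with no knowledge of the
    # original audio's emotional context.
    return b"reply audio"

def old_voice_mode(audio: bytes) -> bytes:
    # Latency and information loss accumulate across three hand-offs
    # (2.8 s for GPT-3.5, 5.4 s for GPT-4, per OpenAI's figures).
    return synthesize(generate_reply(transcribe(audio)))

def gpt4o_voice_mode(audio: bytes) -> bytes:
    # One multimodal network consumes and emits audio directly
    # (320 ms on average), so tone and context survive end to end.
    return b"reply audio from a single end-to-end model"
```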
However, OpenAI has not announced plans to open-source GPT-4o or any of its newer models. While users can explore the model's capabilities on OpenAI’s website and through its API, they won’t have access to the underlying model weights for customization—an area of criticism from co-founder Elon Musk.
The introduction of GPT-4o significantly upgrades the free ChatGPT experience. Free users, previously limited to the text-only GPT-3.5 model, will now have access to a more advanced model capable of analyzing images and documents, browsing the web, analyzing data, running custom GPTs, and storing memories that retain user preferences via simple prompts.
In a live demo, presenters showcased ChatGPT powered by GPT-4o translating spoken words in real-time between Italian and English.
OpenAI also highlighted, “ChatGPT now supports over 50 languages for sign-up, login, and user settings.” Furthermore, GPT-4o excels at understanding and discussing shared images and can create consistent AI art characters, a feat that has eluded many existing AI art generators.
Initially, GPT-4o will be available to paying subscribers, with a gradual rollout to free users: “We’re starting with ChatGPT Plus and Team users, with Enterprise access coming soon. Free users will have usage limits,” OpenAI stated.
On social media, OpenAI confirmed that "text and image input" are being rolled out in the API today, while voice and video capabilities will launch in the coming weeks. The API will offer GPT-4o at half the price and double the speed of GPT-4 Turbo, with increased call limits for third-party developers.
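For third-party developers, the text-and-image path is reachable through the existing Chat Completions endpoint. The snippet below is a minimal sketch using the official OpenAI Python SDK; the image URL is a placeholder, and it assumes an API key is set in the OPENAI_API_KEY environment variable.

```python
# Minimal sketch: sending text plus an image to GPT-4o via the
# OpenAI Python SDK's Chat Completions endpoint.
# Assumes OPENAI_API_KEY is set; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Voice and video input are not yet exposed here; per OpenAI, those capabilities come to the API in the weeks ahead.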
OpenAI CEO Sam Altman reflected on the company’s evolving mission: “Our goal was to create AI for societal benefit, but now it appears we’ll develop AI that empowers others to innovate, benefiting everyone.”
In his blog post, Altman noted: “Our primary mission is to provide powerful AI tools affordably. I’m proud that we offer the best model globally for free via ChatGPT.”
The new ChatGPT desktop app is set for a staggered release, launching first on macOS and later for Windows. Murati revealed that more than 100 million people currently use ChatGPT, with over 1 million custom GPTs created.
Despite the event’s brief 26-minute runtime and some awkward moments in the live demos, the soon-to-launch technology promises a more natural, powerful interface than prior versions.