OpenAI Launches GPT-4o 'Omni' Model, Enhancing ChatGPT Experience

OpenAI unveiled its latest flagship generative AI model, GPT-4o, on Monday. The “o” stands for “omni,” reflecting the model's ability to handle text, speech, and video. GPT-4o will be rolled out “iteratively” across OpenAI’s developer and consumer products over the coming weeks.

According to OpenAI's CTO, Mira Murati, GPT-4o delivers intelligence on par with GPT-4 while significantly enhancing its capabilities across multiple modalities.

“GPT-4o integrates reasoning across voice, text, and vision,” Murati explained during a live presentation at OpenAI’s San Francisco headquarters. “This is crucial as we rethink how humans interact with machines.”

Previously, GPT-4 Turbo was OpenAI’s most advanced model. It could analyze a blend of images and text to handle tasks like extracting text from images or describing their contents. GPT-4o adds speech to the mix.

What does this mean for users? A world of possibilities.

GPT-4o elevates the user experience in OpenAI's AI-powered chatbot, ChatGPT. While the platform has offered a voice mode that transforms text responses into speech, GPT-4o enhances this feature, enabling users to engage with ChatGPT in a more dynamic, assistant-like manner.

For instance, users can pose a question to the GPT-4o-powered ChatGPT and interrupt it while it’s still answering. OpenAI emphasizes the model’s “real-time” responsiveness: it can pick up on nuances in a user’s voice and respond in a range of emotive styles, including singing.

Additionally, GPT-4o enhances ChatGPT's visual analysis capabilities. Now, when given a photo or desktop screenshot, ChatGPT can quickly address inquiries such as, “What does this software code do?” or “What brand is that shirt?”
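For developers, the same kind of visual question can be posed to GPT-4o directly through OpenAI’s Chat Completions API. The following is a minimal sketch, assuming the official openai Python SDK, an API key in the environment, and a placeholder image URL; the prompt is purely illustrative.

```python
# Minimal sketch: asking GPT-4o a question about an image via the Chat Completions API.
# Assumes the official `openai` Python SDK (v1+) and OPENAI_API_KEY set in the environment.
# The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does the code in this screenshot do?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```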

These features are set to further develop. Murati noted that while GPT-4o currently translates foreign menus from images, future iterations might enable ChatGPT to “watch” live sports events and explain the rules in real time.

“We recognize the growing complexity of these models, but we aim to make interactions more intuitive and natural, allowing users to focus solely on collaborating with ChatGPT,” Murati stated. “For years, we’ve concentrated on boosting the intelligence of our models, but this marks a significant leap in user-friendliness.”

GPT-4o is also designed to be more multilingual, with improved performance in roughly 50 languages. OpenAI claims that in its API and Microsoft’s Azure OpenAI Service, GPT-4o runs twice as fast as GPT-4 Turbo, at half the cost, and with higher rate limits.
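For developers already calling GPT-4 Turbo, switching to the new model is largely a matter of changing the model name in the request. A minimal sketch, again assuming the official openai Python SDK and an API key in the environment; the translation prompt is illustrative:

```python
# Minimal sketch: a plain text request to GPT-4o via the Chat Completions API.
# Assumes the official `openai` Python SDK (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # previously e.g. "gpt-4-turbo"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Translate 'Where is the train station?' into Japanese."},
    ],
)

print(response.choices[0].message.content)
```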

Audio capabilities in the GPT-4o API won’t be available to all customers at launch. Citing the risk of misuse, OpenAI plans to offer the new audio features first to “a small group of trusted partners.”

Starting today, GPT-4o is available in the free tier of ChatGPT and to subscribers of OpenAI’s premium ChatGPT Plus and Team plans, who get “5x higher” message limits. OpenAI notes that when users hit their limit, ChatGPT falls back to the older GPT-3.5 model. The improved voice experience powered by GPT-4o will enter alpha testing for Plus users within the next month, alongside enterprise-focused options.

In other news, OpenAI announced a revamped ChatGPT user interface on the web, featuring a more conversational home screen and message layout. Additionally, a desktop app for macOS will allow users to pose questions using keyboard shortcuts and capture or discuss screenshots. ChatGPT Plus users will gain early access to this app, with a Windows version set for release later this year.

Moreover, the GPT Store, OpenAI's library and toolkit for third-party chatbots built on its AI models, is now available to free-tier ChatGPT users. Free users also gain access to features that were previously paid-only, including memory, which lets ChatGPT recall user preferences; file and image uploads; and web browsing for up-to-date information.

