OpenAI’s president, Greg Brockman, recently shared what appears to be the first public image generated by the company’s new GPT-4o model on his X account.
The image features a person in a black T-shirt emblazoned with the OpenAI logo, writing on a blackboard. The text reads, “Transfer between Modalities. Suppose we directly model P (text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?”
The GPT-4o model, launched on Monday, improves on the previous GPT-4 family (including GPT-4, GPT-4 Vision, and GPT-4 Turbo) with faster processing, lower costs, and better retention of information from non-text inputs such as audio and images.
OpenAI trained GPT-4o directly on multimodal tokens, eliminating the need to convert audio and visual data into text first. The model can analyze and interpret these media formats natively, making it more seamless and efficient than the earlier GPT-4 setup, which chained together multiple interconnected models.
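To make the idea concrete, here is a minimal sketch of what modeling all modalities in one token stream can look like. This is not OpenAI's actual implementation; the vocabulary sizes, offsets, and function names are hypothetical. The point is that each modality occupies its own slice of a single shared vocabulary, so one autoregressive transformer can consume an interleaved sequence instead of routing audio and images through separate models.

```python
# Conceptual sketch only (not OpenAI's real tokenizer): each modality
# gets a disjoint ID range inside one shared vocabulary, so a single
# autoregressive model can process one interleaved token stream.

TEXT_VOCAB = 50_000   # hypothetical vocabulary sizes
AUDIO_VOCAB = 8_000
IMAGE_VOCAB = 16_000

# Offsets place each modality in a non-overlapping slice of the vocabulary.
AUDIO_OFFSET = TEXT_VOCAB
IMAGE_OFFSET = TEXT_VOCAB + AUDIO_VOCAB

def to_shared_ids(tokens, offset):
    """Map modality-local token IDs into the shared vocabulary."""
    return [t + offset for t in tokens]

def build_sequence(text_ids, audio_ids, image_ids):
    """Concatenate modalities into one stream a single transformer could
    model end to end, rather than converting audio/images to text first."""
    return (
        to_shared_ids(text_ids, 0)
        + to_shared_ids(audio_ids, AUDIO_OFFSET)
        + to_shared_ids(image_ids, IMAGE_OFFSET)
    )

seq = build_sequence([12, 7], [3], [42, 9])
print(seq)  # [12, 7, 50003, 58042, 58009]
```

Because every token lives in one vocabulary, the model's next-token prediction can move freely between modalities, which is what the blackboard question about directly modeling P(text, pixels, sound) alludes to.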
Comparing the new image to those generated by OpenAI's DALL-E 3—released in September 2023—highlights significant improvements in quality, photorealism, and text accuracy with the GPT-4o model.
Currently, the native image generation capabilities of GPT-4o are not publicly accessible. As Brockman noted in his post, “The team is working hard to bring those to the world.”