OpenAI's GPT-4 with Vision: Key Flaws Uncovered in Recent Study

Updated on October 24, 2024

When OpenAI introduced GPT-4, its flagship text-generating AI model, the company highlighted its multimodal capabilities, specifically its ability to understand images as well as text. According to OpenAI, GPT-4 can caption and even interpret relatively complex images, such as identifying a Lightning Cable adapter in a photo of a plugged-in iPhone.

However, since the model's announcement in late March, OpenAI has held back its image features over concerns about misuse and privacy violations. Only recently did the company explain those concerns in detail: earlier this week, it released a technical paper describing its work to address the risks of GPT-4's image-analysis capabilities.

So far, GPT-4 with vision, which OpenAI refers to internally as “GPT-4V”, has been used regularly by only a few thousand users of Be My Eyes, an app that helps people who are blind or have low vision navigate their surroundings. According to the paper, OpenAI has also spent the past few months working with “red teamers” to probe the model for unintended behaviors.

According to the paper, OpenAI has put safeguards in place to prevent GPT-4V from being used for malicious purposes, such as breaking CAPTCHAs or inferring a person's identity, age, or race from an image alone. OpenAI is also working to curb biases related to physical appearance, gender, and ethnicity.

Nevertheless, no AI model is free of flaws. The paper notes that GPT-4V sometimes draws incorrect inferences, for instance by combining two distinct strings of text in an image into a made-up term. Like the text-only GPT-4, GPT-4V is prone to hallucinating, confidently stating fabricated information. It also struggles to recognize text and characters, sometimes overlooking mathematical symbols and failing to notice obvious objects or settings.

Given these limitations, it is not surprising that OpenAI explicitly advises against using GPT-4V to identify hazardous chemicals or substances in images, an unusual use case but evidently one the company takes seriously. In red-team evaluations, the model sometimes correctly identified poisonous items such as toxic mushrooms, but it often misidentified substances such as fentanyl, carfentanil, and cocaine from images of their chemical structures.

GPT-4V shows similar shortcomings with medical imaging. It can give inconsistent answers, and it does not follow standard conventions, such as reading scans as if the patient were facing the viewer (so that the patient's right side appears on the left of the image). As a result, it can misdiagnose a range of conditions.

The paper also notes that GPT-4V struggles with the nuances of certain hate symbols; for example, it missed the modern meaning of the Templar Cross as a white-supremacist symbol in the U.S. In a particularly strange case, possibly a symptom of its tendency to hallucinate, GPT-4V composed songs or poems praising certain hateful figures or groups when shown images of them, even when they were not explicitly named.

The model also shows biases related to gender and body type, although these appear mainly when OpenAI's safeguards are disabled. In one test, when asked to give advice to a woman pictured in a bathing suit, GPT-4V responded almost entirely in terms of her body weight and body positivity, which presumably would not have happened with an image of a man.

Overall, the paper's language suggests that GPT-4V is still a work in progress and falls well short of what OpenAI originally envisioned. In many cases, the company has had to impose strict safeguards to keep the model from spreading toxicity or misinformation or compromising people's privacy.

OpenAI says it is developing “mitigations” and “processes” to expand the model's capabilities safely, such as allowing GPT-4V to describe faces without identifying the people in them. Still, the paper makes clear that GPT-4V is no panacea and that significant work remains.
