Nous Research, a private applied research group recognized for its contributions to the large language model (LLM) field, has introduced a new vision-language model called Nous Hermes 2 Vision, available on Hugging Face.
This open-source model builds on the earlier OpenHermes-2.5-Mistral-7B and extends its capabilities by allowing users to input images and extract text information from visual content. However, shortly after its launch, users reported excessive hallucination issues, prompting the company to rebrand the project as Hermes 2 Vision Alpha. A more stable version with fewer glitches is expected soon.
Nous Hermes 2 Vision Alpha
Named after the Greek messenger of the gods, Hermes, this vision model is crafted to navigate the complexities of human discourse with remarkable precision. It integrates the visual data provided by users with its learned knowledge, enabling it to deliver detailed, natural language responses. For example, the co-founder of Nous, known as Teknium on X, shared a screenshot demonstrating the model's ability to analyze an image of a burger, assessing its health implications.
Distinct Features of Nous Hermes 2 Vision
While ChatGPT, powered by GPT-4V, also supports image prompting, Nous Hermes 2 Vision sets itself apart with two primary enhancements:
1. Lightweight Architecture: Instead of relying on traditional 3B vision encoders, Nous Hermes 2 Vision employs SigLIP-400M. This not only simplifies the model's architecture, making it lighter, but also enhances performance on vision-language tasks.
2. Function Calling Capability: The model has been trained on a custom dataset featuring function calling, so users can supply a function schema in their prompt and receive structured output in return (see the sketch after this list). The model was also trained on additional datasets, including LVIS-INSTRUCT4V, ShareGPT4V, and dialogues from OpenHermes-2.5.
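To make the function-calling idea concrete, below is a minimal sketch of how such a prompt could be assembled. The ChatML layout is inherited from the OpenHermes-2.5 lineage, but the `<fn_call>` tag, the `extract_food_info` schema, and the `<image>` placeholder are illustrative assumptions, not the model's documented interface; check the Hugging Face model card for the exact format.

```python
# Sketch: building a function-calling prompt for Nous Hermes 2 Vision Alpha.
# The <fn_call> tag and schema below are assumptions for illustration only.
import json

# Hypothetical schema asking the model to return structured facts about an image.
schema = {
    "name": "extract_food_info",
    "description": "Extract nutritional details from a food image.",
    "parameters": {
        "type": "object",
        "properties": {
            "dish": {"type": "string"},
            "estimated_calories": {"type": "integer"},
            "is_healthy": {"type": "boolean"},
        },
        "required": ["dish", "is_healthy"],
    },
}

# ChatML-style prompt; "<image>" marks where the model's own preprocessing
# would splice in the SigLIP vision embeddings.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful vision assistant. Respond only with a JSON object "
    "that matches the provided schema.<|im_end|>\n"
    "<|im_start|>user\n"
    "<image>\n"
    f"<fn_call>{json.dumps(schema)}</fn_call><|im_end|>\n"
    "<|im_start|>assistant\n"
)

print(prompt)
```

The resulting string would then be passed to the model's generation pipeline alongside the image; as Nguyen notes later in this piece, results depend heavily on providing a clear schema.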
Challenges Ahead
While Nous Hermes 2 Vision is available for research and development, early feedback indicates that it still has significant issues. Following its release, co-founder Quan Nguyen acknowledged problems related to hallucinations and the model's tendency to generate excessive EOS tokens, leading to its alpha designation.
“I see people talking about ‘hallucinations,’ and yes, the situation is concerning. I was aware of this since the underlying LLM is uncensored. I plan to release an updated version by the end of the month to address these issues,” Nguyen wrote on X.
Further inquiries about the model's problems went unanswered at the time of writing. However, Nguyen mentioned that the function calling feature performs well when users provide a clear schema, and indicated that he might develop a dedicated function-calling model based on user feedback.
To date, Nous Research has released 41 open-source models within its Hermes, YaRN, Capybara, Puffin, and Obsidian series, showcasing a variety of architectures and capabilities.