OpenAI Launches Multimodal Model to Compete with Google Gemini


About a month and a half ago, Google introduced its advanced large language model, Gemini, which reportedly uses five times the computational power of GPT-4. Touted as a "multimodal and efficient machine learning tool," Gemini's development started in April 2023 after the merger of Google Brain and DeepMind. More details about Gemini are expected in the coming months. It is anticipated to match GPT-4’s parameter scale and has already shown remarkable multimodal capabilities during training.

Once Gemini is fine-tuned and undergoes thorough safety testing, Google plans to release various versions suited for different products, applications, and devices. Recent updates indicate that a select group of partner companies has been granted early access to Gemini’s software, which may soon be integrated into consumer services and business solutions through Google’s cloud offerings.

Meanwhile, OpenAI is working to incorporate multimodal capabilities into GPT-4, potentially replicating the features Google is targeting with Gemini. This initiative, codenamed Gobi, is expected to launch before Gemini's official release, as OpenAI strives to maintain its competitive advantage in the AI landscape. When GPT-4 was released earlier this year, OpenAI showcased its multimodal features, which were initially available only to select organizations such as the accessibility service Be My Eyes.

After several months, OpenAI is gearing up to launch a broader version of its visual capabilities, named GPT-Vision. The launch has faced delays due to concerns that the new features could be misused for malicious purposes, such as bypassing CAPTCHAs or conducting unauthorized surveillance. OpenAI is reportedly addressing the legal implications of this technology, and an announcement is likely soon.

Google has also come under scrutiny regarding the potential misuse of Gemini. In response to concerns, a spokesperson said that measures were taken as early as July to ensure the responsible development and rollout of related products. Gemini is expected to draw on Google's vast proprietary data across text, images, video, and audio, including information from its search engine and YouTube, along with years of accumulated expertise.

An early user of Gemini has noted that it effectively mitigates issues associated with "AI hallucinations," a common challenge faced by existing large models. OpenAI's CEO, Sam Altman, has hinted at various enhancements for GPT-4, pointing towards the development of an upgraded model, although he has played down the imminent arrival of GPT-5. Conversely, Mustafa Suleyman, co-founder of DeepMind, has suggested that OpenAI may be secretly developing and training GPT-5 under another name.

While OpenAI is committed to Gobi to preserve its leadership in AI-generated content, reports indicate that Gobi may still be in the technical validation phase. Recently, Google CEO Sundar Pichai expressed confidence in his company's position in AI, highlighting its focus on balancing innovation with responsibility amid technological advancements.

This ongoing race in AI development mirrors the competition between iOS and Android in the mobile ecosystem, and excitement for Gemini's launch is palpable. People are eager to explore its capabilities and see how it will shape the competition between Google and OpenAI. Meanwhile, Baidu's CEO Li Yanhong noted that pursuing large models is less meaningful than capitalizing on application opportunities. The smartphone era illustrates the point: regardless of which platform prevailed, services like WeChat attracted billions of users and expanded across numerous usage scenarios.

