About a month and a half ago, Google introduced its advanced large language model, Gemini, which reportedly uses five times the computational power of GPT-4. Gemini is touted as a "multimodal and efficient machine learning tool," and its development started in April 2023 after the merger of Google Brain and DeepMind. More details about Gemini are expected in the coming months. It is anticipated to match GPT-4’s parameter scale and has already shown remarkable multimodal capabilities during training.
Once Gemini is fine-tuned and undergoes thorough safety testing, Google plans to release various versions suited for different products, applications, and devices. Recent updates indicate that a select group of partner companies has been granted early access to Gemini’s software, which may soon be integrated into consumer services and business solutions through Google’s cloud offerings.
Meanwhile, OpenAI is working to incorporate multimodal capabilities into GPT-4, potentially matching the features Gemini is aiming for. This initiative, codenamed Gobi, is expected to launch before Gemini's official release, as OpenAI strives to maintain its competitive advantage in the AI landscape. When GPT-4 was released earlier this year, OpenAI showcased its multimodal features, which were initially available only to select organizations, such as the accessibility service Be My Eyes.
After several months, OpenAI is gearing up to launch a broader version of its visual capabilities, named GPT-Vision. The launch has faced delays due to concerns that the new features could be misused for malicious purposes, such as bypassing CAPTCHA or unauthorized surveillance. OpenAI is reportedly addressing the legal implications of this technology, with announcements likely on the horizon.
Google has also come under scrutiny regarding the potential misuse of Gemini. In response to concerns, a spokesperson revealed that measures were taken as early as July to ensure the responsible development and rollout of related products. Gemini aims to draw on Google's vast proprietary data across text, images, video, and audio, including information from its search engine and YouTube, as well as the company's years of accumulated expertise.
An early user of Gemini has noted that it effectively mitigates issues associated with "AI hallucinations," a common challenge faced by existing large models. OpenAI's CEO, Sam Altman, has hinted at various enhancements for GPT-4, pointing towards the development of an upgraded model, although he has played down the imminent arrival of GPT-5. Conversely, Mustafa Suleyman, co-founder of DeepMind, has suggested that OpenAI may be secretly developing and training GPT-5 under another name.
While OpenAI is committed to Gobi to preserve its leadership in AI-generated content, reports indicate that Gobi may still be in the technical validation phase. Recently, Google CEO Sundar Pichai expressed confidence in his company's position in AI, highlighting its focus on balancing innovation with responsibility amid technological advancements.
This ongoing race in AI development mirrors the competition between iOS and Android in the mobile ecosystem, and excitement for Gemini's launch is palpable. People are eager to explore its robust functionalities and see how it will reshape the competition between Google and OpenAI. Meanwhile, Baidu's CEO Li Yanhong noted that pursuing large models is less meaningful than capitalizing on application opportunities. By that reasoning, regardless of which platform prevails in the smartphone competition, services like WeChat have already attracted billions of users and expanded their usage across numerous scenarios.