At the recent AI Summit London, OpenAI's chief architect, Colin Jarvis, highlighted transformative advancements on the horizon for large language models, emphasizing four key areas of progress: smarter and more affordable models, enhanced model customization, expanded multimodality capabilities, and the emergence of market-leading chatbots that perform exceptionally well.
Jarvis cautioned attendees about the rapid pace of technological evolution: "Don't build for what’s available today because things are changing so fast." He pointed out that the advancements in capabilities could surpass existing technologies even before new applications have been deployed. Companies, therefore, should focus on creating distinct user experiences by leveraging language AI APIs, unique data approaches, and tailored model customizations.
A crucial differentiator for businesses developing services powered by language models is the use of their proprietary data. "The user experience you create, the data you bring to the model, and how you customize it are what will enable you to create something genuinely unique," Jarvis stated. He warned that simply wrapping existing models without adding unique value will make businesses indistinguishable from their competitors.
Jarvis noted that previous use cases that were abandoned due to costs or complexity can now be explored, thanks to reduced operating expenses and smarter models. He specifically referred to OpenAI’s cost-effective model embedding, which has made certain previously unviable use cases viable. "With GPT-4o, which operates twice as fast as GPT-4, we’ve seen many use cases that were painfully slow for users now fall below the acceptable threshold for deployment," he explained. "We are witnessing a trend where models become not just smarter, but also cheaper and faster.”
### The Chatbot Arms Race
Since its launch in late 2022, ChatGPT has sparked a crowded marketplace for chatbots, with notable competitors like Google’s Gemini and Anthropic’s Claude. Jarvis described this evolving landscape as an "arms race," where leading text-focused chatbots are achieving similar levels of intelligence.
Looking ahead, he predicted that companies will continuously strive to elevate their models' performance, striving for incremental improvements in capabilities. "The next year will be revealing in terms of whether anyone can create a leap in model capabilities akin to the jump from GPT-3 to GPT-4," he noted, emphasizing the expectation of ongoing progress within a diverse and fragmented market.
### Increased Model Customization
Traditionally, businesses would take a foundational model and fine-tune it for their specific application. However, limitations in fine-tuning and the technical expertise required for building on open-source models often present challenges. Jarvis anticipates a shift toward post-training methods using reinforcement learning, enabling models to specialize in particular fields, such as agriculture or law.
He acknowledged potential safety concerns that may arise from this approach but also recognized the exciting use cases it might inspire. "These expert-trained models could deliver substantial value in customer service applications," Jarvis argued, suggesting that they could automate certain functions while supporting human staff. He highlighted that "the more complex the process, the more human involvement is necessary, leading to collaborative experiences between AI and staff."
### Expanding Modalities: Reducing Costs
When ChatGPT first launched, its capabilities were limited to handling straightforward text and code. However, with the introduction of updates like GPT-4o, it can now process images, text, and code all at once. Jarvis noted that this multimodal capability allows businesses to run inputs through a single API call, significantly reducing operational costs.
"This advancement is accelerating processes," he stated. "We are entering a new era of user experiences that rely on low-latency interactions across various modalities." Jarvis envisaged future language models evolving to the point where users might communicate verbally and receive video responses, eliminating barriers between different modalities and promoting a seamless interaction experience.
As the field of language models continues to advance, the focus on smarter, customizable, and multimodal tools is set to redefine the landscape of AI applications, driving innovation and enhancing user experiences across sectors.