MiniMax's Yan Junjie: Fast Large Models Are Great, but Sometimes Slower Approaches Lead to Greater Speed

Updated on October 12 2024

In the fast-paced realm of large models, sometimes a slower approach can lead to faster results. On August 31, the AI unicorn MiniMax, based in Shanghai, quietly unveiled its video model, abab-video-1, at its inaugural developer conference, "MiniMaxLink Partner Day." This innovative model allows users to generate high-resolution, high-frame-rate videos that can be up to six seconds long simply by inputting prompt keywords. MiniMax's video model is akin to OpenAI's Sora, which also focuses on text-to-video generation.

According to Yan Junjie, founder and CEO of MiniMax, speed is a fundamental goal in the development of their large model technology. However, the release of their video model was delayed by several months compared to Sora. "The main reason for our one-to-two-month delay is that we are addressing a tougher technical challenge: how to natively train models that require substantial computational power," Yan explained. He elaborated that to train video generation capability, videos must first be converted into tokens, and the resulting token sequences are long and complex to process. "Our primary focus earlier this year was to reduce this complexity and enhance the compression rate, which contributed to the delay."

Internal evaluations suggest that MiniMax's video model outperforms Runway’s offerings. Currently, the company has implemented a subscription model for its product. When asked about the commercial strategy for their video model, Yan stated, “We plan to wait for another week or two to refine the model further before considering commercialization options.” Although AI-generated video has not yet replaced traditional rendering engines, he sees potential for it in high-profile projects, saying that it provides at least a glimpse of future possibilities.

Yan outlined MiniMax’s dual commercialization strategy: "One aspect involves our open platform, which boasts over 2,000 clients, including well-known internet companies and traditional enterprises. Many users already leverage capabilities for sound and visual content, and not every company can handle this in-house the way Kuaishou does, making us an ideal partner for B2B collaborations." The second avenue is advertising revenue from the company’s own products. He emphasized that the current priority is not immediate commercialization but ensuring that the technology is accessible and widely usable.

AI-generated video has become a notable trend among large-model developers. OpenAI started the movement in February this year with Sora, which has yet to undergo public testing. Other companies, including Shengshu Technology and Kuaishou, have since released similar models, showcasing significant technological advancements.

When discussing MiniMax's strategic motives for developing video models, Yan pointed out the critical nature of visual content consumption, noting that text comprises a relatively small portion of what people engage with daily. "To achieve extensive user coverage and deeper engagement, the only path for large model companies is to output multimodal content, not just text," he added. MiniMax’s progression has followed a consistent strategy: starting with language models, then evolving into sound and image models, and now onto video.

AI engineer Zhang Yuxuan commented on MiniMax’s strong algorithm capabilities, even though specific parameters of their video model have not been disclosed. He contrasted their technology with Kuaishou's, which he views as being technically superior. Yan affirmed, “Whether for video, text, or sound, our team's core approach is not just to improve algorithms by a small percentage. Instead, we aim for substantial increases in performance.”

The current version of MiniMax's video model is temporarily available for users to test free of charge, with more advanced iterations in the pipeline. “Future updates will enhance data and algorithms, making the model easier to use,” Yan noted, adding that image generation and editable video features are also in development.

As AI technology continues to advance, demand for real-time generated games is growing. A recent Google paper presented GameNGen, described as the first fully AI-driven real-time game engine, which can run the classic shooter Doom at 20 frames per second. When asked about the future of AI in producing high-end AAA games, Yan acknowledged that while current projects like "Black Myth: Wukong" rely on traditional modeling and rendering methods, the rapid advancements in video generation technology hold great promise. He emphasized that the improvements observed this year signify only the beginning of what AI can achieve in this space.

MiniMax has experienced significant growth and faces increasing competition, prompting it to focus on enhancing model speed and efficiency. Maintaining this momentum involves overcoming persistent challenges in the industry, such as reducing error rates, supporting unlimited-length input and output, and advancing multimodal capabilities.

Founded in December 2021 and led by former SenseTime vice president Yan Junjie, MiniMax has grown rapidly, raising Series A and B funding from major backers including Alibaba and Tencent and reaching a valuation of $2.5 billion. The company serves both B2B and B2C markets, with products ranging from AI chat applications to customized API solutions for enterprises. MiniMax's models currently handle over 3 billion interactions with users worldwide each day, reflecting strong growth in user engagement and response time relative to established players like ChatGPT.

As the AI model landscape evolves, competitive pricing has encouraged traditional companies to adopt these technologies, increasing overall model usage and improving performance, while also helping model providers establish a competitive foothold in international markets.
