On March 18, Moonlit Dark Side announced that its conversational AI assistant, Kimi, now supports lossless context input of up to 2 million words, a significant upgrade from the 200,000 words available at its launch last October. To enhance Kimi's capabilities, the company has also broadened its data sources. Xu Xinran, Vice President of Engineering, explained that Kimi employs various strategies to improve response times, with recent infrastructure optimizations tripling its generation speed. Kimi is accessible via web browser, on Android and iOS, and as a mini program.
According to SimilarWeb, Kimi's web version saw 2.919 million visits in February, a 104.99% increase from the previous month. Handling long context windows is a key competitive advantage for leading AI companies, especially for applications such as long-document question answering and summarization. CEO Yang Zhilin compared large models to computers, arguing that long context functions like memory and has the potential to redefine computing paradigms.
However, extending context length introduces technical challenges: models tend to lose intelligence as context grows, and costs rise sharply, while common workarounds such as retrieval-augmented generation (RAG) and sliding-window methods keep costs down at the expense of information loss. Yang emphasized that Moonlit Dark Side instead focuses on innovative network structures and engineering optimizations to address these challenges. Xu further noted that achieving lossless improvements in context length requires collaboration across disciplines, including data, infrastructure, model training, and product design. The team has undertaken a comprehensive redesign of the model's pre-training, alignment, and inference processes.
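To make concrete why the article calls such workarounds lossy, here is a minimal sketch of a sliding-window approach to long-document question answering. It is illustrative only and not Moonshot's method; the `ask_llm` callable and function names are hypothetical placeholders standing in for any chat-completion API.

```python
# Illustrative sketch of a sliding-window shortcut for long documents.
# Not Moonshot's implementation; `ask_llm` is a hypothetical LLM call.

from typing import Callable, List


def chunk_text(text: str, window: int = 2000, overlap: int = 200) -> List[str]:
    """Split a long document into overlapping windows of roughly `window` characters."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + window])
        start += window - overlap
    return chunks


def sliding_window_answer(text: str, question: str, ask_llm: Callable[[str], str]) -> str:
    """Answer a question by querying each window separately, then merging the partial answers."""
    partial_answers = [
        ask_llm(f"Context:\n{chunk}\n\nQuestion: {question}")
        for chunk in chunk_text(text)
    ]
    merge_prompt = "Combine these partial answers into one answer:\n" + "\n".join(partial_answers)
    return ask_llm(merge_prompt)
```

Because each query sees only one window, any fact that spans window boundaries, or that depends on the document as a whole, can be dropped; the shortcut trades accuracy for lower cost, which is the trade-off a genuinely lossless long context is meant to avoid.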
With the increase in context length, Kimi's applications have expanded. It can still serve traditional purposes like reading academic papers and analyzing financial reports, while also exploring new areas such as hosting tabletop role-playing games (TRPGs). However, the longer context also complicates model evaluation. Earlier evaluation methods relied on planting unrelated sentences inside long text blocks and asking the model to retrieve them, but as the industry moves towards task-specific performance metrics, this approach has become less relevant. Xu acknowledged that as context length increases, more diverse evaluation metrics will emerge, and that this remains a topic of ongoing academic inquiry.
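For reference, the retrieval-style evaluation described above can be sketched in a few lines. This is a generic illustration of that kind of test, not the benchmark Moonshot uses; the `ask_llm` callable, the filler sentence, and the planted fact are all hypothetical.

```python
# Illustrative sketch of a retrieval-style long-context check:
# plant one unrelated sentence in a long filler passage and ask the model to find it.
# Not Moonshot's evaluation suite; `ask_llm` is a hypothetical LLM call.

import random
from typing import Callable


def long_context_retrieval_trial(ask_llm: Callable[[str], str],
                                 context_chars: int = 100_000) -> bool:
    """Return True if the model retrieves the planted fact from a long filler context."""
    needle = "The secret launch code is 7421."
    filler = "The quick brown fox jumps over the lazy dog. "
    haystack = filler * (context_chars // len(filler))

    # Plant the needle at a random position in the filler text.
    pos = random.randint(0, len(haystack))
    document = haystack[:pos] + " " + needle + " " + haystack[pos:]

    prompt = f"{document}\n\nWhat is the secret launch code? Answer with the number only."
    return "7421" in ask_llm(prompt)
```

Tests of this kind measure only whether a single planted fact can be recovered, which is why, as Xu notes, they become less informative once users care about end-to-end performance on real long-document tasks.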
Despite Moonlit Dark Side's advances in processing long texts, developments at other AI firms should not be overlooked. Sora's video generation capabilities have positioned the Diffusion Transformer (DiT) architecture as a new industry benchmark, spurring companies such as Shengshu Technology and Aishi Technology to ramp up efforts to catch up with Sora. During a recent communication session, Moonlit Dark Side did not reveal specific progress in multimodal technology, but co-founder Zhou Xinyu confirmed that research had been under way since before Sora's launch, with relevant product releases expected this year.
In less than a year since its inception, Moonlit Dark Side has completed two major funding rounds, raising over $1 billion and bringing its valuation to $2.5 billion, all while operating with a lean team of around 80 employees. Zhou emphasized talent density over sheer headcount, indicating that future hiring will remain selective: "Every new hire should elevate the average skill level of the team."