SuperCLUE-V Releases August Rankings for Chinese Multimodal Large Models: Tencent's Hunyuan Takes the Lead

On August 5, Tencent Technology reported that SuperCLUE-V, a benchmark for Chinese multimodal models, had released its August rankings. Tencent's Hunyuan model scored 71.95 points, the highest of any Chinese model. According to the report, the model reliably identifies the elements of an image and describes them in natural language, showing a thorough grasp of fine details.

The evaluation covered 12 of the most prominent multimodal understanding models, and Tencent's Hunyuan placed second overall, behind only GPT-4o, which scored 74.36 points. GPT-4o tops the benchmark with strong results in both foundational multimodal cognitive abilities and application skills, scoring above 70 in each category.

SuperCLUE's evaluation indicated that although Chinese models have made significant strides, they still trail their overseas counterparts, particularly in fine-grained visual recognition, where the leading Chinese models lag by approximately 5 points. The models assessed comprised four open-source and eight closed-source models, allowing progress in multimodal capabilities to be compared across both groups.
