Yesterday, Anthropic announced an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet improves on its predecessor across the board, particularly in coding, where it now leads the field. Claude 3.5 Haiku performs comparably to Claude 3 Opus, Anthropic’s previous largest model, at a cost and speed similar to the earlier Haiku.
Companies such as Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company are already exploring these capabilities, automating tasks that take dozens, if not hundreds, of steps to complete. Replit, for instance, is using Claude 3.5 Sonnet’s computer use and user interface navigation capabilities to develop a key feature for its Replit Agent product.
The upgraded Claude 3.5 Sonnet is now available to all users. Starting today, developers can use the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku will be released later this month.
On industry benchmarks, the upgraded Claude 3.5 Sonnet shows significant gains, particularly in agentic coding and tool use tasks. In coding, it raises the SWE-bench Verified score from 33.4% to 49.0%, outperforming all publicly available models, including OpenAI’s o1-preview and specialized systems designed for agentic coding. On TAU-bench, a benchmark for agentic tool use, its score rises from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain. These gains come at the same price and speed as the previous version.
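To put those figures in perspective, the short sketch below computes both the percentage-point gain and the relative gain for each benchmark. It uses only the six scores quoted above; the function name `gain` and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Benchmark scores (percent) quoted in the paragraph above: (before, after).
# The dictionary keys are informal labels chosen for this example.
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gain(before, after):
    """Return (percentage-point gain, relative gain in percent), each rounded to 1 decimal."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


for name, (before, after) in scores.items():
    points, relative = gain(before, after)
    print(f"{name}: +{points} points ({relative}% relative)")
```

The distinction matters when reading such announcements: a gain of 10 points from a base of 36.0% is a much larger relative improvement than a gain of 6.6 points from a base of 62.6%.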
Early customer feedback suggests that the upgraded Claude 3.5 Sonnet represents a major leap forward in AI-powered coding. GitLab, which tested the model on DevSecOps tasks, found that its reasoning improved by up to 10% across use cases with no added latency, making it well suited to multi-step software development processes. Cognition reported substantial improvements in coding, planning, and problem-solving compared with prior versions. The Browser Company noted that, in automating web-based workflows, the model outperformed every model it had tested before.
As part of Anthropic’s ongoing collaboration with external experts, Claude 3.5 Sonnet underwent joint pre-deployment testing by the U.S. AI Safety Institute and the UK AI Safety Institute.
In addition, Anthropic’s evaluation of the upgraded Claude 3.5 Sonnet for catastrophic risks found that the ASL-2 Standard outlined in its Responsible Scaling Policy remains appropriate for this model.
Claude 3.5 Haiku: Merging Affordability with Speed
Claude 3.5 Haiku is the next generation of Anthropic’s fastest model. At a cost and speed similar to Claude 3 Haiku, it improves across every skill set and even surpasses Claude 3 Opus on many intelligence benchmarks. Notably, it scores 40.6% on SWE-bench Verified, outperforming many agents built on publicly available state-of-the-art models, including the original Claude 3.5 Sonnet and GPT-4o.
With low latency, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited to user-facing products, specialized sub-agent tasks, and generating personalized experiences from large volumes of data, such as purchase and subscription records.
Claude 3.5 Haiku will be available later this month through Anthropic’s first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI, initially as a text-only model, with image input to follow.