Claude 3.5 Sonnet Upgrade: The Most Advanced AI Model for Coding Just Got Even Better!

Anthropic Unveils Upgraded Claude 3.5 Sonnet and New Claude 3.5 Haiku

Yesterday, Anthropic announced the upgrade of Claude 3.5 Sonnet and launched Claude 3.5 Haiku. The enhanced Claude 3.5 Sonnet boasts significant improvements across multiple aspects, particularly in programming capabilities, an area where it has already excelled.

Recently, an 8-year-old girl made headlines for developing a web application using Cursor, which relied on the previous version of Claude 3.5 Sonnet. With this latest upgrade, the model has only become more powerful.

The newly launched Claude 3.5 Haiku has also shown impressive performance in various evaluations, matching the capabilities of its predecessor, Claude 3 Opus. Notably, the API pricing remains unchanged, with performance and speed comparable to previous versions.

AI Now operating Computers

In this update, Anthropic revealed a new feature currently in the testing phase—enabling the model to control computers. With this capability, Claude can learn computer skills to utilize tools and software without needing specific designs for each task. This innovation could automate repetitive processes, build and test software, and even handle open-ended tasks.

The introduction of this feature raises concerns about potential misuse, particularly in the context of existing hacks or data manipulation. To address these risks, Anthropic is implementing safety measures, including a newly developed classifier.

Developers can already access this functionality through APIs, allowing Claude to transform user commands into actions like viewing spreadsheets, opening browsers, navigating pages, clicking buttons, and filling out forms.

Claude 3.5 Sonnet is the first AI model in public beta to offer such computer operation capabilities. While still experimental, it faces challenges in executing actions like scrolling, dragging, and zooming.

Companies like Asana, Canva, Cognition, and Replit have begun exploring the new features of Claude 3.5 Sonnet, particularly for computer operations and user interface navigation, completing complex tasks that could take dozens or even hundreds of steps.

Developers can use the computer operation feature now via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

Claude 3.5 Sonnet: Exceptional Performance Across Industries, Significant Programming Improvements

Testing indicates that Claude 3.5 Sonnet has excelled in various industry benchmarks, particularly in programming tasks and tool utilization. In the SWE-bench Verified tests, its programming performance improved from 33.4% to 49.0%, surpassing other inference models like OpenAI's o1-preview and specially designed programming systems.

In TAU-bench's tool usage tasks, its score in the retail sector rose from 62.6% to 69.2%, while in the more challenging aviation sector, it improved from 36.0% to 46.0%.

Though benchmark scores are illustrative, client feedback highlights a significant leap in AI-driven programming capabilities with the new Claude 3.5 Sonnet.

GitLab noted that the model's reasoning ability in DevSecOps tasks improved by about 10% without increasing latency, making it suitable for multi-step software development processes.

Cognition reported enhanced performance in autonomous AI assessments, particularly in programming, planning, and problem-solving compared to the previous version.

The Browser Company found that Claude 3.5 Sonnet outperformed every other model they had tested for automating web workflows.

In terms of security, Claude 3.5 Sonnet has proven robust, having passed catastrophic risk assessments and meeting the ASL-2 standards outlined in the company's "responsible scaling policy."

Remarkably, despite its enhancements, the upgraded version of Claude 3.5 Sonnet maintains its original pricing and response speed.

The Claude 3.5 Sonnet upgrade is now available to all users.

Claude 3.5 Haiku: Compelling Performance Comparable to Its Predecessor

Among the Claude models, Opus is the largest, Sonnet is mid-tier, and Haiku is the smallest and fastest. Claude 3.5 Haiku matches the price and speed of the earlier Claude 3 Haiku while providing significant improvements across the board.

Claude 3.5 Haiku has surpassed Claude 3 Opus in intelligent benchmarking tests, which was the previous leading model.

It has demonstrated strong performance in programming tasks as well, achieving a score of 40.6% in the SWE-bench Verified tests, outpacing older models like Claude 3.5 Sonnet and GPT-4o.

Claude 3.5 Haiku offers low latency, improved command execution capabilities, and more accurate tool usage, making it exceptionally suited for user-facing products, sub-agent tasks, and generating personalized experiences from extensive data sets like purchase history, pricing, and inventory records.

This model will be released later this month, supporting API access through Amazon Bedrock and Google Cloud's Vertex AI, initially with text input capabilities, and plans to incorporate image input features in the future.

Most people like

Find AI tools in YBX

Related Articles
Refresh Articles