Empowering AI to Operate Computers Like Humans: Discover Anthropic's New Claude 3.5 Sonnet Model Features

On October 23, Anthropic, a competitor of OpenAI, announced the release of upgraded models: Claude 3.5 Sonnet and the new Claude 3.5 Haiku.

According to Anthropic, the upgraded Claude 3.5 Sonnet gains a notable new capability: interacting with computers the way a person does. The upgraded model outperforms its predecessor across a range of benchmarks, with especially large gains in coding, an area where Anthropic says it already led the field.

The new Claude 3.5 Haiku matches the performance of Anthropic's previous largest model, Claude 3 Opus, on many evaluations, at the same cost as its predecessor and at a similar speed.

The upgraded Claude 3.5 Sonnet is now available to all users. Starting today, developers can utilize the computer interaction beta through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku is expected to launch later this month.

Claude 3.5 Sonnet: Enhanced Coding Skills and User Interaction

The upgraded Claude 3.5 Sonnet performs strongly on industry benchmarks, scoring 49% on the SWE-bench Verified coding benchmark, up from 33%. According to Anthropic, this is higher than any publicly available model to date. On TAU-bench, which evaluates agentic tool use, its retail-domain score rose from 62.6% to 69.2%, while its aviation-domain score rose from 36.0% to 46.0%. Notably, pricing is unchanged from the previous version.
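To put the quoted gains side by side, the short sketch below computes both the percentage-point increase and the relative increase for each benchmark. It uses only the figures cited in this article; the function name `improvement` and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Scores quoted in the article, as (previous, upgraded) percentages.
scores = {
    "SWE-bench Verified": (33.0, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench aviation": (36.0, 46.0),
}


def improvement(before, after):
    """Return (percentage-point gain, relative gain in percent), each rounded to 1 decimal."""
    points = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return points, relative


for name, (before, after) in scores.items():
    points, relative = improvement(before, after)
    print(f"{name}: +{points} points ({relative}% relative)")
```

The distinction matters when reading such announcements: a rise from 33% to 49% is 16 percentage points, but close to a 50% relative improvement over the earlier score.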

Feedback from early users indicates that the improved Claude 3.5 Sonnet marks a significant advancement in AI coding. For instance, GitLab tested the model in DevSecOps tasks, finding it delivered up to 10% stronger reasoning ability without increased latency, making it ideal for supporting multi-step software development processes.

Additionally, Anthropic introduced a groundbreaking feature in the Claude 3.5 Sonnet public beta: computer interaction. Through the API, developers can guide Claude to operate a computer, mimicking human actions like viewing a screen, moving the cursor, clicking buttons, and entering text.

Anthropic highlights that Claude 3.5 Sonnet is the first frontier AI model to offer “computer interaction” in public beta. The capability is still experimental, and Anthropic expects it to improve rapidly over time.

According to Anthropic, rather than creating specific tools for Claude, the model has been trained in general computer skills, allowing it to use various standard tools and software designed for human use. Developers can leverage this new functionality to automate repetitive processes, build and test software, and perform open-ended research tasks.

Anthropic has built an API that allows Claude to perceive and interact with computer interfaces. Developers can integrate this API, enabling Claude to translate directives (e.g., “Use my computer and online data to fill out this form”) into computer commands (e.g., checking spreadsheets, moving the cursor to open a web browser, navigating to relevant pages, and using data from those pages to fill out forms).
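The flow described above, in which a directive becomes a sequence of low-level actions, can be sketched as a small dispatch loop. This is a minimal illustration under stated assumptions, not Anthropic's API: the `FakeComputer` class, the `execute` function, the action names, and the dictionary shape are all hypothetical, and the "computer" is an in-memory stand-in so the example runs without any network access or real desktop.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComputer:
    """Hypothetical in-memory stand-in for a real desktop; it only records state."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    clicks: list = field(default_factory=list)

    def screenshot(self):
        # A real harness would capture pixels; here we return a text summary.
        return f"cursor={self.cursor} typed={''.join(self.typed)!r}"


def execute(computer, action):
    """Apply one (hypothetical) action dict and return an observation string."""
    kind = action["action"]
    if kind == "screenshot":
        return computer.screenshot()
    if kind == "mouse_move":
        computer.cursor = tuple(action["coordinate"])
        return "moved"
    if kind == "left_click":
        computer.clicks.append(computer.cursor)
        return "clicked"
    if kind == "type":
        computer.typed.append(action["text"])
        return "typed"
    raise ValueError(f"unsupported action: {kind}")


# Assumed plan a model might emit for a directive like "fill in the name field".
plan = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [320, 240]},
    {"action": "left_click"},
    {"action": "type", "text": "Ada Lovelace"},
    {"action": "screenshot"},
]

computer = FakeComputer()
observations = [execute(computer, step) for step in plan]
print(observations[-1])
```

In a real integration, each observation (typically a screenshot) would be sent back to the model, which would then choose the next action; the fixed `plan` list here simply stands in for that back-and-forth.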

On OSWorld, a benchmark that assesses AI models' ability to operate computers the way people do, Claude 3.5 Sonnet scored 14.9% in the "screenshot-only" category, nearly double the next best AI system's score of 7.8%. When given more steps to complete each task, Claude scored 22.0%.

However, Anthropic emphasizes that Claude 3.5 Sonnet’s computer interaction abilities are not yet perfect. Actions such as scrolling, dragging, and zooming, which humans perform effortlessly, remain challenging for Claude, so Anthropic encourages developers to start with low-risk tasks.

Companies such as Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company are already exploring these capabilities, running tasks that require dozens or even hundreds of steps. For example, Replit is using the “computer interaction” and UI navigation features of Claude 3.5 Sonnet to develop a key feature for its Replit Agent product that evaluates applications as they are being built.

Claude 3.5 Haiku: The Fastest Model Yet

The newly announced Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At the same cost and similar speed as Claude 3 Haiku, it improves across all skill sets and exceeds the performance of Anthropic's previous largest model, Claude 3 Opus, on many intelligence benchmarks.

Notably, Claude 3.5 Haiku excels in coding tasks, scoring 40.6% on SWE-bench Verified, ahead of both the original Claude 3.5 Sonnet and GPT-4o.

With low latency, improved instruction adherence, and enhanced tool usage accuracy, Claude 3.5 Haiku is well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large datasets (such as purchase histories, pricing, or inventory records).

Anthropic has announced that Claude 3.5 Haiku will be available later this month, with image input capabilities set to follow.

Founded by former OpenAI employees, Anthropic counts Amazon as a significant investor. In March 2024, Amazon completed its $4 billion investment in Anthropic, first announced in September 2023 and aimed at advancing the development of generative AI technologies.

In March 2024, Anthropic released the Claude 3 model family (Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus) and has iterated on its capabilities since. In June, the company launched Claude 3.5 Sonnet, which runs at twice the speed of Claude 3 Opus at one-fifth the cost.
