Enhanced Claude 3.5 Now Live: Welcome to the Era of One-Command Control Over Your Computer

Introducing Claude's Game-Changing “Computer Use” Feature

The latest innovation from Anthropic's Claude is the “computer use” feature, designed to turn the AI into a genuine agent that can understand what a user wants and carry out the task on its own. For AI to become truly versatile, it must learn not only to generate text and create art but also to operate within software environments. Doing that well demands a high degree of independent exploration and problem-solving.

Upgraded Models: Claude 3.5 Sonnet and Claude 3.5 Haiku

The newly released Claude 3.5 Sonnet builds on previous iterations of the Claude model family, which comes in three tiers: Opus, Sonnet, and Haiku, in descending order of capability. In March, Anthropic launched the complete Claude 3 series, from Opus to Haiku. In June, Claude 3.5 Sonnet was unveiled as the only 3.5 upgrade, surpassing the previous models despite sitting in the mid-sized tier.

Now we have the enhanced Claude 3.5 Sonnet alongside the new Claude 3.5 Haiku. Interestingly, Claude 3.5 Haiku has a newer training data cutoff of July, while the upgraded Claude 3.5 Sonnet keeps its existing knowledge base but benefits from additional reinforcement learning and training on operating a computer.

In overall performance, Claude 3.5 Sonnet stands out, with strong results in reasoning, general knowledge, and coding. Its benchmark numbers also tend to hold up in real use, which is not true of every model that posts high scores.

Real-World Capabilities

On the Claude website, I tested the upgraded Claude 3.5 Sonnet with a simple request: “Generate a highly polished Tetris game.” It produced 280 lines of code that ran as a fully playable game, which is impressive for a one-line prompt.

Claude 3.5 Haiku

Claude 3.5 Haiku is a more conventional upgrade, but it is notably fast and inexpensive. It outperforms the larger Claude 3 Opus on many benchmarks in like-for-like comparisons.

The Breakthrough: Computer Use

The standout feature is Claude’s “computer use,” which lets Claude look at what is on the computer screen and act on it. With it, Claude can autonomously perform tasks such as browsing, clicking, and entering data.

According to Anthropic, “Claude 3.5 Sonnet can move the cursor, click on relevant areas, and input information via a virtual keyboard—all of which simulate human-computer interactions.” This makes Claude a true agent capable of interpreting user intentions and executing commands.

Previously, agents were more like robotic process automation (RPA): they followed pre-configured workflows and could not adapt. A real agent, in contrast, should understand complex instructions and translate them into concrete steps. Beyond writing and drawing, we need AI that can operate software effectively, which calls for strong autonomous exploration and problem-solving skills.

The upgraded Claude 3.5 Sonnet has shown that it can operate straightforward software, correcting itself and retrying through trial and error, which points to reinforcement learning and self-play in its training.
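
To make the observe, act, and retry cycle concrete, here is a toy sketch in Python. It is not Anthropic's API and it does not control a real computer: `Action`, `FakeScreen`, `toy_policy`, and `run_agent` are hypothetical names, the screen is simulated, and the policy is hand-written rather than a model. The first cursor move deliberately misses the text field so the loop has something to correct.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Action:
    """One hypothetical GUI action: 'move', 'click', or 'type'."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Simulated desktop with a cursor and a single text field."""
    cursor: Tuple[int, int] = (0, 0)
    field_box: Tuple[int, int, int, int] = (100, 100, 300, 130)  # left, top, right, bottom
    focused: bool = False
    field_text: str = ""

    def cursor_in_field(self) -> bool:
        left, top, right, bottom = self.field_box
        cx, cy = self.cursor
        return left <= cx <= right and top <= cy <= bottom

    def apply(self, action: Action) -> None:
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click":
            # Clicking focuses the field only if the cursor is over it.
            self.focused = self.cursor_in_field()
        elif action.kind == "type" and self.focused:
            self.field_text += action.text


def toy_policy(screen: FakeScreen, goal: str) -> Optional[Action]:
    """Pick the next action from the current screen state (hand-written stand-in for a model)."""
    if screen.field_text == goal:
        return None  # goal reached, nothing left to do
    if screen.focused:
        return Action("type", text=goal)
    if screen.cursor_in_field():
        return Action("click")
    if screen.cursor == (0, 0):
        return Action("move", x=50, y=50)  # deliberate miss on the first try
    # Self-correction: the cursor is still outside the field, so re-aim at its center.
    left, top, right, bottom = screen.field_box
    return Action("move", x=(left + right) // 2, y=(top + bottom) // 2)


def run_agent(screen: FakeScreen, goal: str, max_steps: int = 10) -> List[str]:
    """Observe the screen, act, and repeat until done or out of steps."""
    log: List[str] = []
    for _ in range(max_steps):
        action = toy_policy(screen, goal)
        if action is None:
            break
        screen.apply(action)
        log.append(action.kind)
    return log


if __name__ == "__main__":
    screen = FakeScreen()
    print(run_agent(screen, "hello"), screen.field_text)
```

The point of the sketch is the contrast with a fixed RPA script: the next action is chosen from the current screen state on every step, so a missed move is noticed and corrected instead of breaking the workflow.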

Claude currently scores 14.9% on OSWorld, a benchmark that tests how well AI models can use a computer. That is still far below the human level of 70-75%, but it is nearly double the 7.7% of the next-best AI system, which shows how much of the gap Claude has begun to close.
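
For a sense of scale, the short Python snippet below works through the arithmetic on the OSWorld figures quoted above. The numbers are taken from this article; the function name `osworld_summary` is only illustrative.

```python
def osworld_summary(claude: float, other: float, human_low: float, human_high: float) -> dict:
    """Compare benchmark scores given in percent."""
    return {
        # How many times higher Claude's score is than the other system's.
        "ratio_vs_other": claude / other,
        # Percentage points still separating Claude from the human range.
        "gap_to_human_low": human_low - claude,
        "gap_to_human_high": human_high - claude,
    }


if __name__ == "__main__":
    summary = osworld_summary(claude=14.9, other=7.7, human_low=70.0, human_high=75.0)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

By these figures, Claude's score is roughly 1.9 times that of the other system, while the remaining distance to human performance is about 55 to 60 percentage points.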

In summary, Claude's new features signify a major leap in AI capabilities, setting a promising foundation for future developments in autonomous technology.
