Anthropic's new Claude 3.5 Sonnet AI model introduces a feature, now in public beta, that lets it control a computer by looking at the screen. The feature, called "computer use," is available through the API and lets developers direct Claude to operate a computer the way a person would, as demonstrated on a Mac in the accompanying video.
Microsoft's Copilot Vision, OpenAI's ChatGPT desktop app, and Google's Gemini app for Android have all shown what AI can do when it can see a screen, but none has yet widely released tools that autonomously click and carry out tasks. Rabbit promised similar functionality for its R1, but it has not yet materialized.
Anthropic cautions that the computer use feature is still experimental and may be "cumbersome and error-prone." The company says it is releasing the feature early to gather developer feedback and expects it to improve rapidly.
For now, Claude cannot perform several common actions, such as dragging and zooming. Its "flipbook" approach to screen observation, which involves taking screenshots and stitching them together, means it can miss fleeting actions or notifications that appear and vanish between captures.
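To see why a screenshot-based view can miss short-lived events, consider the toy simulation below. It is only a sketch under stated assumptions: the fixed capture interval, the event timings, and every name in it (ScreenEvent, screenshot_times, missed_events) are hypothetical and chosen for illustration. It does not reflect how Anthropic actually schedules or processes screenshots.

```python
from dataclasses import dataclass


@dataclass
class ScreenEvent:
    """Something visible on screen for a limited time (hypothetical model)."""
    name: str
    start: float  # seconds at which the event appears
    end: float    # seconds at which it disappears


def screenshot_times(interval: float, duration: float) -> list[float]:
    """Capture timestamps at 0, interval, 2 * interval, ... up to duration."""
    count = int(duration / interval) + 1
    return [i * interval for i in range(count)]


def observed(event: ScreenEvent, shots: list[float]) -> bool:
    """An event is seen only if at least one capture falls inside its lifetime."""
    return any(event.start <= t < event.end for t in shots)


def missed_events(events: list[ScreenEvent], interval: float, duration: float) -> list[str]:
    """Names of the events that no screenshot captured."""
    shots = screenshot_times(interval, duration)
    return [e.name for e in events if not observed(e, shots)]


# Assumed example timings: a dialog that lingers and a brief notification.
events = [
    ScreenEvent("dialog box", start=1.0, end=6.0),
    ScreenEvent("toast notification", start=2.2, end=2.8),
]

print(missed_events(events, interval=2.0, duration=10.0))
print(missed_events(events, interval=0.5, duration=10.0))
```

With captures two seconds apart, the notification falls entirely between two screenshots and is never seen, while the longer-lived dialog is. Shortening the interval to half a second catches both in this example, at the cost of many more images to process.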
Additionally, this version of Claude has been configured to steer clear of social media, and Anthropic has implemented measures to monitor requests related to election activity. The model is also designed to avoid tasks such as generating social media content, registering web domains, or interacting with government websites.
The new Claude 3.5 Sonnet model also posts significant gains across a range of benchmarks while keeping the same pricing and speed as its predecessor, with notable improvements in coding and tool use. Its score on the SWE-bench Verified coding benchmark rose from 33.4% to 49.0%, higher than any publicly available model, including reasoning models and systems built specifically for agentic coding. On TAU-bench, which measures agentic tool use, it improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more complex airline domain.
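The benchmark figures above can be read either as percentage-point gains or as relative improvements over the earlier score. The short sketch below works out both from the numbers quoted in this article; the function name gains and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Before and after scores (in percent) as quoted in the article.
results = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative improvement in percent)."""
    points = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return points, relative


for name, (before, after) in results.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, {relative}% relative improvement")
```

For SWE-bench Verified, for instance, the 15.6-point gain works out to a relative improvement of roughly 46.7% over the previous score.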