Rabbit Launches Web-Based 'Large Action Model' Agent on R1 – October 1 Release

The Rabbit r1 was the must-have gadget of early 2024, but its initial appeal quickly faded as the company's ambitious promises went unfulfilled. CEO Jesse Lyu acknowledges that "on day one, we set our expectations too high," but he assures users that an upcoming update will finally unlock the much-anticipated Large Action Model (LAM) for use on the web.

While skeptics may view this as a case of too little, too late, or merely a shifting of goalposts, Rabbit's vision of a platform-agnostic agent for both web and mobile applications still holds real value, even if that value remains largely theoretical.

Lyu shared that the last six months have involved a flurry of shipping, bug fixes, and enhancements to response times and features. However, despite 16 over-the-air updates, the r1 remains mainly confined to interacting with a specific Large Language Model (LLM) or accessing one of just seven designated services, such as Uber and Spotify. "That was the first version of the LAM, trained on data gathered from laborers, but it isn't versatile—it only connects to those services," he explained. Regardless of the model's name, it has not delivered the capabilities Rabbit promoted at launch.

A Generalist Web-Based Agent

Rabbit is now preparing to release the first generic version of the LAM, which Lyu demonstrated for me. This update will introduce a web agent built on the existing WebVoyager, capable of reasoning through tasks like purchasing concert tickets, registering a website, or even playing online games. "Our goal is clear: by the end of September, your r1 will suddenly have expanded capabilities. It should support everything you can accomplish on any website," Lyu stated. (The company later confirmed a final release date of October 1 for the update.)

When given a task, the agent breaks it down into manageable steps and executes them by analyzing what it sees on the screen—buttons, fields, and images—regardless of their layout. It interacts with the necessary elements based on its learned understanding of how websites function.
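The loop described above can be illustrated with a toy sketch. This is not Rabbit's implementation or WebVoyager's code: the `Element`, `Step`, `find_element`, and `run` names are hypothetical, the plan is written by hand rather than generated by a model, and the "screen" is a plain list rather than a real browser page. It only shows the general idea of matching planned steps to whatever elements are visible, independent of where they sit on the page.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A visible page element as an agent might perceive it (hypothetical)."""
    kind: str   # e.g. "button", "field", "link"
    label: str  # visible text or accessible name


@dataclass
class Step:
    """One planned sub-goal: the action to take and the label to look for."""
    action: str      # "click" or "type"
    target: str      # keyword expected in the element's label
    text: str = ""   # text to enter for "type" actions


def find_element(screen, target):
    """Return the first element whose label mentions the target, or None."""
    for element in screen:
        if target.lower() in element.label.lower():
            return element
    return None


def run(steps, screen):
    """Match each step to an element on the screen and return an action log.

    A real agent would re-observe the page after every action; this sketch
    reuses one static screen to stay short.
    """
    log = []
    for step in steps:
        element = find_element(screen, step.target)
        if element is None:
            log.append(f"stuck: no element matching '{step.target}'")
            break
        if step.action == "type":
            log.append(f"type '{step.text}' into {element.kind} '{element.label}'")
        else:
            log.append(f"click {element.kind} '{element.label}'")
    return log


# A hand-written plan standing in for what a model would generate.
plan = [
    Step("type", "domain", "film festival"),
    Step("click", "search"),
]
screen = [
    Element("link", "Sign in"),
    Element("field", "Find your domain"),
    Element("button", "Search"),
]
for line in run(plan, screen):
    print(line)
```

Because elements are matched by what they say rather than by their position, reordering the `screen` list leaves the result unchanged, which is the layout independence the paragraph describes.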

For instance, I instructed it (through Lyu's remote operation) to register a new domain for a film festival. The agent quickly navigated Google to find domain registries, chose one, entered "film festival" into the domain field, and selected "filmfestival2023.com" for $14. I hadn’t provided any constraints like "for 2025" or "horror festival."

In another test, when Lyu prompted the agent to search for and purchase an r1, it promptly found numerous listings on eBay—a great result for a user but not ideal for the company’s founder presenting to the press. Amused, Lyu redirected the instruction to limit the search to the official website, and the agent succeeded.

Next, he had it engage with Dictionary.com’s daily word game. While it required some prompt tweaking (as the model tried to end the game prematurely), it ultimately completed the task.

Which browser does it use? Lyu indicated it operates from a fresh cloud-based browser, but they are also working on local versions, such as a Chrome extension, which would enable users to leverage their existing sessions without needing to log into services each time.

Addressing privacy concerns, which users rightly hold, Lyu said that the agent does not have access to personal credentials. He mentioned a future possibility for a small, isolated language model that could safely manage logins, though the specifics of this approach remain to be clarified as the technology evolves.

Still Evolving

The demonstration highlighted several points. First, if we assume the company and its developers aren’t engaging in an elaborate deception (as some skeptics suggest), it appears that there is indeed a functioning general-purpose web agent on the horizon. While not unique, it may be among the first easily accessible to consumers.

"There are companies focusing on niche markets, like Excel or legal documents, but I believe this could be one of the first general agents for consumers," Lyu shared. "The goal is to execute any task achievable through a website. We will establish the generic agent for websites first and then expand to apps."

Second, it underscored the importance of prompt engineering. The phrasing of a request significantly influences whether it succeeds, which is likely to be a hurdle for the average consumer.

Lyu noted that the current version is still a "playground version," indicating that while it is a fully functional general web agent, improvements are necessary. For example, he pointed out, "the model can strategize, but it doesn’t navigate steps intelligently." The system won’t learn user preferences, like avoiding eBay for electronics or scrolling down past sponsored results.

User data is not yet being collected to enhance the model since an effective evaluation method for this type of system is still underdeveloped. However, a “teach mode” is expected to allow users to demonstrate how to perform specific tasks.

Interestingly, the company is also developing a desktop agent that could interact with various applications like word processors and music players. While still in early stages, it shows promise. "You won't even need to specify an action; it will attempt to utilize the computer as long as there's an interface," Lyu explained.

The Case for a Platform

Despite the potential, there still isn’t a "killer app" that clearly showcases the agent's utility. It’s an impressive tool, but for users like me who already spend eight hours a day in front of a browser, its practicality remains uncertain. There are undoubtedly great applications, but none come readily to mind that match the obvious benefit of, say, a robotic vacuum.

I raised the frequent critique of Rabbit's business model: "Why not create an app instead?" Lyu responded with confidence, having clearly encountered this question before.

"If you break down the numbers, it doesn’t add up," he replied. "While technically feasible, you would inadvertently upset Apple and Google from the start. They would never allow this to outperform Siri or Gemini. Just like there’s no way Apple’s intelligence is better at managing Google’s services, and vice versa. And don’t forget they take 30% of the revenue! If we had simply launched an app from the outset, we wouldn’t have gained this momentum."

Rabbit envisions a third-party AI or device that can interact with all your services from an external standpoint. Lyu described it as "a cross-platform, generic agent system." "We’re aiming to control every UI, starting with the web, and eventually expanding to Windows, macOS, and mobile platforms."

On the subject of future ambitions, Lyu mentioned, "We never stated we wouldn’t consider building a phone in the future." Does that contradict the original goal of developing a simpler device? Perhaps.

In the meantime, Rabbit is focused on delivering on the promises made earlier this year. The new model is set to roll out to all r1 owners this week with the over-the-air update. Instructions on how to access the features will be provided alongside the update. Lyu offered a realistic reminder for eager users.

"We're managing expectations. It’s not perfect," he cautioned. "But it’s the best that humanity has achieved so far."
