Inside Amazon's Just Walk Out Technology: Revolutionizing Retail Experience
On the first floor of a modern, industrial-style office building, a select group of journalists has been invited into a secret Amazon lab to see the latest innovations in Just Walk Out (JWO) technology.
Streamlining Shopping Globally
Currently deployed in over 170 retail locations worldwide, JWO allows customers to enter a store, select items, and exit without stopping to pay at a cashier, greatly enhancing the shopping experience.
Introducing the AI Model
We're about to see Amazon's latest AI system, a multi-modal foundation model built on a transformer architecture that analyzes data from a range of in-store sensors. It follows the same principles as large language models such as GPT, but instead of generating text, it generates receipts. This approach improves accuracy in complicated shopping scenarios and simplifies deployment for retailers.
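To make the analogy with GPT concrete, here is a minimal sketch of greedy autoregressive decoding in which each output token is a receipt line item rather than a word. The scoring function `toy_scorer`, the `<END>` token, and the SKU names are hypothetical stand-ins for a trained model; nothing here reflects Amazon's actual implementation.

```python
from collections import Counter
from typing import Callable, Dict, List

END = "<END>"  # hypothetical stop token, analogous to an end-of-text token


def generate_receipt(
    score_next: Callable[[List[str]], Dict[str, float]],
    max_items: int = 50,
) -> List[str]:
    """Greedy autoregressive decoding: emit one receipt token at a time."""
    receipt: List[str] = []
    for _ in range(max_items):
        scores = score_next(receipt)         # "model" scores every candidate token
        token = max(scores, key=scores.get)  # greedy: pick the highest score
        if token == END:
            break
        receipt.append(token)
    return receipt


def toy_scorer(evidence: Dict[str, int]) -> Callable[[List[str]], Dict[str, float]]:
    """Hypothetical stand-in for a trained model.

    Scores each SKU by how many sensed units are not yet on the receipt.
    """
    def score_next(prefix: List[str]) -> Dict[str, float]:
        emitted = Counter(prefix)
        scores = {sku: float(n - emitted[sku]) for sku, n in evidence.items()}
        scores[END] = 0.5  # END wins once no SKU has an unbilled unit left
        return scores
    return score_next


print(generate_receipt(toy_scorer({"chips": 2, "cola": 1})))
# ['chips', 'chips', 'cola']
```

With evidence for two bags of chips and one cola, the loop emits `chips`, `chips`, `cola` and stops when `<END>` outscores every remaining SKU. A real model would score candidates from sensor embeddings, not from a hand-built count.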
A Tour with Jon Jenkins
Guided by Jon Jenkins, Vice President of JWO at Amazon, we navigate past coffee-sipping employees through glass security gates and along a short, dimly lit hallway to an unremarkable door. Inside, we find a full-scale replica of a local bodega, complete with shelves stocked with chips, candy, and beverages like Coca-Cola and Vitamin Water.
While the lab store resembles a typical retail environment, it’s equipped with electronic gates and a network of Amazon’s specialized 4-in-1 camera devices overhead.
How JWO Functions
JWO (pronounced “jay-woh”) combines computer vision, sensor fusion, and machine learning to track what shoppers pick up from or put back on the shelves. Setting up a store begins with a 3D map of the space, captured with an iPhone or iPad, which is then divided into product areas that correspond to the store's inventory.
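One way to picture the product-area map is as a set of 3D zones, each tied to one inventory item, plus a lookup that tells which item a shopper's hand is reaching toward. The sketch below assumes axis-aligned boxes measured in metres and uses hypothetical names (`ProductZone`, `zone_at`); Amazon's real map format is not public.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, store coordinate frame (assumed)


@dataclass(frozen=True)
class ProductZone:
    sku: str           # inventory item stocked in this zone
    min_corner: Point  # lowest x, y, z of the box
    max_corner: Point  # highest x, y, z of the box

    def contains(self, p: Point) -> bool:
        """True if point p lies inside this box (edges included)."""
        return all(lo <= v <= hi for lo, v, hi in zip(self.min_corner, p, self.max_corner))


def zone_at(zones: List[ProductZone], p: Point) -> Optional[str]:
    """Return the SKU of the first zone containing p, or None if p is in no zone."""
    for zone in zones:
        if zone.contains(p):
            return zone.sku
    return None


# Hypothetical two-zone shelf.
store_map = [
    ProductZone("chips", (0.0, 0.0, 0.5), (1.0, 0.4, 1.0)),
    ProductZone("candy", (1.0, 0.0, 0.5), (2.0, 0.4, 1.0)),
]
print(zone_at(store_map, (1.5, 0.2, 0.8)))  # candy
```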
RGB cameras hang from a ceiling-mounted rail system, and weight sensors sit at the front and back of each shelf.
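Placing sensors at both the front and back of a shelf lets a system estimate not only how many items were removed but also roughly where. A minimal sketch, assuming the shelf behaves like a beam resting on two supports and that every unit of a product has the same known weight; the function name and units are hypothetical.

```python
from typing import Tuple


def weight_event(
    front_delta_g: float,
    back_delta_g: float,
    unit_weight_g: float,
    shelf_depth_cm: float,
) -> Tuple[int, float]:
    """Estimate units removed and how far back on the shelf they were taken.

    Deltas are (before - after) readings in grams, so a removal is positive.
    For a beam on two supports, a load at depth x from the front puts
    total * x / depth on the back sensor, so x = depth * back / total.
    """
    total = front_delta_g + back_delta_g
    if total == 0:
        return 0, 0.0
    units = round(total / unit_weight_g)  # nearest whole number of items
    depth_cm = shelf_depth_cm * back_delta_g / total
    return units, depth_cm


print(weight_event(300.0, 100.0, 200.0, 40.0))  # (2, 10.0)
```

Here 400 g disappeared, which is two 200 g items, and because the front sensor lost three times as much weight as the back one, the items were taken about 10 cm from the front edge of a 40 cm shelf. Negative deltas (an item put back) yield a negative count.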
JWO detects interactions with items by tracking the orientation of shoppers' heads and hands. It then combines data from multiple cameras, the weight sensors, and object recognition to accurately predict whether a shopper has kept a specific item.
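Combining independent pieces of evidence into one decision is the core of sensor fusion. As a simple illustration, the sketch below uses naive Bayes log-odds fusion, a classic textbook technique and not the transformer-based method Amazon describes, to merge per-sensor estimates of the probability that a shopper kept an item. It assumes the sensors' errors are independent and that every estimate lies strictly between 0 and 1.

```python
import math
from typing import Iterable


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fuse_probabilities(probs: Iterable[float], prior: float = 0.5) -> float:
    """Naive Bayes fusion of independent per-sensor P(item kept) estimates.

    Each sensor contributes its log-odds relative to the shared prior;
    the sum is mapped back to a probability with the logistic function.
    """
    prior_logit = logit(prior)
    total = prior_logit
    for p in probs:
        total += logit(p) - prior_logit
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical camera, weight-sensor, and object-recognition estimates.
print(round(fuse_probabilities([0.7, 0.9, 0.8]), 3))  # 0.988
```

The three estimates combine to 84/85, more confident than any single sensor, because they agree. Conflicting evidence, such as 0.9 from one sensor and 0.1 from another, cancels back to the prior.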
Enhanced Processing with AI
Jon explains that earlier versions of the system relied on several models processing data in sequence, which was slower and less accurate. Now a single transformer model processes all inputs at once and generates the receipt directly. This makes complex scenarios, such as several shoppers reaching for products at the same time, easier to handle, and it shortens the wait for receipts.
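Processing all inputs at once is what self-attention, the core mechanism of a transformer, provides: every output is computed from all input tokens in one step, instead of each model in a pipeline seeing only the previous model's output. Below is a stripped-down scaled dot-product self-attention in plain Python. It assumes each sensor reading has already been turned into a vector of equal length, and it omits the learned query, key, and value projections, multiple heads, and everything else a production model would need.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumes a non-empty list of equal-length vectors. Each output is a
    weighted average of *all* input tokens, so every sensor's token can
    influence every output in a single pass.
    """
    d = len(tokens[0])
    out: List[Vector] = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


# Hypothetical embeddings for one moment in the store.
camera_token = [0.9, 0.1, 0.0]
weight_token = [0.8, 0.2, 0.1]
vision_token = [0.1, 0.9, 0.3]
fused = self_attention([camera_token, weight_token, vision_token])
print(len(fused), len(fused[0]))  # 3 3
```

Each row of `fused` mixes information from the camera, weight, and vision tokens, weighted by how similar they are to the row's own token.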
Thanks to its self-learning capabilities, JWO adapts to new store layouts and can identify misplaced items, making frictionless shopping more reliable and accessible.
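Detecting misplaced items can be framed as comparing what the system recognizes in each product area against what the store map says belongs there. A minimal sketch with hypothetical zone IDs and SKUs; it leaves out the learning side entirely and shows only the comparison step.

```python
from typing import Dict, List, Tuple


def find_misplaced(
    planogram: Dict[str, str],
    detections: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (zone, found SKU, expected SKU) for each item seen in the wrong zone.

    planogram:  hypothetical mapping from zone ID to the SKU that belongs there.
    detections: (zone ID, SKU recognized in that zone) pairs from the vision system.
    Zones missing from the planogram are skipped rather than flagged.
    """
    misplaced = []
    for zone, sku in detections:
        expected = planogram.get(zone)
        if expected is not None and sku != expected:
            misplaced.append((zone, sku, expected))
    return misplaced


planogram = {"A1": "chips", "A2": "candy"}
seen = [("A1", "chips"), ("A2", "cola"), ("B9", "cola")]
print(find_misplaced(planogram, seen))  # [('A2', 'cola', 'candy')]
```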
Edge Computing Supports JWO
Another notable feature of JWO is its use of edge computing. Amazon confirmed that model inference runs on-site on edge computing hardware, which speeds up processing and reduces bandwidth requirements. Each edge node, housed in an 8x5x3 enclosure, is built for efficient on-site processing.
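A back-of-the-envelope calculation shows why running inference on-site saves bandwidth: streaming every camera to the cloud moves far more data than sending back finished receipts. All the numbers below (100 cameras, 4 Mbps per stream, 16 opening hours, 2,000 receipts of 2 KB each) are illustrative assumptions, not figures from Amazon.

```python
def daily_stream_gb(num_cameras: int, mbps_per_camera: float, hours_open: float) -> float:
    """Data uploaded per day if every camera streamed raw video to the cloud."""
    megabits = num_cameras * mbps_per_camera * hours_open * 3600  # 3600 seconds per hour
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes (decimal units)


def daily_receipts_mb(num_receipts: int, kb_per_receipt: float) -> float:
    """Data uploaded per day if only finished receipts leave the store."""
    return num_receipts * kb_per_receipt / 1000  # kilobytes -> megabytes


# Illustrative assumptions only, not Amazon figures.
print(daily_stream_gb(100, 4.0, 16))  # 2880.0
print(daily_receipts_mb(2000, 2.0))   # 4.0
```

Under these assumptions, continuous streaming would upload about 2,880 GB a day versus roughly 4 MB of receipt data, a gap of several orders of magnitude.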
Amazon has not disclosed what is inside the nodes, but there is speculation that they may use Amazon's own AI chips rather than Nvidia GPUs, which could make AI inference more accessible and cost-effective than Nvidia-based hardware.
Advancing with RFID Technology
Next, we move to another mock retail lab, this one resembling a clothing store, where JWO is combined with RFID technology. The AI approach stays the same: a multi-modal transformer processes the sensor inputs, but with minimal infrastructure, which offers a significant advantage in cost and complexity.
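On the RFID side, much of the bookkeeping at the exit comes down to de-duplicating tag reads (a reader typically reports the same tag many times) and pricing what remains. The sketch below shows only that step, with a hypothetical tag catalog; it does not model the multi-modal transformer described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rfid_receipt(
    tag_reads: Iterable[str],
    catalog: Dict[str, Tuple[str, float]],
) -> Tuple[List[Tuple[str, int, float]], float]:
    """Turn raw exit-gate tag reads into receipt lines and a total.

    tag_reads: tag IDs as reported by the reader, possibly repeated.
    catalog:   hypothetical mapping from tag ID to (SKU, unit price).
    Unknown tags (for example, on a shopper's own clothing) are ignored.
    """
    counts: Counter = Counter()
    amounts: Dict[str, float] = {}
    for tag in set(tag_reads):  # bill each physical tag once
        if tag not in catalog:
            continue
        sku, price = catalog[tag]
        counts[sku] += 1
        amounts[sku] = amounts.get(sku, 0.0) + price
    lines = [(sku, counts[sku], round(amounts[sku], 2)) for sku in sorted(counts)]
    total = round(sum(amount for _, _, amount in lines), 2)
    return lines, total


catalog = {"T1": ("tshirt", 15.0), "T2": ("tshirt", 15.0), "T3": ("socks", 5.5)}
print(rfid_receipt(["T1", "T1", "T2", "T3", "T3", "X9"], catalog))
# ([('socks', 1, 5.5), ('tshirt', 2, 30.0)], 35.5)
```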
This RFID version of JWO could also suit temporary retail settings such as fairs and festivals, broadening its market potential.
The Journey of JWO Development
JWO first reached the public in 2018, but its research and development likely began years earlier. Jon declined to comment on team size or total investment, but estimates suggest that at least 250 employees work on JWO, with average compensation of around $180,000 a year. Over several years of development, that could imply total R&D costs of between $250 million and $800 million.
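The estimate is easier to judge with the arithmetic spelled out. Using the figures above, salaries alone come to $45 million a year, so the quoted range corresponds to roughly 5.6 to 17.8 years of payroll. The upper end therefore seems to assume a longer or larger effort, or substantial non-salary costs such as hardware and facilities; since no breakdown is given, treat this as a sanity check rather than a cost model. The function names are illustrative.

```python
def annual_payroll(headcount: int, avg_comp: float) -> float:
    """Yearly salary cost for a team, ignoring benefits, hardware, and facilities."""
    return headcount * avg_comp


def years_of_payroll(total_cost: float, headcount: int, avg_comp: float) -> float:
    """How many years of salaries a given total R&D cost would cover."""
    return total_cost / annual_payroll(headcount, avg_comp)


payroll = annual_payroll(250, 180_000)
low = years_of_payroll(250e6, 250, 180_000)
high = years_of_payroll(800e6, 250, 180_000)
print(f"${payroll / 1e6:.0f}M per year; {low:.1f} to {high:.1f} years")
# $45M per year; 5.6 to 17.8 years
```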
This speculative figure underscores how much a company would need to invest to build a similar system from scratch, and why adopting an established technology can be attractive.
The Build vs. Buy Dilemma in AI
The cost of building a JWO-like system highlights how risky R&D in enterprise AI and technology integration can be. Many decision-makers recognize that investments of this size in infrastructure and R&D make more sense for companies like Amazon, which can spread the costs across many deployments through economies of scale.
The complexity and costs of AI development deter most retailers from pursuing in-house solutions, making pre-integrated systems like JWO more appealing.
The Future of AI Integration
The progress in JWO's AI models shows how far the transformer architecture's influence now extends across AI. Transformers are no longer confined to natural language processing; they are also transforming complex, multi-modal tasks in retail.
By offering JWO to third-party retailers through AWS, Amazon addresses critical pain points for those retailers while continuing to expand its retail footprint. The recently announced RFID option positions JWO for mass-market adoption, and given the size of the retail market, lower-cost deployments could make the technology widespread.
As AI and edge computing evolve, Amazon's JWO technology shows how hyperscalers are reshaping the future of retail. By packaging complex AI systems as easily deployable services, hyperscalers could, if JWO and similar offerings succeed, significantly influence AI adoption across industries.