If you're interested in AI-generated video, you're likely familiar with the leading players in this rapidly growing field: Runway ML with its Gen-3 Alpha Turbo model, OpenAI's upcoming Sora, Luma's Dream Machine, and Pika's self-titled AI video generator.
Now, there's a new contender: Hotshot. Founded in 2023 by Aakash Sastry, John Mullan, and Duncan Crawbuck, Hotshot has launched its text-to-video AI generator as an early public preview.
In a post on X, Sastry wrote, “For the first time in over a decade, it’s possible to build powerful and novel video applications for customers. This model is our foundation for building those experiences, and this is only the beginning.”
Hotshot is currently available for free at Hotshot.co, allowing users to generate videos without watermarks, though the free version limits you to two generations per day.
Hotshot's Background
Hotshot originally debuted as a free app for AI photo creation and editing, but the company has since shifted its focus to the new text-to-video model. Sastry shared that the founding team has over a decade of experience building consumer apps and that the company is backed by investors including Lachy Groom and Alexis Ohanian.
Model Development in Just Four Months
In a detailed paper, the co-founders, along with team member Chaitu Aluru, describe Hotshot as a “text-to-video model that generates up to 10 seconds of footage at 720p,” developed in just four months. Before this, the team created an open-source model, Hotshot-XL, which generates 1-second videos at 8 frames per second and has attracted over 20,000 monthly users. They also produced Hotshot Act-One, which generates 3-second clips at the same frame rate, but the current model is their most ambitious project yet.
The team trained the model on 600 million clips using thousands of GPUs, and contended with challenges such as hardware failures along the way. “Managing this pipeline was a 24/7 job for one of our team members for an entire month,” they noted.
They also trained a new autoencoder to compress videos spatially and temporally, allowing for smaller file sizes while preserving essential data for AI model training.
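To give a rough sense of what spatial and temporal compression means in practice, here is a minimal back-of-the-envelope sketch. Only the 10-second duration and 720p resolution come from Hotshot's description; the 24 fps frame rate, the 8x spatial and 4x temporal downsampling factors, and the 16 latent channels are hypothetical values chosen for illustration, not figures the team has published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipShape:
    """Shape of a video tensor: frames x height x width x channels."""
    frames: int
    height: int
    width: int
    channels: int

    def num_values(self) -> int:
        # Total number of scalar values stored for the clip.
        return self.frames * self.height * self.width * self.channels


def ceil_div(a: int, b: int) -> int:
    # Integer division that rounds up, so partial blocks still count.
    return -(-a // b)


def latent_shape(clip: ClipShape, spatial_factor: int,
                 temporal_factor: int, latent_channels: int) -> ClipShape:
    """Shape after downsampling height/width and frames by the given factors."""
    return ClipShape(
        frames=ceil_div(clip.frames, temporal_factor),
        height=ceil_div(clip.height, spatial_factor),
        width=ceil_div(clip.width, spatial_factor),
        channels=latent_channels,
    )


# 10 seconds at 720p (1280x720, RGB). The 24 fps rate is an assumption.
clip = ClipShape(frames=10 * 24, height=720, width=1280, channels=3)

# Hypothetical autoencoder settings, for illustration only.
latent = latent_shape(clip, spatial_factor=8, temporal_factor=4,
                      latent_channels=16)

ratio = clip.num_values() / latent.num_values()
print(f"raw values:    {clip.num_values():,}")
print(f"latent values: {latent.num_values():,}")
print(f"compression:   {ratio:.0f}x")
```

Under these assumed settings, the generative model would work with a representation dozens of times smaller than the raw pixels, which is the general motivation for training such an autoencoder before training the video model itself.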
Hotshot's Capabilities
The new Hotshot model is versatile, and the team plans to support longer videos, higher resolutions, and additional features such as audio. Sastry showcased various styles that Hotshot can produce, ranging from comic book-style animations to rotoscoped videos.
In a thread on X, Sastry shared his vision for AI-generated content becoming integral to digital media, anticipating that within the next year, entire YouTube videos will be AI-created, with creators controlling the process from text to video to audio generation.
Sastry claims Hotshot is currently the most advanced publicly available model in its category. However, a test of the model produced mixed results: a prompt for a “unicorn riding through Paris” yielded a somewhat convincing video of a horse, which shows strong potential but falls short of some competitors in quality and detail. As AI video generation evolves, increased competition should ideally lead to more options and better outcomes for users.