Meta made a splash last year with the launch of Segment Anything, a machine learning model that can quickly and reliably identify and outline just about anything in an image. The sequel, which CEO Mark Zuckerberg debuted on stage at SIGGRAPH on Monday, extends that capability to video, a sign of how fast the field is moving.
Segmentation refers to the process where a vision model analyzes an image and identifies its components—for example, recognizing “this is a dog” and “this is a tree behind the dog,” rather than mistakenly identifying “this is a tree growing out of a dog.” While segmentation techniques have evolved over decades, the recent advancements, including Segment Anything, have marked a significant leap in both speed and efficiency.
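To make that concrete, here is a minimal sketch of prompting the original Segment Anything model through its published Python package; the checkpoint filename, image path, and click coordinates are all placeholders:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM checkpoint (filename is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Hand the model an RGB image once; it caches the image embedding.
image = cv2.cvtColor(cv2.imread("dog_and_tree.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground click (label 1) somewhere on the dog.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # (x, y) pixel, a placeholder
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks
)

# Each mask is a boolean array the size of the image: the dog's outline,
# separated from the tree behind it.
best_mask = masks[np.argmax(scores)]
```

The click prompt is what makes the model flexible: a single point, box, or rough mask is enough for it to propose a clean outline of the object.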
Segment Anything 2 (SA2) is a natural progression of the technology: it applies to video, not just still images. Although the first model could technically be run on each frame of a video individually, that approach is far less efficient than handling the clip natively. “Scientists utilize this technology for studying environments like coral reefs and natural habitats. Being able to apply it to video in a zero-shot manner makes it truly innovative,” Zuckerberg said in a conversation with Nvidia CEO Jensen Huang.
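Meta's launch repository sketches a streaming video predictor for exactly this workflow. The snippet below follows the interface shown there at release; the config and checkpoint names, the frame directory, and the click coordinates should all be read as assumptions rather than a definitive API:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Config/checkpoint names follow the launch release; treat them as assumptions.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # Build the model's memory of the clip (a directory of JPEG frames
    # in the launch release; the path here is a placeholder).
    state = predictor.init_state(video_path="reef_clip_frames/")

    # A single click on frame 0 identifies the object to track...
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),  # placeholder (x, y)
        labels=np.array([1], dtype=np.int32),  # 1 = foreground click
    )

    # ...and the mask propagates through the remaining frames, rather than
    # re-segmenting each one from scratch.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()
```

The key difference from running the old model frame by frame is that the predictor carries object identity across frames, so one prompt covers the whole clip.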
Processing video demands far more computational resources than still images, so it is a testament to the industry's gains in efficiency that SA2 can run without overwhelming a data center. The model is still large and requires robust hardware, but rapid, flexible segmentation of this kind was nearly unachievable just a year ago.
Like its predecessor, the new model will be open and free to use, though there is no word yet on a hosted version, an option some AI companies offer. A free demo is available, however.
Training such a model requires a great deal of data, so Meta is also releasing a large annotated dataset of 50,000 videos created specifically for this purpose. The SA2 paper also mentions another, internal dataset of over 100,000 videos used for training that won't be made public. I have reached out to Meta for more information about this dataset and why it is being kept confidential; one speculation is that it is drawn from public Instagram and Facebook profiles.
Meta has positioned itself as a leader in open AI for several years now. Zuckerberg noted that although the company has a long history of releasing open-source tools like PyTorch, more recent projects such as Llama and Segment Anything have set a more accessible standard for AI performance, though what counts as "openness" remains a matter of debate.
Zuckerberg emphasized that the company's commitment to openness isn't purely altruistic. “This isn’t merely a software solution; it requires an entire ecosystem. It wouldn't be nearly as effective without open sourcing it. Our intent isn’t just goodwill; it’s about optimizing the tools we develop to ensure they are as effective as possible for the community.”
Clearly, this model is poised for extensive use. Explore the GitHub repository here.