Meta's Image Generation Model Expands: Now Includes Video and Enhanced Image Editing Features

Meta has made significant strides in the realm of generative AI with the launch of its upgraded image generation foundation model, Emu (Expressive Media Universe). This powerful model now boasts the ability to generate videos from text, alongside enhanced capabilities for precise image editing.

Initially showcased at the Meta Connect event in September, Emu underpins many of the dynamic generative AI experiences across Meta's social media platforms. For example, it powers image editing tools on Instagram that let users change a photo's visual style or background. Emu is also integrated into Meta AI, a new assistant that works much like OpenAI's ChatGPT.

The new Emu Video model can produce videos from natural-language text, from images, or from a combination of both. Unlike previous models such as Make-A-Video, which relied on a cascade of five diffusion models, Emu Video takes a more streamlined approach, using just two. Generation unfolds in two steps: the model first generates an image from the text prompt, then generates a video conditioned on both the text and that image. This factorized approach makes training video generation models more efficient. In user studies, Emu Video outperformed Make-A-Video, with 96% of participants preferring its quality and 85% agreeing that it adhered more closely to their text prompts. Emu Video can also bring user-uploaded images to life, animating them according to a text prompt.
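
To make the two-step factorization concrete, here is a minimal sketch of how such a pipeline could be wired together. The class and function names are illustrative placeholders, not Meta's released code or API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    pixels: bytes  # placeholder for raw image data

class TextToImageModel:
    """Stand-in for the first diffusion model: text prompt -> single image."""
    def sample(self, prompt: str) -> Frame:
        return Frame(pixels=b"...")  # a real model would run diffusion here

class TextImageToVideoModel:
    """Stand-in for the second diffusion model: text + image -> video frames."""
    def sample(self, prompt: str, first_frame: Frame, num_frames: int = 16) -> List[Frame]:
        return [first_frame] * num_frames  # a real model would generate motion

def generate_video(prompt: str, reference: Optional[Frame] = None) -> List[Frame]:
    # Step 1: generate an image from the text prompt, unless the user supplied
    # one (which is how an uploaded photo can be animated instead).
    image = reference or TextToImageModel().sample(prompt)
    # Step 2: generate the video conditioned on both the text and the image.
    return TextImageToVideoModel().sample(prompt, image)

frames = generate_video("a golden retriever surfing a wave at sunset")
```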

Another notable update is Emu Edit, which enables image editing through natural-language instructions. Users can upload an image and describe the changes they want. For instance, they can ask to remove an element, such as a poodle, and replace it with a different object, such as a red bench, simply by typing the request. AI-driven image editing tools already exist, such as the Stable Diffusion-powered ClipDrop and the editing features in Runway, but Meta's researchers noted that existing methods tend either to over-modify the image or to under-deliver on the requested edit.
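
As a rough illustration of what instruction-based editing looks like from the user's side, here is a sketch assuming a hypothetical edit(image, instruction) interface; this is not Meta's published API, and the placeholder simply returns an untouched copy so the example runs.

```python
from PIL import Image

def edit(image: Image.Image, instruction: str) -> Image.Image:
    # A real Emu Edit-style model would change only the pixels relevant to
    # the instruction; this placeholder returns an untouched copy.
    return image.copy()

photo = Image.new("RGB", (512, 512))  # stand-in for an uploaded photo
result = edit(photo, "remove the poodle and put a red bench in its place")
```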

In a blog post, Meta emphasized that the goal should not only be to produce a "believable" image but to modify only the pixels relevant to the user's specific request. The team found that incorporating computer vision tasks as instructions to image generation models delivers unparalleled control over the editing process.

To develop Emu Edit, Meta used a dataset of 10 million synthesized samples, each consisting of an input image, a detailed task description, and the target output image. Training on this data lets the model follow user instructions closely while preserving the parts of the original image that are unrelated to the edit.
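
A training sample following that input/instruction/output structure might be represented like this; the field names and task label are illustrative assumptions, not Meta's actual schema.

```python
from dataclasses import dataclass

@dataclass
class EditSample:
    input_image: str    # path to the original image
    instruction: str    # natural-language task description
    target_image: str   # path to the desired edited image
    task: str           # task category, e.g. "object replacement"

sample = EditSample(
    input_image="park_with_poodle.png",
    instruction="replace the poodle with a red bench",
    target_image="park_with_red_bench.png",
    task="object replacement",
)
```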

Those interested in exploring Emu Edit's capabilities can view generated examples on Hugging Face. Meta has also introduced the Emu Edit Test Set, a new benchmark designed to support further evaluation of image editing models. The set covers seven image editing tasks, including background alteration and object removal, paving the way for advances in precise image editing.
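
For a sense of how such a benchmark might be scored per task, here is a minimal evaluation-loop sketch. The sample fields and the scoring rule are assumptions; of the seven tasks, only background alteration and object removal are named above.

```python
from typing import Callable, Dict, List

def evaluate(edit_fn: Callable[[bytes, str], bytes],
             samples_by_task: Dict[str, List[dict]]) -> Dict[str, float]:
    # Score an editing model separately on each task category of a benchmark.
    scores = {}
    for task, samples in samples_by_task.items():
        total = 0.0
        for s in samples:
            output = edit_fn(s["input_image"], s["instruction"])
            # Stand-in for a real image-similarity or faithfulness metric.
            total += float(output == s["target_image"])
        scores[task] = total / max(len(samples), 1)
    return scores
```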
