Meta's Fundamental AI Research (FAIR) team is unveiling several new AI models and tools for researchers, focusing on audio generation, text-to-vision capabilities, and watermarking technologies.
“By sharing our early research publicly, we aspire to inspire innovation and advance AI in a responsible manner,” the company stated in a press release.
Audio Creation Model: JASCO and Watermarking Tools
Meta is introducing JASCO, short for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation. The model accepts symbolic inputs, such as chords or beats, alongside a text prompt, giving users finer control over the final output. According to FAIR's research, JASCO lets users shape characteristics of the generated audio, including chords, drums, and melody, through text, making it easier to arrive at the desired sound.
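To make "temporally controlled" conditioning concrete, here is a minimal, self-contained toy (not JASCO's actual API; the function and labels are hypothetical) that expands a symbolic chord timeline into per-frame labels, the kind of time-aligned signal a model like JASCO can condition on alongside a text prompt:

```python
# Toy sketch: turn [(chord, start_sec, end_sec), ...] into one chord
# label per model frame, so generation can be steered over time.
def chords_to_frames(timeline, duration, frame_rate):
    """Map a chord timeline to a per-frame conditioning sequence."""
    n_frames = int(duration * frame_rate)
    frames = ["N"] * n_frames  # "N" = no chord active
    for chord, start, end in timeline:
        lo = int(start * frame_rate)
        hi = min(int(end * frame_rate), n_frames)
        for i in range(lo, hi):
            frames[i] = chord
    return frames

# A 4-second clip at 2 frames/sec: C major for 2 s, then A minor.
labels = chords_to_frames([("C", 0.0, 2.0), ("Am", 2.0, 4.0)], 4.0, 2)
print(labels)  # ['C', 'C', 'C', 'C', 'Am', 'Am', 'Am', 'Am']
```

The point is only that symbolic inputs are resolved to the model's time axis, which is what lets a user say where in the piece a chord or beat should occur.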
FAIR will release the JASCO inference code as part of its AudioCraft AI audio model library under an MIT license, while the pre-trained model will be available under a non-commercial Creative Commons license. Additionally, Meta is launching AudioSeal, a tool that watermarks AI-generated speech so that such content can be identified more reliably.
Meta asserts, “AudioSeal is the first audio watermarking technique designed specifically for localized detection of AI-generated speech, enabling the identification of AI-created segments within longer audio files.” The company says this localized approach is also far more efficient, reportedly detecting watermarks up to 485 times faster than traditional methods. Unlike the other releases, AudioSeal will be made available under a commercial license.
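The idea behind localized detection can be illustrated with a small self-contained toy (this is not AudioSeal's actual method, just a correlation-based sketch): embed a low-amplitude pseudorandom watermark in one segment of a signal, then slide the key over the signal to find where the watermark lives.

```python
import numpy as np

# Toy sketch of localized watermark detection (illustrative only).
rng = np.random.default_rng(0)
n, seg_start, seg_len = 4000, 1000, 800
signal = rng.normal(0.0, 1.0, n)               # stand-in for host audio
watermark = rng.choice([-1.0, 1.0], seg_len)   # pseudorandom key sequence
signal[seg_start:seg_start + seg_len] += 0.3 * watermark  # embed locally

# Correlate the key against the signal at each candidate offset; the
# correlation peaks where the watermarked segment actually sits.
step = 100
offsets = list(range(0, n - seg_len + 1, step))
scores = [abs(np.dot(signal[i:i + seg_len], watermark)) / seg_len
          for i in offsets]
best = offsets[max(range(len(scores)), key=scores.__getitem__)]
print(best)  # peak at offset 1000, where the watermark was embedded
```

Because the score is computed per offset rather than once for the whole file, the detector can flag an AI-generated span inside otherwise genuine audio, which is the property Meta is highlighting.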
Chameleon Model Release
FAIR is also planning to release two versions of its multimodal text model, Chameleon, under a research-only license. The Chameleon 7B and 34B models are designed for tasks that require visual and textual understanding, such as image captioning. However, Meta has announced that it will not make the Chameleon image generation model available at this time, limiting access to the text-related functionalities.
Furthermore, researchers will gain access to a multi-token prediction approach that trains language models to predict several future words at once, rather than one at a time. It will be released exclusively under a non-commercial, research-only license.
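A minimal way to see what changes is in how training targets are built. The toy below (not Meta's implementation; the helper is hypothetical) pairs each context with the next k tokens instead of just the next one, which a model with k output heads can then predict in parallel:

```python
# Toy sketch: training pairs for multi-token prediction. A standard LM
# would use k=1 (context -> single next token); here each context is
# paired with the next k tokens.
def multi_token_targets(tokens, context_len, k):
    """Return (context, next_k_tokens) training pairs."""
    pairs = []
    for i in range(context_len, len(tokens) - k + 1):
        pairs.append((tokens[i - context_len:i], tokens[i:i + k]))
    return pairs

toks = ["the", "cat", "sat", "on", "the", "mat"]
pairs = multi_token_targets(toks, context_len=2, k=2)
for ctx, tgt in pairs:
    print(ctx, "->", tgt)
# ['the', 'cat'] -> ['sat', 'on']
# ['cat', 'sat'] -> ['on', 'the']
# ['sat', 'on'] -> ['the', 'mat']
```

Each training step thus supervises several future positions at once, which is the sense in which the method trains on multiple future words "simultaneously rather than sequentially."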