Meta has announced five new AI models and research projects, spanning multi-modal systems that process text and images together, next-generation language models, a new approach to music generation, technology for detecting AI-generated speech, and efforts to improve geographic and cultural diversity in AI-generated content.
These advancements are the result of work by Meta’s Fundamental AI Research (FAIR) team, which has been pioneering AI research through open collaboration for over ten years. As AI continues to evolve rapidly, Meta underscores the importance of global cooperation in advancing the field responsibly.
Chameleon: A Breakthrough in Multi-Modal AI
One of the key releases is Chameleon, a family of multi-modal models available under a research license. The models can understand and generate both text and images simultaneously; unlike most existing large language models, which handle a single modality, Chameleon is designed to work across both.
“Just as humans can process both words and images together, Chameleon can interpret and produce text and images simultaneously,” Meta explained. “Chameleon can work with any combination of text and images as inputs and produce outputs in the same format.”
The applications for Chameleon are vast, from generating creative captions to producing new scenes based on combined text and image inputs.
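The accompanying research describes Chameleon as an early-fusion, token-based approach: images are quantized into discrete tokens and interleaved with text tokens, so a single transformer models one unified sequence. The PyTorch sketch below illustrates that idea; the vocabulary sizes, dimensions, and class names are illustrative assumptions, not Meta's implementation.

```python
# A minimal sketch of the early-fusion idea behind Chameleon: images are
# quantized into discrete tokens and interleaved with text tokens so one
# transformer models a single mixed-modal sequence. All names and sizes
# here are illustrative, not Meta's actual code.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000   # assumed text vocabulary size
IMAGE_VOCAB = 8_192   # assumed image-token codebook size (e.g. a VQ tokenizer)

class MixedModalLM(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        # One shared embedding table over the combined text + image vocabulary.
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):  # tokens: (batch, seq)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.trunk(self.embed(tokens), mask=mask)
        return self.lm_head(h)  # next-token logits over both modalities

# Text tokens and image tokens (offset into the shared vocabulary) sit in
# the same sequence, so the model can freely mix modalities in both its
# inputs and its outputs.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(TEXT_VOCAB, TEXT_VOCAB + IMAGE_VOCAB, (1, 16))
logits = MixedModalLM()(torch.cat([text_ids, image_ids], dim=1))
```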
Faster Language Model Training with Multi-Token Prediction
Meta has also released, under a non-commercial research license, pre-trained models for code completion that use ‘multi-token prediction’. Traditional language-model training predicts the next word one at a time, which is simple but inefficient. The new models instead predict several future words at once, making training faster and more sample-efficient.
“The traditional approach, while simple and scalable, is inefficient as it requires significantly more text than what children need to achieve similar language fluency,” Meta noted.
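Concretely, the technique attaches several output heads to a shared transformer trunk, with head i trained to predict the token i + 1 positions ahead, and sums the per-head losses. The following PyTorch sketch, with illustrative sizes and names, shows the shape of that training objective.

```python
# A minimal sketch of multi-token prediction: a shared trunk feeds several
# output heads, and head i is trained to predict the token i + 1 steps
# ahead. Module names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, N_FUTURE = 32_000, 512, 4

class MultiTokenPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, 8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=6)
        # One output head per future offset; ordinary next-token training
        # corresponds to keeping only the first head.
        self.heads = nn.ModuleList(
            [nn.Linear(D_MODEL, VOCAB) for _ in range(N_FUTURE)]
        )

    def loss(self, tokens):  # tokens: (batch, seq)
        seq = tokens.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq)
        h = self.trunk(self.embed(tokens), mask=mask)
        total = 0.0
        for i, head in enumerate(self.heads):
            offset = i + 1                 # head i predicts t + i + 1
            if seq <= offset:
                break
            logits = head(h[:, :-offset])  # positions with a valid target
            target = tokens[:, offset:]    # the token `offset` steps ahead
            total = total + F.cross_entropy(
                logits.reshape(-1, VOCAB), target.reshape(-1)
            )
        return total

model = MultiTokenPredictor()
loss = model.loss(torch.randint(0, VOCAB, (2, 128)))
loss.backward()
```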
JASCO: Advancing Text-to-Music Generation
On the creative side, Meta’s JASCO model generates music clips from text and offers finer control by accepting additional inputs such as chords and beats.
“Existing text-to-music models like MusicGen primarily use text inputs. JASCO, however, can take in various inputs, such as chords or beats, providing better control over the generated music,” Meta explained.
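JASCO's actual interface isn't shown here; the snippet below is a purely hypothetical illustration of what symbolic conditioning means in practice, with every name invented for the example. The caller supplies a chord progression and a tempo alongside the text prompt, and a JASCO-style model treats each as a separate conditioning stream.

```python
# Hypothetical illustration of JASCO-style conditioning (NOT JASCO's real
# API): alongside the text prompt, the caller supplies symbolic controls
# such as a chord progression and a tempo that constrain the output.
from dataclasses import dataclass

@dataclass
class MusicPrompt:
    text: str
    chords: list[tuple[str, float]]  # (chord symbol, onset time in seconds)
    bpm: float
    duration: float

prompt = MusicPrompt(
    text="warm lo-fi groove with soft electric piano",
    chords=[("C", 0.0), ("Am", 2.0), ("F", 4.0), ("G", 6.0)],
    bpm=96.0,
    duration=8.0,
)
# A JASCO-style model would encode each conditioning stream separately and
# fuse them with the text embedding before decoding the audio clip.
```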
AudioSeal: AI-Generated Speech Detection
Meta has unveiled AudioSeal, an audio watermarking system designed to detect AI-generated speech. It can identify AI-generated segments within larger audio clips up to 485 times faster than previous technologies.
“AudioSeal is released under a commercial license and is part of our broader efforts to ensure the responsible use of generative AI tools,” said Meta.
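Meta has open-sourced the tool as the `audioseal` Python package. The sketch below follows the interface documented in the project's repository, though exact function names and model cards may change between versions, so treat it as illustrative.

```python
# A sketch following the interface documented in Meta's open-source
# `audioseal` package (pip install audioseal); names and model cards may
# differ across versions.
import torch
from audioseal import AudioSeal

sample_rate = 16_000
wav = torch.randn(1, 1, sample_rate)  # (batch, channels, samples) stand-in audio

# Embed an imperceptible watermark into the waveform.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
watermark = generator.get_watermark(wav, sample_rate)
watermarked = wav + watermark

# Later, score how likely a clip is to carry the watermark.
detector = AudioSeal.load_detector("audioseal_detector_16bits")
score, message = detector.detect_watermark(watermarked, sample_rate)
print(f"watermark probability: {score:.2f}")
```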
Enhancing Diversity in Text-to-Image Models
Another important release focuses on improving the diversity of text-to-image models, which can sometimes show geographical and cultural biases. Meta developed automatic indicators to evaluate these disparities and conducted a comprehensive study with over 65,000 annotations to understand global perceptions of geographic representation.
“This initiative promotes greater diversity and better representation in AI-generated images,” Meta stated. The accompanying code and annotations have been made available to support efforts to enhance diversity across generative models.
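Meta's actual indicators aren't reproduced here, but a toy example conveys the flavor of an automatic disparity measure: score how evenly a batch of generated images covers geographic regions. The metric and names below are invented for illustration.

```python
# Hypothetical illustration of an automatic diversity indicator (not
# Meta's actual metric): given region labels predicted for a batch of
# generated images, score their spread with normalized entropy, where
# 1.0 means perfectly even coverage and 0.0 means a single region.
from collections import Counter
import math

def geographic_diversity(region_labels: list[str]) -> float:
    counts = Counter(region_labels)
    total = len(region_labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts)) if len(counts) > 1 else 0.0

labels = ["Europe", "Europe", "Africa", "Asia", "Europe", "Americas"]
print(f"diversity score: {geographic_diversity(labels):.2f}")
```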
By sharing these pioneering models and research openly, Meta aims to encourage collaboration and drive innovation in the AI community.