Mark Zuckerberg’s Meta said on Friday that it will release a series of new AI models from its research arm, Fundamental AI Research (FAIR). These include a ‘Self-Taught Evaluator’ that may allow for minimal human involvement throughout the AI development process, and another model that freely mixes text and speech.
The announcements follow a Meta paper from August detailing how the evaluator relies on the ‘chain of thought’ technique, which OpenAI has used in its latest o1 models to make them reason before they respond. It should be noted that Google and Anthropic, too, have published research on the related concept of Reinforcement Learning from AI Feedback (RLAIF). However, those techniques have not yet made it into publicly released products.
Meta’s group of AI researchers under FAIR said the new release supports the company’s goal of achieving advanced machine intelligence while also supporting open science and innovation. The newly released models include an updated Segment Anything Model 2 for images and videos, Meta Spirit LM, Layer Skip, SALSA, Meta Lingua, OMat24, MEXMA, and the Self-Taught Evaluator.
A Self-Taught Evaluator
Meta described this new model, which can evaluate the outputs of other AI models, as “a strong generative reward model with synthetic data”. The company says this is a new way to generate preference data for training reward models without relying on human annotations. “This approach generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces for evaluation and final judgments, with an iterative self-improvement scheme,” the company said in its official blog post.
Essentially, the Self-Taught Evaluator is a method that generates its own training data for reward models, removing the need for people to label it. Meta says the model produces contrasting outputs from AI models and then uses an AI judge to assess them and improve its own judgments, repeating the process iteratively. According to Meta, the resulting evaluator performs better than reward models that rely on human-labeled data, and better than commonly used LLM judges such as GPT-4.
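To make that loop concrete, here is a minimal, self-contained sketch of an iterative self-improvement scheme of this kind. The helper logic below is a toy stand-in so the example runs end to end; none of the function names or the skill-score mechanics come from Meta’s code or paper.

```python
import random

# Toy sketch of a Self-Taught-Evaluator-style training loop.
# The "judge" here is a stand-in with a numeric skill score; a real
# implementation would call an actual LLM at each step.

def sample_contrasting_responses(prompt):
    """Build a synthetic preference pair: a good response and a
    deliberately degraded one, so the preferred answer is known
    without any human labels."""
    good = f"helpful answer to: {prompt}"
    bad = f"off-topic answer to a corrupted version of: {prompt}"
    return good, bad

def judge_with_reasoning(judge_skill, prompt, resp_a, resp_b):
    """Toy judge: emits a reasoning trace plus a final verdict.
    Its accuracy grows with judge_skill, mimicking an
    LLM-as-a-Judge that improves over iterations."""
    verdict = "A" if random.random() < judge_skill else "B"
    reasoning = f"Comparing both answers to '{prompt}': response {verdict} is better."
    return reasoning, verdict

def self_taught_evaluator(prompts, iterations=3, judge_skill=0.6):
    for it in range(iterations):
        traces = []
        for prompt in prompts:
            good, bad = sample_contrasting_responses(prompt)
            reasoning, verdict = judge_with_reasoning(judge_skill, prompt, good, bad)
            # Keep only judgments that picked the known-better response;
            # these correct reasoning traces become training data.
            if verdict == "A":
                traces.append((prompt, good, bad, reasoning))
        # "Fine-tune" the judge on its own correct traces. In this toy,
        # that is just a skill bump proportional to the surviving data.
        judge_skill = min(0.99, judge_skill + 0.1 * len(traces) / len(prompts))
        print(f"iteration {it + 1}: kept {len(traces)} traces, judge skill {judge_skill:.2f}")
    return judge_skill

self_taught_evaluator(["What causes tides?", "Explain photosynthesis."])
```

The key property the sketch preserves is that the preferred answer in each pair is known by construction, so the judge’s correct verdicts can be filtered out and fed back as supervision without any human annotation.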
Meta Spirit LM
Spirit LM is an open source language model for seamlessly mixing speech and text. Large language models are often used to build systems that convert speech to text and vice versa, but chaining these steps can strip the natural expressiveness out of the original speech. Meta has developed Spirit LM, its first open source model that works with both text and speech natively.
“Many existing voice AI experiences today use ASR techniques to process speech before combining with an LLM to produce text, but these approaches degrade the expressive features of speech. By using phonetic, pitch and style tokens, Spirit LM models can overcome these limitations in both input and output to produce more natural sounding speech while also learning new tasks across ASR, TTS and speech classification,” Meta said in a tweet.
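As a rough illustration of the difference, the sketch below contrasts a cascaded voice pipeline with direct speech-token modeling. Every function here is a simplified placeholder invented for this example, not a Meta API; the point is only to show where expressive detail gets lost.

```python
# Toy contrast: cascaded ASR -> LLM -> TTS pipeline versus a
# Spirit LM-style direct pipeline. Audio is modeled as a dict of
# words plus a tone, purely for illustration.

def cascaded_pipeline(audio):
    # ASR flattens audio to plain text: pitch, pauses, and emotion
    # are discarded before the LLM ever sees the input.
    text = audio["words"]                    # ASR keeps words only
    reply_text = f"echo: {text}"             # stand-in LLM step
    # TTS re-synthesizes from text alone, so the reply comes out in
    # a default neutral voice regardless of how the user sounded.
    return {"words": reply_text, "tone": "neutral"}

def direct_pipeline(audio):
    # A Spirit LM-style model consumes phonetic, pitch, and style
    # tokens together, so expressive cues survive to the output.
    reply_text = f"echo: {audio['words']}"
    return {"words": reply_text, "tone": audio["tone"]}

user_audio = {"words": "I can't believe it!", "tone": "excited"}
print(cascaded_pipeline(user_audio))  # tone collapses to 'neutral'
print(direct_pipeline(user_audio))    # tone 'excited' is preserved
```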
Spirit LM is trained on both speech and text data, making it possible to switch between the two without effort. Meta has created two versions of the model: Spirit LM Base, which focuses on phonetic speech sounds, and Spirit LM Expressive, which also captures tone and emotion in speech, such as anger or excitement, to sound more realistic. Meta says the model can generate more natural sounding speech and also learn tasks such as speech recognition, text-to-speech conversion, and classifying different types of speech.
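For intuition, the sketch below shows what word-level interleaving of text and speech tokens in a single sequence might look like. The modality markers, token names, and values are all invented for illustration; real models operate on learned token IDs.

```python
# Illustrative interleaving of text and speech tokens in one stream.
# Modality markers tell the decoder which kind of token comes next.
interleaved = [
    "[TEXT]", "the", "weather", "is",
    "[SPEECH]", "<pho_412>", "<pho_87>", "<pho_230>",  # phonetic units
    "[TEXT]", "today",
]

# Spirit LM Expressive adds pitch and style tokens next to phonetic
# ones, which is how tone and emotion such as anger or excitement
# can be carried through generation rather than lost.
expressive = [
    "[SPEECH]",
    "<pho_412>", "<pitch_12>", "<style_excited>",
    "<pho_87>",  "<pitch_9>",  "<style_excited>",
]
```

Because both modalities live in one token sequence, a single decoder can start a sentence in text and finish it in speech (or vice versa), which is what makes the seamless switching described above possible.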
