Apple’s research reveals a major AI flaw in OpenAI, Google, and Meta LLMs

Large language models (LLMs) may not be as smart as they seem, according to a study from Apple researchers.

LLMs from OpenAI, Google, Meta, and others have been touted for their impressive reasoning skills. But research suggests that their supposed intelligence may be closer to “sophisticated pattern matching” than “true logical reasoning.” Yes, even OpenAI’s o1 advanced reasoning model.

A common benchmark for reasoning skills is a test called GSM8K, but because it is so popular, there is a risk of data contamination. That means LLMs may know the answers to the test because they were trained on those answers, not because they can actually reason their way to them.

To test this, the researchers developed a new benchmark called GSM-Symbolic, which keeps the essence of the reasoning problems but changes variables such as names and numbers, adjusts the complexity, and adds irrelevant information. What they found was surprising “fragility” in LLM performance. The study tested more than 20 models, including OpenAI’s o1 and GPT-4o, Google’s Gemma 2, and Meta’s Llama 3, and every single model’s performance decreased when the variables were changed.
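To make the idea concrete, here is a minimal sketch of the kind of templated perturbation the benchmark describes: the reasoning structure of a problem stays fixed while the surface details are resampled. This is not the paper’s actual code; the template, names, and number ranges below are invented for illustration.

```python
import random

# Hypothetical GSM-Symbolic-style template: the reasoning structure
# stays fixed while surface details (names, numbers) are resampled.
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "How many apples does {name} have?")

NAMES = ["Oliver", "Sophie", "Liam"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template with fresh values; return (question, answer)."""
    x, y = rng.randint(10, 99), rng.randint(10, 99)
    question = TEMPLATE.format(name=rng.choice(NAMES), x=x, y=y)
    return question, x + y  # the ground-truth answer follows from the template

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

A model that genuinely reasons should score the same on every variant; a model that has memorized GSM8K phrasings may not.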

Accuracy dropped by a few percent every time names and values were changed. And, as the researchers noted, OpenAI’s models performed better than the open-source models. However, the variance was deemed “non-negligible,” meaning that no real variance should have occurred if the models were truly reasoning. Things got a lot more interesting when the researchers added “seemingly relevant but ultimately inconsequential statements” to the mix.

To test the hypothesis that LLMs rely more on pattern matching than actual reasoning, the study added superfluous phrases to math problems to see how the models would respond. For example: “Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks twice the amount of kiwis he did on Friday, but five of them were smaller than average. How many kiwis does Oliver have?”
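Written out, the arithmetic makes the trap obvious. Here is a quick worked example in plain Python, simply restating the problem’s numbers (none of this comes from the paper’s code):

```python
# The kiwi problem, worked out directly. The clause about five
# smaller kiwis is a distractor: size has no bearing on the count.
friday = 44
saturday = 58
sunday = 2 * friday  # "twice the amount of kiwis he did on Friday"

correct = friday + saturday + sunday      # 44 + 58 + 88 = 190
trap = friday + saturday + sunday - 5     # 185: subtracting the "smaller"
                                          # kiwis, as pattern-matching
                                          # models tended to do

print(correct, trap)  # 190 185
```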

The result was a significant decline in performance across the board. OpenAI’s o1-preview fared the best, with a 17.5 percent drop in accuracy. That’s still pretty bad, but not as bad as Microsoft’s Phi 3 model, which performed 65 percent worse.

In the kiwi example, the study found that LLMs tended to subtract the five smaller kiwis from the total without realizing that the size of a kiwi is irrelevant to the question. This shows that “models tend to convert statements to operations without truly understanding their meaning,” which supports the researchers’ hypothesis that LLMs look for patterns in reasoning problems rather than understanding the underlying concept.

The study didn’t mince words about its findings. Testing models on the benchmark that includes irrelevant information “reveals a critical flaw in LLMs’ ability to truly understand mathematical concepts and identify relevant information for problem-solving.” It is worth noting, however, that the authors of this study work for Apple, which is obviously a major competitor to Google, Meta, and even OpenAI. Although Apple and OpenAI have a partnership, Apple is also working on its own AI models.

That said, the apparent lack of reliable, systematic reasoning in LLMs can’t be ignored. If nothing else, it’s a good reminder to temper AI hype with healthy skepticism.
