The most capable open source AI model yet could supercharge AI agents


A powerful new open source AI model with visual capabilities could enable more developers, researchers, and startups to build AI agents that perform useful tasks on your computer.

Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images and communicate through a conversational interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating file directories, and writing scripts.
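To make the idea concrete, here is a rough, illustrative sketch of how a developer might wire a screen-reading agent loop around a multimodal model such as Molmo. The query_molmo function is a hypothetical placeholder, not an actual Ai2 API; only the screenshot capture uses a real library (Pillow).

```python
# Illustrative agent loop: capture the screen, ask a multimodal model what to
# do next, and return its suggested action. The model call is a stub.
from PIL import ImageGrab  # pip install pillow


def query_molmo(image, prompt: str) -> str:
    """Hypothetical placeholder for a call to a locally hosted multimodal model.

    In practice this would send the screenshot and instruction to whatever
    inference server or library is hosting the model and return its reply.
    """
    raise NotImplementedError("connect this to your own model endpoint")


def agent_step(goal: str) -> str:
    # Capture the current screen so the model can "see" what the user sees.
    screenshot = ImageGrab.grab()
    prompt = (
        f"You are controlling a computer. The goal is: {goal}\n"
        "Describe the single next action to take (click, type, scroll)."
    )
    return query_molmo(screenshot, prompt)


if __name__ == "__main__":
    print(agent_step("open the downloads folder and list its contents"))
```

The pattern is the same one commercial computer-use agents follow: perceive the screen as an image, reason over it with a multimodal model, and emit one action at a time.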

“With this release, many more people can use a multimodal model,” said Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler of next-generation applications.”

So-called AI agents are widely considered the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the ambition is for AI to go beyond chat and reliably take complex, sophisticated actions on a computer when given a command. That capability has yet to materialize at any real scale.

Some powerful AI models already have visual capabilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some AI agents, but their weights are not public, and they are accessible only through a paid application programming interface, or API.

Meta has released a family of AI models called Llama under a license that restricts their commercial use, but has yet to offer a multimodal version to developers. Meta is expected to announce several new products, possibly including new Llama AI models, at its Connect event today.

“Having an open source, multimodal model means that any startup or researcher with an idea can try to do it,” said Ofir Press, a postdoc at Princeton University working on AI agents.

Press says the fact that Molmo is open source means developers will be able to more easily tune their agents for specific tasks, such as working with spreadsheets, by providing additional training data. Models such as GPT-4 can be fine-tuned only to a limited extent through their APIs, whereas a fully open model can be modified extensively. “When you have an open source model like this you have a lot of options,” Press says.
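As a rough illustration of what that openness allows, the sketch below shows how a developer might attach a small LoRA adapter to an openly released checkpoint and fine-tune it on task-specific examples. The model identifier and the target module names here are assumptions for illustration, not confirmed details of the Molmo release.

```python
# Minimal sketch of customizing an open-weight model with a LoRA adapter.
# MODEL_ID is a hypothetical placeholder, not a confirmed Ai2 release name.
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import LoraConfig, get_peft_model  # pip install peft

MODEL_ID = "allenai/molmo-example"  # hypothetical identifier

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Wrap the base model with a low-rank adapter so only a small fraction of
# parameters are trained on, say, spreadsheet-manipulation transcripts.
# The target module names depend on the actual architecture.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# From here, a normal training loop over (screenshot, instruction, action)
# examples would update only the adapter weights.
```

This kind of heavy customization is exactly what closed models accessed through a paid API make difficult or impossible.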

Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter model that’s small enough to run on a mobile phone. A model’s parameter count refers to the number of units it contains for storing and manipulating data, and roughly corresponds to its capability.

Ai2 claims that Molmo is roughly as capable as the largest commercial models despite its smaller size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is also releasing the training data used to create the model, giving researchers more detail about how it works.

Releasing powerful models is not without risk, however. Such models can more easily be adapted for malicious ends; one day, for example, we may see the emergence of AI agents designed to automate the hacking of computer systems.

Ai2’s Farhadi argues that Molmo’s efficiency and portability will allow developers to create powerful software agents that run natively on smartphones and other mobile devices. “A billion-parameter model is now performing at the level or in the league of models 10 times larger,” he said.

Building useful AI agents may depend on more than efficient multimodal models, however. The biggest challenge is making the models work reliably. This may require further advances in AI reasoning, something OpenAI is seeking to address with its latest model, o1, which demonstrates step-by-step reasoning abilities. The next step might be to give multimodal models such reasoning capabilities.

Meanwhile, the release of Molmo suggests that AI agents are closer than ever, and that they could soon be useful even beyond the giants that rule the AI world.


