Sam Altman-run OpenAI on Monday launched ‘GPT-4o’, a new version of its GPT-4 model that can generate any combination of text, audio, and image outputs.

GPT-4o’s text and image capabilities have been rolled out for all ChatGPT users.

“We are making GPT-4o available in the free tier, and to Plus users with up to 5 times higher message limits,” the company said during a livestream event.

OpenAI will soon roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT.

“Our new model: GPT-4o, is our best model ever. it is smart, it is fast, it is natively multimodal,” Altman posted on X.

“It is available to all ChatGPT users, including on the free plan! so far, GPT-4 class models have only been available to people who pay a monthly subscription. this is important to our mission; we want to put great AI tools in the hands of everyone,” he noted.

GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

GPT-4o is notably better at vision and audio understanding than existing models.

The company plans to make GPT-4o’s new audio and video capabilities available in the API to a small group of trusted partners in the coming weeks.

With GPT-4o, the company trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.

“Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” said OpenAI.

The company also launched a Mac desktop app for ChatGPT.

During the event, OpenAI also announced that its custom GPT Store is now available to users for free.

The GPT Store will let users create their own chatbots, called GPTs, and share them.