OpenAI said Thursday that its API is getting several new voice intelligence features designed to help developers build apps that can speak with users, transcribe conversations, and translate speech in real time.

One of the biggest updates is GPT-Realtime-2, a new voice model built to create more natural and realistic conversations. Unlike GPT-Realtime-1.5, the new model is powered by GPT-5-class reasoning, which OpenAI says makes it better at handling more complex user requests.

The company is also introducing GPT-Realtime-Translate, a real-time translation model designed to keep up with users during live conversations. It supports more than 70 input languages, meaning the languages it can understand, and 13 output languages, meaning the languages it can speak back to users.

OpenAI has also launched GPT-Realtime-Whisper, a new transcription model that converts speech to text as conversations happen.

“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” the company said.

These updates could be useful for companies looking to improve customer service, but OpenAI says the tools can also support education, media, events, creator platforms, and several other areas.

At the same time, OpenAI acknowledged that these kinds of tools could be misused. The company said it has added guardrails to prevent abuse, including spam, fraud, and other harmful online activity. OpenAI also said certain safety triggers are built into the system so conversations can be stopped if they violate its harmful content guidelines.



All of the new voice models are available through OpenAI’s Realtime API. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 is billed based on token usage.
