OpenAI has started the gradual rollout of ChatGPT’s Advanced Voice Mode, allowing users to experience GPT-4o’s hyperrealistic audio responses. Starting today, a select group of ChatGPT Plus users will gain access, with a full rollout expected by fall 2024.
In May, OpenAI wowed audiences by showcasing GPT-4o’s voice capabilities, which delivered swift, lifelike responses that closely resembled a human voice, specifically that of Scarlett Johansson’s character in the film “Her.” Johansson denied any collaboration with OpenAI, prompting legal scrutiny, and OpenAI removed the voice from its demo. The company then delayed the feature’s release to strengthen its safety measures.
Now, OpenAI is ready to offer this groundbreaking feature to a broader audience. However, the video and screen-sharing capabilities shown in the Spring Update are not part of this alpha release and will arrive later. For now, select premium users can explore the voice feature that made such an impact during the initial demo.
The new Advanced Voice Mode differs from ChatGPT’s previous audio solution, which chained three separate models: one for speech-to-text transcription, one to process the prompt, and one for text-to-speech conversion. GPT-4o handles the entire exchange as a single multimodal model, resulting in significantly lower latency. It can also detect emotional intonation, adding a layer of human-like interaction by recognizing feelings such as sadness or excitement.
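To see why chaining models adds latency, here is a minimal sketch. The function names and the sleep durations are entirely hypothetical stand-ins, not OpenAI's actual API; the point is only that three sequential model calls stack their delays, while a single end-to-end model pays one.

```python
import time

# Hypothetical stand-ins for the three stages of the legacy pipeline.
# The sleep() calls simulate per-model processing time (illustrative values).
def speech_to_text(audio: bytes) -> str:
    time.sleep(0.05)
    return "transcribed prompt"

def generate_reply(text: str) -> str:
    time.sleep(0.05)
    return "model reply"

def text_to_speech(text: str) -> bytes:
    time.sleep(0.05)
    return b"reply audio"

def legacy_pipeline(audio: bytes) -> bytes:
    # Three sequential calls: each stage waits for the previous one,
    # so their latencies add up.
    return text_to_speech(generate_reply(speech_to_text(audio)))

def multimodal_reply(audio: bytes) -> bytes:
    # One end-to-end model: audio in, audio out, a single latency budget.
    time.sleep(0.05)
    return b"reply audio"

start = time.perf_counter()
legacy_pipeline(b"user audio")
legacy_latency = time.perf_counter() - start

start = time.perf_counter()
multimodal_reply(b"user audio")
multimodal_latency = time.perf_counter() - start

print(legacy_latency > multimodal_latency)  # the chained path is slower
```

With real models the stages also lose information at each hand-off (the transcript drops tone of voice, for example), which is why a single multimodal model can pick up emotional intonation that a text-only middle stage never sees.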
Selected ChatGPT Plus users will receive notifications within the ChatGPT app and an email with usage instructions. OpenAI is carefully monitoring the rollout to ensure optimal performance and user safety. The alpha group will be the first to experience just how realistic ChatGPT’s new voice feature is.
Since its initial demo, OpenAI has rigorously tested GPT-4o’s voice functionality with over 100 external red teamers speaking 45 different languages. The company plans to release a detailed report on these safety efforts in early August.
The current Advanced Voice Mode will feature four preset voices—Juniper, Breeze, Cove, and Ember—created with the help of professional voice actors. The controversial Sky voice, initially featured in OpenAI’s demo, is no longer available. OpenAI emphasizes that ChatGPT cannot impersonate any individual’s voice, including public figures, and will block any attempts to deviate from the preset voices.
OpenAI has also introduced filters that block requests to generate music or other copyrighted audio. The move comes as the industry faces legal challenges over copyright infringement by AI-generated content, particularly from record labels.
By rolling out these sophisticated voice features, OpenAI aims to enhance the ChatGPT experience while prioritizing safety and ethical considerations. The gradual release strategy allows for close monitoring and adjustments, ensuring a smooth and secure integration for all users.