OpenAI built a voice cloning tool, but you can’t use it… yet

Introducing OpenAI’s Voice Engine – A Revolutionary Tool for Generating Synthetic Voices

OpenAI’s innovative Voice Engine provides a powerful solution for creating accurate synthetic voices, raising questions about ethics and impact on the voice acting industry.

OpenAI has introduced Voice Engine, an extension of its existing text-to-speech API that lets users upload a short voice sample and generate a synthetic copy of that voice. OpenAI has not disclosed a launch date, but it says the technology will be released responsibly, with attention to deepfake abuse and the implications for the voice acting industry.

Development Process:

Voice Engine has been in development for approximately two years. Its underlying generative AI model powers both the “read aloud” feature in ChatGPT and the preset voices in OpenAI’s current text-to-speech API. Companies such as Spotify have also used this model to dub podcasts into other languages.

Training Data:

OpenAI has not disclosed details of the training data for Voice Engine; such data is both a competitive advantage and a potential source of intellectual property disputes. However, OpenAI maintains licensing agreements with several content providers, including Shutterstock and Axel Springer, and permits website administrators to block its web crawlers from collecting data for training purposes.

Synthesis Capability:

Unlike many competing voice cloning solutions, Voice Engine does not require a model to be trained for each individual speaker. Instead, it takes a brief audio sample and the text to be spoken, then produces lifelike speech that mirrors the original speaker’s voice. The audio sample is discarded once the request is complete, a measure intended to protect privacy.

Pricing and Competition:

While specific pricing information for Voice Engine has been omitted from promotional materials, internal documentation reveals a cost of $15 per one million characters – equivalent to roughly 162,500 words. This places the service competitively below rates charged by rivals like ElevenLabs. Despite offering fewer customization options, OpenAI’s Voice Engine aims to deliver superior speech quality.
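The arithmetic behind that figure can be checked with a short sketch. Only the $15 price and the 162,500-word equivalence come from the figures quoted above; the function names are hypothetical, and the sketch assumes billing scales linearly with character count, with no minimum charge or volume tiers, which has not been confirmed.

```python
# Figures quoted in the article.
PRICE_PER_MILLION_CHARS_USD = 15.00
WORDS_PER_MILLION_CHARS = 162_500


def implied_chars_per_word() -> float:
    """Average characters per word implied by the quoted equivalence."""
    return 1_000_000 / WORDS_PER_MILLION_CHARS


def estimate_cost_usd(num_chars: int) -> float:
    """Estimated cost for num_chars characters, assuming linear pricing."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def estimate_cost_for_words_usd(num_words: int) -> float:
    """Estimated cost for num_words words, using the implied word length."""
    num_chars = round(num_words * implied_chars_per_word())
    return estimate_cost_usd(num_chars)


if __name__ == "__main__":
    print(f"Implied characters per word: {implied_chars_per_word():.2f}")
    print(f"Cost for 500,000 characters: ${estimate_cost_usd(500_000):.2f}")
    print(f"Cost for 10,000 words: ${estimate_cost_for_words_usd(10_000):.2f}")
```

The implied average of just over six characters per word suggests the conversion counts spaces and punctuation, so actual costs for a given script would depend on how characters are counted.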

Impact on Voice Actors:

As OpenAI’s Voice Engine gains traction, it may disrupt the income streams of professional voice actors. Concerns about job displacement have led some AI voice platforms to form partnerships with trade unions, such as SAG-AFTRA, to establish fair terms for incorporating synthetic voices into new projects.

Safety Measures:

To minimize the likelihood of malicious applications, OpenAI initially intends to restrict access to Voice Engine to a select group of developers working on socially responsible projects. Cloned voices produced with the software will carry inaudible watermarks, allowing their origin to be identified. Furthermore, OpenAI plans to work with its red teaming network (a group of external experts who assess the potential hazards posed by the company’s AI technologies) to identify possible nefarious uses of Voice Engine.

Conclusion:

With the eventual release of Voice Engine, OpenAI seeks to advance voice synthesis technology while balancing innovation against responsibility. By monitoring usage patterns and implementing safeguards, OpenAI hopes to realize the potential of Voice Engine while minimizing negative consequences for the voice acting profession and society at large.