B&T Opinion: Can You Hear That? AI Is Coming For Voice

This article was originally published as an opinion piece in B&T.

The past few weeks in generative AI have been all about Voice.

Google and OpenAI have both released new products that are expanding the possibilities, and importantly the realism, of AI-generated voices.

Hot on the heels of the product launch, OpenAI announced it had raised $9.6 billion at a valuation of nearly $228.7 billion, roughly equalling Australia’s largest company, BHP, which has a market cap of $228.6 billion!

First, three weeks ago Google released an update to its NotebookLM product, modestly called Audio Overviews. It is extraordinarily simple: upload a document (e.g. a large report, prospectus, research paper or a book) and NotebookLM will create (with AI magic) a lifelike podcast interview between two ‘hosts’ discussing it. It legitimately sounds human, with pauses, the hosts talking over each other, and lifelike repartee between the two AI voices.

One of the funniest (and most viral) Audio Overviews is an example shared on Reddit, in which the two AI podcast hosts appear to have just been told they’re AI and proceed to have an existential crisis. Don’t be fooled: this is clever “prompt hacking”; the robots are not self-aware.

Not to be outdone, over the past few weeks OpenAI has been rolling out its Advanced Voice Mode in the ChatGPT mobile app. This was first demoed many months ago but has been a long time coming, partly due to concerns around imitating real people’s voices without permission. Most notably, Scarlett Johansson threatened legal action when an early demo voice sounded very much like her.

We no longer have to wait. Advanced Voice Mode has all but removed the latency from a voice chat with ChatGPT, making it feel incredibly natural and lifelike. Combined with the relatively new memory feature in ChatGPT (the app stores your conversations and gets to know you better over time, personalising your chats to your history), your long-awaited AI friend is now in your pocket, and you may not have even realised it.

Of course, with your friend trapped inside the ChatGPT app, there is only so much fun to be had. Fortunately, at yesterday’s OpenAI DevDay, OpenAI released an API for Advanced Voice Mode, meaning brands can now build natural, fast and accurate voice experiences into their own apps. I don’t think this will replace traditional user experiences, but it will be a welcome alternative.
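For anyone curious what that actually looks like under the hood, here is a minimal sketch of how an app might connect to the new voice API. The endpoint, model name, beta header and event format below are my assumptions based on the DevDay announcement, so treat it as illustrative only and check OpenAI’s documentation before building anything real.

import json
import os

from websocket import create_connection  # pip install websocket-client

# Assumed realtime endpoint and preview model name from the DevDay announcement.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

ws = create_connection(
    URL,
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",  # assumed beta opt-in header
    ],
)

# Ask for a spoken response; this event shape is an assumption, not gospel.
ws.send(json.dumps({
    "type": "response.create",
    "response": {
        "modalities": ["audio", "text"],
        "instructions": "Greet the customer and ask how you can help today.",
    },
}))

# Print the first few server events; audio arrives as base64-encoded chunks.
for _ in range(10):
    event = json.loads(ws.recv())
    print(event.get("type"))

ws.close()

In practice a brand’s app would stream microphone audio up and play the returned audio chunks back in real time, but the skeleton above is the gist of it.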

I’m busy refreshing my iPhone apps to see who will be first to market with a great voice experience.

What OpenAI didn’t release is its multi-modal (with video) model. With a fresh $9.6 billion in the bank, hopefully we won’t have to wait much longer.
