Yeoh Siew Hoon

I've always been in love with radio, and growing up in Penang, when I should have been fast asleep, I'd instead be listening to voices over the airwaves transporting me with songs, dedications and letters.

So, when the trend of podcasting arrived, I jumped right in. During the pandemic, the Web in Travel Podcast took on a life of its own. At a time when we were denied human togetherness, the human voice was more needed than ever to bring our shattered industry together.

Well, here we stand at another new age of audio -- generative AI voice -- where you can clone your voice and get it to say anything, and even have access to a range of voices to broadcast whatever it is you want to create.

In late January, ElevenLabs, a voice technology research company set up in 2022, announced a Series B fundraising of $80 million, simultaneously releasing new products, including a Dubbing Studio to enable users to dub entire movies, a Voice Library marketplace for users to earn from AI versions of their own voices (you can rent your voice to others) and a mobile app that enables instant conversion of text and URLs into audio.

Finding my voice

Over the Lunar New Year period, with time on my hands, I decided to experiment with its Speech Synthesis product and clone my voice. I had heard it was free -- it never is, of course. Anyway, I started with a beginner's account, $1 a month. How bad could it be?

The beginner's account limits you to five minutes of audio and a certain number of characters -- that's how they reel you in.

First, I described my voice: Malaysian Chinese, female, neutral, deep. I then wrote my text, recorded it and uploaded the audio file, making sure it was "clean," because if it is a "dirty" recording, the machine will pick up any error, and your clone may sound like it (and you) swallowed a hundred pebbles.

The first voice made me sound Australian. The second, part-American, part-British. The third attempt came the closest.

I then wondered what I would sound like in Mandarin and Japanese. I translated the same text with ChatGPT, copied and pasted it into ElevenLabs, and generated. In an instant, I was speaking Mandarin and Japanese like I was born to it. I sent the clips to my friends who speak those languages, and they were shocked. My mother fell off her chair; she'd never heard me string together a sentence in Mandarin before.

I decided to "train" it with a longer audio file, so I upgraded to a Creator's account, around $11 a month. (I told you, that's how they get you.)

I recorded a story of around 10 minutes, fed it to the machine and out came another version of my voice, reading like a professional broadcaster: no stumbles, no hems, no haws.

Mixed reviews

At this point, I felt good enough to go to the next stage: I fed in my weekly column. Listen to it and tell me what you think.

I've since sent the audio version of "WiT Thoughts" to friends, and the response has been a mixed bag. Some people think it's all me. Some say it doesn't sound like me. One said, "The timbre is fine, the words are yours. But the accentuation is not there. It's too consistent."

Another said, "Positively creepy, we need to talk."

I think I will send him my clone to have that conversation. Meanwhile, I need to return to the machine and further train my clone. For Professional Voice Cloning, it recommends at least three hours of recording. I might read "Winnie-the-Pooh."
