8 Comments
Matt Ferguson

Fantastic overview here - thanks for writing this. Curious if you have any insights as to how these voice cloning tools are trained. Any copyright issues on that front?

Siddhi Sundar

Thanks so much for reading Matt! Great question.

Most voice cloning models are trained on massive datasets, often scraped from podcasts, YouTube videos, interviews, and audiobooks. That's exactly where many of the copyright and likeness concerns begin. Some companies (like the ones listed in the post) are shifting toward more ethical sourcing by licensing voices directly from actors, but that isn't the industry norm yet.

Legally, it's all evolving rapidly. A voice isn't protected by copyright the way a song or an image is, but Right of Publicity laws are becoming an important lever, especially in the US. SAG-AFTRA and NAVA are actively advocating for stronger safeguards, and we're seeing similar moves internationally. A good example is the recent case with Arijit Singh in India:

https://www.wipo.int/web/wipo-magazine/articles/ai-voice-cloning-how-a-bollywood-veteran-set-a-legal-precedent-73631

It's a fast-moving area with a lot of gray zones. I'd love to hear what you're seeing on any of this from the animation and sound side!

Matt Ferguson

Thanks! That's a ton of great info. Right now in animation, it's definitely the unions that are protecting the voice actors. I remember years ago (long before this current wave of genAI) we wanted a character to sound like it was created by a crude text-to-speech program, but our lawyers couldn't figure out whether we were allowed to actually use one. So we ended up hiring an actor to sound purposely stilted and robotic, and we added a bit of post-processing. It was an odd experience to direct an actor to make it sound worse. Act less!

Siddhi Sundar

What a wild story. Not going to forget that one! Says so much about the strange intersection of tech, legality, and performance that we're all navigating.

Ben

Great insights in here. I have been thinking about voice, sound, and cloning a lot over the last 5 years. The recent updates to ChatGPT voice mode have accelerated the landscape significantly. Notice how the voice responses feel much more human: the system takes breaths, adds "um" and "eh", and does a very good job of simulating human speech patterns.

Siddhi Sundar

Totally agree! The ChatGPT voice updates are a quantum leap IMO. The addition of more natural prosody really shifts how we can experience machine-generated voice.

JN
Jul 8 (edited)

Brilliantly written, Siddhi. Loved it, specifically how you broke down how it works, to the point of even describing vectors and latent space! Impressive. You've included elements for tech nerds, creatives, and the general community to ponder. And it's *highly* relevant. I just removed my voicemail greeting, for example.

Siddhi Sundar

Means so much coming from you! Thank you for reading and following along. Always open to feedback on how to make these pieces land with such diverse audiences.
