Fantastic overview here - thanks for writing this. Curious if you have any insights as to how these voice cloning tools are trained. Any copyright issues on that front?
Thanks so much for reading, Matt! Great question.
Most voice cloning models are trained on massive datasets that are often scraped from places like podcasts, YouTube videos, interviews, and audiobooks. That’s exactly where many of the copyright and likeness concerns begin. Some companies (like the ones listed in the post) are shifting toward more ethical sourcing and licensing voices directly from actors, but that isn't the industry norm yet.
Legally, it’s all rapidly evolving. A voice isn’t protected under copyright the way a song or an image is, but Right of Publicity laws are becoming an important lever, especially in the US. SAG-AFTRA and NAVA are actively advocating for stronger safeguards, and we’re seeing similar moves internationally. A good example is the recent case with Arijit Singh in India:
https://www.wipo.int/web/wipo-magazine/articles/ai-voice-cloning-how-a-bollywood-veteran-set-a-legal-precedent-73631
It’s a fast-moving area with a lot of gray zones. I’d love to hear what you're hearing about this from the animation and sound side!
Thanks! That’s a ton of great info. Right now in animation it’s definitely the unions that are protecting the voice actors. I remember years ago (long before this current wave of genAI) we wanted a character to sound like it was created by a crude text-to-voice program, but our lawyers couldn’t figure out if we were allowed to actually use one. So we ended up hiring an actor to sound purposely stilted and robotic and we added a bit of post-processing. It was an odd experience to direct an actor to make it sound worse. Act less!
What a wild story. Not going to forget that one! It says so much about the strange intersection of tech, legality, and performance that we're all navigating.
Great insights in here. I have been thinking about voice, sound, and cloning a lot over the last 5 years. The recent updates to ChatGPT's voice mode have moved the landscape forward significantly. Notice how the voice responses feel much more human: the system takes breaths, adds “um” and “eh,” and does a very good job of simulating human speech patterns.
Totally agree! The ChatGPT voice updates are a quantum leap IMO. The addition of more natural prosody really shifts how we can experience machine-generated voice.
Brilliantly written, Siddhi. Loved it, specifically how you broke down how it works to the point of even describing vectors and latent space! Impressive. You've included elements for tech nerds, creatives, and the general community to ponder. And it's *highly* relevant. I just removed my voicemail greeting, for example.
Means so much coming from you! Thank you for reading and following along. Always open to feedback on how to make these pieces land with these diverse audiences.