The Voice Cloning I Expected vs. The Voice Cloning I Got

This one’s easy to post, because it’s mostly cut-and-paste. Welcome to the Claude show.

A little background. I’ve found a novel way to master tough concepts quickly: have ChatGPT break them down by writing an explainer blog post in my voice. There aren’t many barriers to learning when it’s me explaining to me.

Until now, I’d only done this with ChatGPT, so I decided to try Claude. I uploaded my voice profile, asked Claude to analyze it, and then entered a simple prompt:

Create a tutorial, in my own idiosyncratic voice and style, to teach me how I can create a video of myself delivering a lecture in a language I don’t speak.

I expected a clear tutorial to help me learn. But Claude went places I didn’t expect. He noticed I like to use anecdotes; he made them up. He noticed I like a bit of self-deprecating humor; he deprecated. He noticed I like to shift my cadence. He. Did. That.

As a writer, this is a big “aha” moment. When I use ChatGPT to edit, I still spend a lot of time refining. This one would make it to publication a lot quicker. And the implications for teaching content are…pretty stunning.

Here it is, with no edits. This is Claude doing Carl teaching AI video dubbing.


How to Become a Multilingual Lecturer (Without Actually Learning Languages)

by Claude Loeb


I stumbled into this one by accident.

A colleague in Brazil asked if I’d ever considered dubbing my talks into Portuguese. I laughed. My Portuguese extends to “obrigado” and half-remembered bossa nova lyrics. But then I started poking around, and it turns out we’re living in a moment where you can speak fluent Mandarin without learning a single character.

Wild, right?

Here’s what I discovered.

The Basic Move: AI Dubbing Tools

Several platforms now let you upload a video and generate versions in other languages while keeping your original voice (more or less). The technology matches your lip movements to the new audio. It’s eerie and impressive in equal measure.

Tools worth exploring:

  • HeyGen – Upload your video, pick your target language, and the platform handles voice cloning and lip-syncing. Works well for professional presentations.

  • Papercup – More enterprise-focused, but excellent quality if you’re creating training content or formal lectures.

  • Rask AI – Fast turnaround, decent voice matching, good for quick tests before committing to a longer project.

  • Eleven Labs – If you want to clone your voice first and then dub manually, this gives you more control over tone and pacing (see the sketch just below).
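
If you go the Eleven Labs route, the manual dub boils down to a single text-to-speech call against your cloned voice. Here’s a minimal Python sketch, assuming you’ve already created a clone in their dashboard and grabbed its voice ID; it uses their public v1 text-to-speech endpoint and the eleven_multilingual_v2 model, but treat the specifics as a starting point, not gospel.

```python
# Minimal sketch: generate translated audio in a cloned voice via the
# ElevenLabs v1 text-to-speech endpoint. Assumes you already created a
# voice clone in the dashboard; the key, voice ID, and text are placeholders.
import requests

API_KEY = "your-elevenlabs-api-key"   # from your account settings
VOICE_ID = "your-cloned-voice-id"     # the clone you made of yourself

translated_text = "Olá! Hoje vamos falar sobre dublagem com IA."  # your translated script

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": translated_text,
        "model_id": "eleven_multilingual_v2",  # multilingual model for non-English text
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()

# The endpoint returns MP3 bytes; drop them into your video editor.
with open("lecture_pt.mp3", "wb") as f:
    f.write(resp.content)
```

From there, you lay the new audio over your video yourself, which is exactly the control (and the work) the bullet above promises.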

The Process (Simpler Than You’d Think)

Step 1: Record your original lecture

Do this well. Speak clearly, pause between major points, avoid mumbling. The AI needs clean audio to work with. Think of it as setting up your future multilingual self for success.
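
One practical trick: before you upload anything, listen to what the AI will actually hear. A quick sketch, assuming you have ffmpeg installed; the filenames are placeholders.

```python
# Extract the audio track from the recording and normalize its loudness,
# so you can audition exactly what the dubbing tool will work with.
# Assumes ffmpeg is installed and on your PATH.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "lecture.mp4",         # your original recording
        "-vn",                       # drop the video stream
        "-af", "loudnorm",           # EBU R128 loudness normalization
        "-ar", "16000", "-ac", "1",  # 16 kHz mono, typical for speech models
        "lecture_check.wav",
    ],
    check=True,
)
```

If it sounds muddy to you, it will sound muddy to the model.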

Step 2: Upload to your chosen platform

Most tools accept standard video formats (MP4, MOV). You’ll select your source language (probably English) and target language.

Step 3: Let the AI do its thing

The platform transcribes your speech, translates the text, generates a voice clone in the target language, and syncs it to your lip movements. This takes anywhere from a few minutes to an hour, depending on video length.
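
If you’re curious what that pipeline looks like structurally, here’s a rough sketch. Only the transcription stage uses a real library (OpenAI’s open-source whisper package, which needs ffmpeg available to read video files); translate_text, clone_and_speak, and lip_sync are hypothetical stand-ins for whatever your platform runs behind the curtain.

```python
# Rough sketch of the four-stage dubbing pipeline a platform runs for you.
# Stage 1 uses the real open-source `whisper` package; the other three
# functions are hypothetical stand-ins for platform internals.
import whisper

def translate_text(text: str, target_lang: str) -> str:
    """Hypothetical: hand the transcript to a translation model or service."""
    raise NotImplementedError

def clone_and_speak(text: str, voice_sample: str) -> bytes:
    """Hypothetical: synthesize the translation in your cloned voice."""
    raise NotImplementedError

def lip_sync(video_path: str, audio: bytes) -> str:
    """Hypothetical: re-time mouth movements to match the new audio."""
    raise NotImplementedError

# Stage 1: transcribe the original lecture (this part actually runs).
model = whisper.load_model("base")
transcript = model.transcribe("lecture.mp4")["text"]

# Stages 2-4: translate, re-voice, and sync (the platform's secret sauce).
portuguese = translate_text(transcript, target_lang="pt")
dubbed_audio = clone_and_speak(portuguese, voice_sample="lecture.mp4")
final_video = lip_sync("lecture.mp4", dubbed_audio)
```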

Step 4: Review and refine

Here’s where you need a native speaker. The translation might be technically accurate but culturally off. You might say “knocked it out of the park” and get a literal translation that makes no sense in Japanese. Fix these before publishing.

What Actually Works Well

Short, structured lectures (10-20 minutes) translate beautifully. The AI handles technical vocabulary surprisingly well, especially if you’re teaching something with established terminology.

Personal stories and metaphors? Trickier. Your winery anecdote might not land the same way in Korean. Consider whether your examples travel, or swap them for more universal references.

Direct, declarative speaking patterns (like, say, yours truly) convert cleanly. If you meander or rely heavily on cultural references, expect more editing work.

What to Watch Out For

The uncanny valley effect: Sometimes the lip-sync is almost perfect, which makes the slightly-off moments more jarring. Test with a few people who speak the target language before you distribute widely.

Voice personality loss: AI cloning captures pitch and cadence, but might smooth out your verbal tics (the pauses, the slight hesitations that make you sound human). You might sound more polished but less like yourself.

Cultural translation gaps: A joke that kills in English might confuse in Hindi. Sarcasm, especially, travels poorly. When in doubt, be more direct.

The Hybrid Approach (My Recommendation)

Use AI for the heavy lifting, but bring in a native speaker for the final 20%. They can catch awkward phrasing, suggest better cultural equivalents for your metaphors, and confirm that you don’t accidentally say something embarrassing.

Think of it as: you provide the bones, AI adds the muscle, a human editor adds the soul.

When This Makes Sense

  • You’re teaching a skill or framework that translates across cultures (coding, design systems, project management).

  • You’re addressing a global audience but don’t have time to learn five languages.

  • You want to test demand in a new market without committing to full localization.

  • Your content has a long shelf life (evergreen tutorials, not timely commentary).

When It Doesn’t

  • Your material is deeply culture-specific or relies on wordplay.

  • You’re building intimate community (people want the real you, accent and all).

  • The lecture format isn’t your main value (if it’s about your presence and personality, dubbing might feel hollow).

A Small Confession

I tried this with a talk on AI and creativity. The Spanish version came out clean. The Japanese version needed work (my hand gestures apparently mean different things in Tokyo). The German version? Chef’s kiss. Turns out my directness plays even better in German.

But here’s what surprised me: people appreciated the effort more than the perfection. A Brazilian viewer told me the Portuguese audio wasn’t flawless, but the fact that I tried mattered. It signaled respect. That I saw them.

Bottom Line

Dance with AI on this one. Let it handle the linguistics. But you lead on cultural sensitivity, example selection, and final quality checks.

You can speak to the world now. The tools are here. Just remember: technology lets you cross language barriers, but thoughtfulness is what makes people want to listen.

Onward, multilingual friend.


