Yeah, the warmth/flow is big. I've seen some shows use voice actors (especially with translations), but that's obviously costly and slow.

This seems like one of those things where one of the million generative AI projects should be able to step in, but it doesn't seem like that's an option.

I think that would work to communicate stuff, and with the progress of deep learning models the synthesis would probably be good, but I'd think it would be hard to synchronize with multiple people. I really want it to capture the warmth and interaction of real conversation, but now I'm kind of curious how your idea would feel. Will update if I try it.

elvismercury

Would a workaround for Option 1 be to use transcription software on the original interview, then run the transcript through a good text-to-speech program? I'm not sure if there are ones that will hit all the nuanced beats, but at least the voice itself would sound mostly human.