Chat with Wiki - What are some high-quality projects for video lip sync?

Answer

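One relevant project for video lip sync is Google's "Generating audio for video" (V2A), on which further research is underway. Its main characteristics are:

Unlike existing video-to-audio solutions, it can understand raw pixels, and adding a text prompt is optional.
The system does not need manual alignment of the generated sound with the video, which avoids tediously adjusting sounds, visuals, and timings.
Some limitations remain to be addressed. The quality of the audio output depends on the quality of the video input, so artifacts or distortions in the video that fall outside the model's training distribution can cause a noticeable drop in audio quality.
Lip synchronization for videos that involve speech is being improved. V2A attempts to generate speech from input transcripts and synchronize it with characters' lip movements, but the paired video generation model may not be conditioned on transcripts. The resulting mismatch often produces unnatural lip-syncing.

Note that this project is still being refined and improved.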


References

Our research stands out from existing video-to-audio solutions because it can understand raw pixels and adding a text prompt is optional. Also, the system doesn't need manual alignment of the generated sound with the video, which involves tediously adjusting different elements of sounds, visuals and timings. Still, there are a number of other limitations we're trying to address and further research is underway. Since the quality of the audio output is dependent on the quality of the video input, artifacts or distortions in the video, which are outside the model's training distribution, can lead to a noticeable drop in audio quality. We're also improving lip synchronization for videos that involve speech. V2A attempts to generate speech from the input transcripts and synchronize it with characters' lip movements. But the paired video generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn't generate mouth movements that match the transcript.

Google: Generating audio for video