Chat with Wiki - 视频智能解说

Answer

以下是关于视频智能解说的相关内容：

使用 Gemini 进行视频拆解：

思路：可以拿库布里克的电影来分析，学习大师手法；b 站大学复习有救了；可用于影视二创长剪短，生成特定故事，警察查监控找出可疑现象，替代长剪短工具等。
测试者阿强：将前几天用 AI 做的功夫熊猫-之离谱村版丢进去分析，效果很好，能准确认出“如来神掌”，并给出极致详细的每个镜头描述，包括物体、动作、形变、分镜设计原因等。
测试者张余、Ling、洋洋：均有拆解结果。

Han 基于现有能力项目应用的思考：

MM-Vid 是一个集成的视频理解系统，能处理和理解长视频内容并进行问答。
应用场景包括快速的视频剪辑、图生视频、快速诊断等。
由 Microsoft Azure AI 开发，结合了 GPT-4V 的能力和其他视觉、音频和语音处理工具，能自动识别和解释视频中的元素，将多模态信息转录成详细的文本脚本。

使用 GPT 的视觉功能和 TTS API 处理和讲述视频：在冬季的广袤白色景观中，一群灰狼从牛群中挑出一头野牛，展开了一场生存的激烈斗争。狼群精心策划攻击，包围猎物，野牛坚守阵地。双方的每一个动作、每一次佯攻都关乎生死。最终，随着斗争不可避免地结束，展现了自然界残酷而美丽的生存法则。之后可将脚本传递给 TTS API 生成画外音的 mp3。

Content generated by AI large model, please carefully verify (powered by aily)

References

用 Gemini 拆解视频

郑跃葵：拿个库布里克的电影来分析，连大师手法都学会了大峰AI绘画：太酷啦又多了一个AI视频创作助手Jones：b站大学复习有救了对！：拉片太方便了Shock：而且可以干掉人工标注了，安徽那边数据标注众包要失业了清慎：1.影视二创长剪短，一键生成小帅、小美、大壮、丧彪的故事2.警察叔叔查监控，一句话找出监控中的可疑现象3.替代Opus等长剪短工具[heading2]测试者：阿强[content]拆解作品[阿强：功夫熊猫版-如来神掌的完整复盘](https://waytoagi.feishu.cn/wiki/T6bFwVlppipd4ZkoddfcGbebnvd)好用，准确度很高，我把前几天用AI做的功夫熊猫-之离谱村版丢进去分析，效果很好拆解结果prompt：请你用极致详细的言语描述其中的每个镜头，包括其中有什么物体，在执行什么动作，产生什么形变，为什么这样设计分镜等等。感动，它竟然准确的认出了“如来神掌”Gemini 1.5 Pro给出的全文如下：[heading2]测试者：张余[content]拆解结果Gemini 1.5 Pro给出的全文如下：[heading2]测试者：Ling[heading2]测试者：洋洋[content]拆解结果

Han：基于现有能力项目应用的思考

[title]Han：基于现有能力项目应用的思考|技术名称|应用场景|技术类型|简介|主要特点|工作原理|其他|官方网站|项目及演示|论文|Github|在线体验|附件|最后更新时间| |-|-|-|-|-|-|-|-|-|-|-|-|-|-| |MM-Vid：一个集成的视频理解系统，能处理和理解长视频内容并进行问答。|可以想象到的应用场景： 1、快速的视频剪辑：只需要输入镜头脚本，提取素材库内容进行拼接。 2、快速的图生视频：结合图像识别为基础，根据镜头脚本提取出相同seed值的图片，通过runwayml思路生成3,5s视频，结合comfy ui自动化拼接。 3、快速诊断：思路和之前小互发的医疗诊断类似。|AI学习模型|由Microsoft Azure AI开发，结合了GPT-4V的能力和其他视觉、音频和语音处理工具，能处理和理解长视频和复杂任务。 能够自动识别和解释视频中的元素，如人物行为、情感表达、场景变化和对话内容，从而实现对视频故事线的理解。 其核心功能是将视频中的多模态信息（如视觉图像、音频信号和语言对话）转录成详细的文本脚本，这样大语言模型就能够理解视频内容。 这对于多种应用场景都非常有用，比如为视觉障碍人士提供视频内容的音频描述，或者在视频监控中自

使用GPT的视觉功能和TTS API处理和讲述视频

[title]使用GPT的视觉功能和TTS API处理和讲述视频[heading2]2.Generating a voiceover for a video with GPT-4 aIn the vast,white expanse of the winter landscape,a drama unfolds that is as timeless as it is raw.Here,in the cradle of nature's harshest trials,a pack of grey wolves has singled out a bison from the herd—a desperate struggle for life and sustenance is about to begin.In a carefully orchestrated assault,the pack encircles their quarry,each wolf keenly aware of its role.Muscles tense and breaths visible in the frigid air,they inch closer,probing for a weakness.The bison,a formidable giant,stands its ground,backed by the survival instincts honed over millennia.Its hulking form casts a solitary shadow against the snow's blinding canvas.The dance of predator and prey plays out as a symphony of survival—each movement,each feint,holds the weight of life itself.The wolves take turns attacking,conserving strength while wearing down their target.The herd,once the bison's allies,scatter into the distance,a stark reminder that in these wild territories,the law of survival supersedes the bonds of kinship.A burst of activity—the wolves close in.The bison,though mighty,is tiring,its breaths labored,its movements sluggish.The wolves sense the turning tide.With relentless determination,they press their advantage,a testament to the brutal beauty of the natural order.As the struggle reaches its inevitable conclusion,we are reminded of the delicate balance that governs these wild spaces.Life,death,struggle,and survival—the cycle continues,each chapter written in the snow,for as long as the wolf roams and the bison roves these frozen plains.Now we can pass the script to the TTS API where it will generate a mp3 of the voiceover:现在我们可以将脚本传递给TTS API，它将在其中生成画外音的mp3：