This type of digital human emphasizes self-driven behavior with less human intervention, which makes the technical implementation more complex. The simplest algorithm-driven digital human roughly follows the pipeline below, built around three core algorithms:

1. ASR (Automatic Speech Recognition): We want interaction with a digital human to be conversational, just like talking to a person, so the first step of the algorithm-driven pipeline is speech recognition, which converts the user's audio into text that the digital human can understand and respond to.
2. AI Agent: Acts as the digital human's brain. It can be backed directly by a large language model; framing it as an agent is meant to give the digital human memory modules and other capabilities that make it more lifelike.
3. TTS (Text to Speech): The output the digital human receives from the LLM is text. To keep the interaction consistently voice-based, that text must be converted back into speech and spoken by the digital human.
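A minimal sketch of this ASR → Agent/LLM → TTS loop is shown below. The library choices (openai-whisper for recognition, pyttsx3 for synthesis) and the `query_llm` placeholder are illustrative assumptions, not the only possible stack; any combination that covers the same three stages would fit.

```python
# Minimal ASR -> Agent/LLM -> TTS loop. Library choices are illustrative only.
import whisper   # pip install openai-whisper
import pyttsx3   # pip install pyttsx3


def query_llm(prompt: str) -> str:
    # Placeholder for the digital human's "brain": swap in a real LLM or
    # agent framework (with memory, tool use, etc.) here.
    return "This is where the LLM's reply would go."


def run_turn(audio_path: str) -> None:
    # 1. ASR: convert the user's audio into text.
    asr_model = whisper.load_model("base")
    user_text = asr_model.transcribe(audio_path)["text"]

    # 2. Agent/LLM: generate a text reply to the recognized utterance.
    reply_text = query_llm(user_text)

    # 3. TTS: convert the reply back to speech so the interaction stays vocal.
    tts = pyttsx3.init()
    tts.say(reply_text)
    tts.runAndWait()


run_turn("user_question.wav")  # path is illustrative
```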
Below are some commonly used ASR corpora and related resources:

|Category|Name|Description|Link|
|-|-|-|-|
|ASR-multilingual|mozilla common voice|Audio in many languages; currently 14,122 hours across 87 languages|[https://commonvoice.mozilla.org/zh-CN/datasets](https://commonvoice.mozilla.org/zh-CN/datasets)|
|ASR-multilingual|OpenSLR|Corpora for speech synthesis, recognition, and other tasks in many languages|[https://www.openslr.org/resources.php](https://www.openslr.org/resources.php)|
|ASR-multilingual|CI-AVSR|Cantonese in-car audio-visual dataset; 8.3 hours|[https://github.com/HLTCHKUST/CI-AVSR](https://github.com/HLTCHKUST/CI-AVSR)|
|ASR-multilingual|open speech corpora|A collection of various open speech datasets|[https://github.com/coqui-ai/open-speech-corpora](https://github.com/coqui-ai/open-speech-corpora)|
|ASR-multilingual|Hindi|1,111 hours of Hindi|[https://www.openslr.org/118/](https://www.openslr.org/118/)|
|ASR-multilingual|Samrómur Queries 21.12|Samrómur Icelandic speech corpus; 20 hours|[https://www.openslr.org/116/](https://www.openslr.org/116/)|
|ASR-multilingual|Samrómur Children 21.09|Icelandic speech from children|[https://www.openslr.org/117/](https://www.openslr.org/117/)|
|ASR-multilingual|Golos|1,240 hours of Russian|[https://www.openslr.org/114/](https://www.openslr.org/114/)|
|ASR-multilingual|MediaSpeech|10 hours of French, Arabic, Turkish, and Spanish media speech|[https://www.openslr.org/108/](https://www.openslr.org/108/)|
|Noise|Demand|A wide variety of noise recordings|[https://zenodo.org/record/1227121#.Yi7QhhNBy3K](https://zenodo.org/record/1227121#.Yi7QhhNBy3K)|
|Noise|Noisex-92|Noise recordings|[https://github.com/yqlihust/noisex-92](https://github.com/yqlihust/noisex-92)|
|Noise|MUSAN|Music, speech, and noise|[https://www.openslr.org/17/](https://www.openslr.org/17/)|
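The noise corpora above (Demand, Noisex-92, MUSAN) are commonly mixed into clean speech to augment ASR training data. A rough sketch of mixing at a target signal-to-noise ratio follows, assuming both signals are mono NumPy arrays at the same sample rate:

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    # Scale the noise so that 10 * log10(P_speech / P_noise) equals snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```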
|Category|Name|Description|Link|
|-|-|-|-|
|ASR-Chinese|mozilla common voice|Audio in many languages; currently 14,122 hours across 87 languages|[https://commonvoice.mozilla.org/zh-CN/datasets](https://commonvoice.mozilla.org/zh-CN/datasets)|
|ASR-Chinese|OpenSLR|Corpora for speech synthesis, recognition, and other tasks in many languages|[https://www.openslr.org/resources.php](https://www.openslr.org/resources.php)|
|ASR-Chinese|open speech corpora|A collection of various open speech datasets|[https://github.com/coqui-ai/open-speech-corpora](https://github.com/coqui-ai/open-speech-corpora)|
|ASR-Chinese|AiShell-4|211 meetings, 120 hours; multi-channel Mandarin meeting speech database|[http://www.aishelltech.com/aishell_4](http://www.aishelltech.com/aishell_4)|
|ASR-Chinese|AliMeeting|118.75 hours of meeting data|[https://www.openslr.org/119/](https://www.openslr.org/119/)|
|ASR-Chinese|Free ST Chinese Mandarin Corpus|855 speakers, 102,600 utterances, recorded on mobile phones|[https://www.openslr.org/38/](https://www.openslr.org/38/)|
|ASR-Chinese|aidatatang_200zh|200 hours, 600 speakers, transcript accuracy 98%|[https://www.openslr.org/62/](https://www.openslr.org/62/)|
|ASR-Chinese|magicData|755 hours of Mandarin, 1,080 speakers, quiet indoor environment, 16 kHz||
|ASR-Chinese|magicData-RAMC|180 hours of spontaneous Mandarin conversation (MAGICDATA Mandarin Chinese Conversational Speech Corpus)|[https://www.magicdatatech.com/datasets/mdt2021s003-1647827542](https://www.magicdatatech.com/datasets/mdt2021s003-1647827542)|
|ASR-Chinese|TAL_CSASR|587 hours of mixed Chinese-English speech|[https://ai.100tal.com/dataset](https://ai.100tal.com/dataset)|
|ASR-Chinese|TAL_ASR|100 hours of lecture speech|[https://ai.100tal.com/dataset](https://ai.100tal.com/dataset)|
|ASR-English|GigaSpeech|10,000 hours; highly recommended|[https://github.com/SpeechColab/GigaSpeech](https://github.com/SpeechColab/GigaSpeech)|
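After downloading one of these corpora, it is worth verifying the advertised duration before training. A small sketch that walks a directory of WAV files and sums their length; the directory path and the `.wav` extension are assumptions about how a given archive unpacks, so adjust them to the corpus you actually download:

```python
from pathlib import Path

import soundfile as sf  # pip install soundfile


def total_hours(corpus_dir: str) -> float:
    # Sum the duration of every WAV file under corpus_dir, in hours.
    seconds = 0.0
    for wav in Path(corpus_dir).rglob("*.wav"):
        info = sf.info(str(wav))
        seconds += info.frames / info.samplerate
    return seconds / 3600.0


if __name__ == "__main__":
    # Path is illustrative; point it at the unpacked corpus directory.
    print(f"{total_hours('aidatatang_200zh/corpus'):.1f} hours of audio")
```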