Chat with Wiki - What ASR solutions are available?

Answer

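Here are some common ASR solutions:

In algorithm-driven digital human solutions, ASR (Automatic Speech Recognition) is one of the core algorithms. It converts the user's audio data into text so that the digital human can understand it and generate a response.
Open-source ASR datasets: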
- Multilingual:
  - mozilla common voice: audio in many languages, currently 14122 hours across 87 languages, link: https://commonvoice.mozilla.org/zh-CN/datasets
  - OpenSLR: corpora for speech synthesis, recognition, and related tasks in many languages, link: https://www.openslr.org/resources.php
  - CI-AVSR: Cantonese in-car audio-visual data, 8.3 hours, link: https://github.com/HLTCHKUST/CI-AVSR
  - open speech corpora: a collection of assorted datasets, link: https://github.com/coqui-ai/open-speech-corpora
  - Hindi: 1111 hours, link: https://www.openslr.org/118/
  - Samrómur Queries 21.12: Samrómur Icelandic Speech corpus, 20 hours, link: https://www.openslr.org/116/
  - Samrómur Children 21.09: Icelandic Speech from children, link: https://www.openslr.org/117/
  - Golos: 1240 hours of Russian, link: https://www.openslr.org/114/
  - MediaSpeech: 10 hours of French, Arabic, Turkish and Spanish media speech, link: https://www.openslr.org/108/
- Chinese:
  - mozilla common voice: audio in many languages, currently 14122 hours across 87 languages, link: https://commonvoice.mozilla.org/zh-CN/datasets
  - OpenSLR: corpora for speech synthesis, recognition, and related tasks in many languages, link: https://www.openslr.org/resources.php
  - open speech corpora: a collection of assorted datasets, link: https://github.com/coqui-ai/open-speech-corpora
  - AiShell-4: 211 meetings, 120 hours, multi-channel Chinese meeting speech corpus, link: http://www.aishelltech.com/aishell_4
  - AliMeeting: 118.75 hours of meeting data, link: https://www.openslr.org/119/
  - Free ST Chinese Mandarin Corpus: 855 speakers, 102600 utterances recorded on mobile phones, link: https://www.openslr.org/38/
  - aidatatang_200zh: 200 hours, 600 speakers, 98% transcription accuracy, link: https://www.openslr.org/62/
  - magicData: 755 hours of Chinese, 1080 speakers, quiet indoor environment, 16 kHz
  - magicData-RAMC: 180 hours of Chinese spontaneous conversation (MAGICDATA Mandarin Chinese Conversational Speech Corpus), link: https://www.magicdatatech.com/datasets/mdt2021s003-1647827542
  - TAL_CSASR: mixed Chinese-English, 587 hours, link: https://ai.100tal.com/dataset
  - TAL_ASR: 100 hours of lectures, link: https://ai.100tal.com/dataset
- English:
  - GigaSpeech: 10000 hours, highly recommended, link: https://github.com/SpeechColab/GigaSpeech


References

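Digital humans of this type emphasize self-driven behavior with less human intervention, and are more complex to implement. The simplest algorithm-driven digital human follows roughly the flow below, built on three core algorithms:

1. ASR (Automatic Speech Recognition): we want interaction with a digital human to be conversation-based, as it is with a person, so the first step of the algorithm-driven flow is speech recognition. It converts the user's audio data into text so that the digital human can understand it and generate a response.
2. AI Agent: acts as the digital human's brain and can connect directly to a large language model. The Agent concept is emphasized so that the digital human has a memory module and similar components, which makes it more lifelike.
3. TTS (Text to Speech): the output the digital human generates through the LLM is text. To keep the voice interaction consistent, the text must be converted into speech by the digital human.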
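The three-stage flow described above (ASR, then AI Agent, then TTS) can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation from the referenced article: the class `DigitalHumanPipeline` and the stand-in functions `fake_asr`, `fake_agent`, and `fake_tts` are hypothetical placeholders, and a real system would swap in an actual speech recognizer, an LLM-backed agent, and a speech synthesizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DigitalHumanPipeline:
    """Hypothetical wiring of the three stages: ASR -> Agent -> TTS."""
    asr: Callable[[bytes], str]               # audio in, text out
    agent: Callable[[str, List[str]], str]    # user text + memory in, reply text out
    tts: Callable[[str], bytes]               # reply text in, audio out
    memory: List[str] = field(default_factory=list)  # simple conversation memory

    def respond(self, audio: bytes) -> bytes:
        user_text = self.asr(audio)                  # 1. speech recognition
        reply = self.agent(user_text, self.memory)   # 2. agent decides what to say
        # Record the turn so later replies can depend on the history.
        self.memory.append(f"user: {user_text}")
        self.memory.append(f"agent: {reply}")
        return self.tts(reply)                       # 3. text to speech


# Stand-in components (assumptions for illustration only).
def fake_asr(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_agent(text: str, memory: List[str]) -> str:
    # Two memory entries are stored per turn, so this counts turns.
    turn = len(memory) // 2 + 1
    return f"You said: {text} (turn {turn})"


def fake_tts(text: str) -> bytes:
    # Pretend the encoded text is synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = DigitalHumanPipeline(fake_asr, fake_agent, fake_tts)
    print(pipeline.respond(b"hello"))
    print(pipeline.respond(b"what datasets exist?"))
```

Keeping each stage behind a plain callable means that any one of them can be replaced, for example with a model trained on one of the listed corpora, without touching the other two.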

TTS Data Compilation

|Category|Name|Description|Link|Details|
|-|-|-|-|-|
|ASR-Multilingual|mozilla common voice|Audio in many languages, currently 14122 hours across 87 languages|[https://commonvoice.mozilla.org/zh-CN/datasets](https://commonvoice.mozilla.org/zh-CN/datasets)||
|ASR-Multilingual|OpenSLR|Corpora for speech synthesis, recognition, and related tasks in many languages|[https://www.openslr.org/resources.php](https://www.openslr.org/resources.php)||
|ASR-Multilingual|CI-AVSR|Cantonese in-car audio-visual data, 8.3 hours|[https://github.com/HLTCHKUST/CI-AVSR](https://github.com/HLTCHKUST/CI-AVSR)||
|ASR-Multilingual|open speech corpora|A collection of assorted datasets|[https://github.com/coqui-ai/open-speech-corpora](https://github.com/coqui-ai/open-speech-corpora)||
|ASR-Multilingual|Hindi|1111 hours|[https://www.openslr.org/118/](https://www.openslr.org/118/)||
|ASR-Multilingual|Samrómur Queries 21.12|Samrómur Icelandic Speech corpus, 20 hours|[https://www.openslr.org/116/](https://www.openslr.org/116/)||
|ASR-Multilingual|Samrómur Children 21.09|Icelandic Speech from children|[https://www.openslr.org/117/](https://www.openslr.org/117/)||
|ASR-Multilingual|Golos|1240 hours of Russian|[https://www.openslr.org/114/](https://www.openslr.org/114/)||
|ASR-Multilingual|MediaSpeech|10 hours of French, Arabic, Turkish and Spanish media speech|[https://www.openslr.org/108/](https://www.openslr.org/108/)||
|Noise|Demand|Various kinds of noise|[https://zenodo.org/record/1227121#.Yi7QhhNBy3K](https://zenodo.org/record/1227121#.Yi7QhhNBy3K)||
|Noise|Noisex-92|Noise|[https://github.com/yqlihust/noisex-92](https://github.com/yqlihust/noisex-92)||
|Noise|MUSAN|music, speech, and noise|[https://www.openslr.org/17/](https://www.openslr.org/17/)||

TTS Data Compilation

|Category|Name|Description|Link|Details|
|-|-|-|-|-|
|ASR-Chinese|mozilla common voice|Audio in many languages, currently 14122 hours across 87 languages|[https://commonvoice.mozilla.org/zh-CN/datasets](https://commonvoice.mozilla.org/zh-CN/datasets)||
|ASR-Chinese|OpenSLR|Corpora for speech synthesis, recognition, and related tasks in many languages|[https://www.openslr.org/resources.php](https://www.openslr.org/resources.php)||
|ASR-Chinese|open speech corpora|A collection of assorted datasets|[https://github.com/coqui-ai/open-speech-corpora](https://github.com/coqui-ai/open-speech-corpora)||
|ASR-Chinese|AiShell-4|211 meetings, 120 hours, multi-channel Chinese meeting speech corpus|[http://www.aishelltech.com/aishell_4](http://www.aishelltech.com/aishell_4)||
|ASR-Chinese|AliMeeting|118.75 hours of meeting data|[https://www.openslr.org/119/](https://www.openslr.org/119/)||
|ASR-Chinese|Free ST Chinese Mandarin Corpus|855 speakers, 102600 utterances recorded on mobile phones|[https://www.openslr.org/38/](https://www.openslr.org/38/)||
|ASR-Chinese|aidatatang_200zh|200 hours, 600 speakers, 98% transcription accuracy|[https://www.openslr.org/62/](https://www.openslr.org/62/)||
|ASR-Chinese|magicData|755 hours of Chinese, 1080 speakers, quiet indoor environment, 16 kHz|||
|ASR-Chinese|magicData-RAMC|180 hours of Chinese spontaneous conversation (MAGICDATA Mandarin Chinese Conversational Speech Corpus)|[https://www.magicdatatech.com/datasets/mdt2021s003-1647827542](https://www.magicdatatech.com/datasets/mdt2021s003-1647827542)||
|ASR-Chinese|TAL_CSASR|Mixed Chinese-English, 587 hours|[https://ai.100tal.com/dataset](https://ai.100tal.com/dataset)||
|ASR-Chinese|TAL_ASR|100 hours of lectures|[https://ai.100tal.com/dataset](https://ai.100tal.com/dataset)||
|ASR-English|GigaSpeech|10000 hours, highly recommended|[https://github.com/SpeechColab/GigaSpeech](https://github.com/SpeechColab/GigaSpeech)||