SWITCHBOARD: Telephone speech corpus for research and development

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

被引用次数：280 相关文章所有 10 个版本

[HTML] sciencedirect.com

[HTML][HTML] Turn-taking in conversational systems and human-robot interaction: a review

G Skantze - Computer Speech & Language, 2021 - Elsevier

The taking of turns is a fundamental aspect of dialogue. Since it is difficult to speak and listen
at the same time, the participants need to coordinate who is currently speaking and when …

被引用次数：187 相关文章所有 3 个版本

[HTML] nature.com

[HTML][HTML] A high-performance speech neuroprosthesis

FR Willett, EM Kunz, C Fan, DT Avansino, GH Wilson… - Nature, 2023 - nature.com

Speech brain–computer interfaces (BCIs) have the potential to restore rapid communication
to people with paralysis by decoding neural activity evoked by attempted speech into text, or …

被引用次数：164 相关文章所有 16 个版本

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

被引用次数：110 相关文章所有 8 个版本

[PDF] mlr.press

Branchformer: Parallel mlp-attention architectures to capture local and global context for speech recognition and understanding

Y Peng, S Dalmia, I Lane… - … Conference on Machine …, 2022 - proceedings.mlr.press

Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …

被引用次数：107 相关文章所有 8 个版本

[PDF] neurips.cc

Dataperf: Benchmarks for data-centric ai development

M Mazumder, C Banbury, X Yao… - Advances in …, 2024 - proceedings.neurips.cc

Abstract Machine learning research has long focused on models rather than datasets, and
prominent datasets are used for common ML tasks without regard to the breadth, difficulty …

被引用次数：93 相关文章所有 6 个版本

[PDF] arxiv.org

A comparative study on transformer vs rnn in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org

Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

被引用次数：794 相关文章所有 10 个版本

[HTML] sciencedirect.com

[HTML][HTML] Voxceleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier

The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

被引用次数：664 相关文章所有 11 个版本

[PDF] arxiv.org

Specaugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arXiv preprint arXiv …, 2019 - arxiv.org

We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (ie, filter bank …

被引用次数：3869 相关文章所有 8 个版本

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

被引用次数：75 相关文章所有 6 个版本