Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Turn-taking in conversational systems and human-robot interaction: a review
G Skantze - Computer Speech & Language, 2021 - Elsevier
The taking of turns is a fundamental aspect of dialogue. Since it is difficult to speak and listen
at the same time, the participants need to coordinate who is currently speaking and when …
A high-performance speech neuroprosthesis
Speech brain–computer interfaces (BCIs) have the potential to restore rapid communication
to people with paralysis by decoding neural activity evoked by attempted speech into text, or …
Voicebox: Text-guided multilingual universal speech generation at scale
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …
Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding
Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …
DataPerf: Benchmarks for data-centric AI development
Machine learning research has long focused on models rather than datasets, and
prominent datasets are used for common ML tasks without regard to the breadth, difficulty …
A comparative study on Transformer vs RNN in speech applications
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …
VoxCeleb: Large-scale speaker verification in the wild
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …
SpecAugment: A simple data augmentation method for automatic speech recognition
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …