Speaker diarization: A review of recent research
Speaker diarization is the task of determining “who spoke when?” in an audio or video
recording that contains an unknown amount of speech and also an unknown number of …
Ego4d: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Voxceleb: Large-scale speaker verification in the wild
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …
SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization
This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue
summaries. We investigate the challenges it poses for automated summarization by testing …
FakeAVCeleb: A novel audio-video multimodal deepfake dataset
While significant advancements have been made in the generation of deepfakes using deep
learning technologies, their misuse is now a well-known issue. Deepfakes can cause severe …
Voxceleb: a large-scale speaker identification dataset
Most existing datasets for speaker identification contain samples obtained under quite
constrained conditions, and are usually hand-annotated, hence limited in size. The goal of …
Wavesplit: End-to-end speech separation by speaker clustering
N Zeghidour, D Grangier - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …
A survey on semantic processing techniques
Semantic processing is a fundamental research domain in computational linguistics. In the
era of powerful pre-trained language models and large language models, the advancement …
Gender and dialect bias in YouTube's automatic captions
R Tatman - Proceedings of the first ACL workshop on ethics in …, 2017 - aclanthology.org
This project evaluates the accuracy of YouTube's automatically-generated captions across
two genders and five dialect groups. Speakers' dialect and gender were controlled for by …
MediaSum: A large-scale media interview dataset for dialogue summarization
MediaSum is a large-scale media interview dataset consisting of 463.6K transcripts with
abstractive summaries. To create this dataset, we collect interview transcripts from NPR and …