Speaker diarization: A review of recent research

X Anguera, S Bozonnet, N Evans… - … on audio, speech …, 2012 - ieeexplore.ieee.org
Speaker diarization is the task of determining “who spoke when?” in an audio or video
recording that contains an unknown amount of speech and also an unknown number of …

Ego4d: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Voxceleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization

B Gliwa, I Mochol, M Biesek, A Wawer - arXiv preprint arXiv:1911.12237, 2019 - arxiv.org
This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue
summaries. We investigate the challenges it poses for automated summarization by testing …

FakeAVCeleb: A novel audio-video multimodal deepfake dataset

H Khalid, S Tariq, M Kim, SS Woo - arXiv preprint arXiv:2108.05080, 2021 - arxiv.org
While significant advancements have been made in the generation of deepfakes using deep
learning technologies, their misuse is now a well-known issue. Deepfakes can cause severe …

Voxceleb: a large-scale speaker identification dataset

A Nagrani, JS Chung, A Zisserman - arXiv preprint arXiv:1706.08612, 2017 - arxiv.org
Most existing datasets for speaker identification contain samples obtained under quite
constrained conditions, and are usually hand-annotated, hence limited in size. The goal of …

Wavesplit: End-to-end speech separation by speaker clustering

N Zeghidour, D Grangier - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …

A survey on semantic processing techniques

R Mao, K He, X Zhang, G Chen, J Ni, Z Yang… - Information …, 2024 - Elsevier
Semantic processing is a fundamental research domain in computational linguistics. In the
era of powerful pre-trained language models and large language models, the advancement …

Gender and dialect bias in YouTube's automatic captions

R Tatman - Proceedings of the first ACL workshop on ethics in …, 2017 - aclanthology.org
This project evaluates the accuracy of YouTube's automatically-generated captions across
two genders and five dialect groups. Speakers' dialect and gender were controlled for by …

MediaSum: A large-scale media interview dataset for dialogue summarization

C Zhu, Y Liu, J Mei, M Zeng - arXiv preprint arXiv:2103.06410, 2021 - arxiv.org
MediaSum is a large-scale media interview dataset consisting of 463.6K transcripts with
abstractive summaries. To create this dataset, we collect interview transcripts from NPR and …