Scala: Supervised contrastive learning for end-to-end automatic speech recognition

E Choshen, A Tamar - International Conference on Machine …, 2023 - proceedings.mlr.press

In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy–the
optimal policy when facing an unknown task that is sampled from some known task …

Cited by 4  Related articles  All 6 versions

[PDF] arxiv.org

C-SENN: Contrastive self-explaining neural network

Y Sawada, K Nakamura - arXiv preprint arXiv:2206.09575, 2022 - arxiv.org

In this study, we use a self-explaining neural network (SENN), which learns unsupervised
concepts, to acquire concepts that are easy for people to understand automatically. In …

Cited by 7  Related articles  All 2 versions

[PDF] arxiv.org

Exploring temporal granularity in self-supervised video representation learning

R Qian, Y Li, L Yuan, B Gong, T Liu, M Brown… - arXiv preprint arXiv …, 2021 - arxiv.org

This work presents a self-supervised learning framework named TeG to explore Temporal
Granularity in learning video representations. In TeG, we sample a long clip from a video …

Cited by 7  Related articles  All 2 versions

[PDF] mpg.de

[PDF] On Temporal Granularity in Self-Supervised Video Representation Learning.

R Qian, Y Li, L Yuan, B Gong, T Liu, M Brown… - BMVC, 2022 - bmvc2022.mpi-inf.mpg.de

This work presents an empirical exploration of temporal granularity in self-supervised video
representation learning. While state-of-the-art methods commonly enforce the learned …

Cited by 4  Related articles  All 3 versions

[PDF] arxiv.org

Achieving timestamp prediction while recognizing with non-autoregressive end-to-end asr model

X Shi, Y Chen, S Zhang, Z Yan - National Conference on Man-Machine …, 2022 - Springer

Conventional ASR systems use frame-level phoneme posterior to conduct force-alignment
(FA) and provide timestamps, while end-to-end ASR systems especially AED based ones …

Cited by 5  Related articles  All 3 versions

[PDF] cornell.edu

Learning to Represent and Recognize Multimodal Videos

R Qian - 2023 - search.proquest.com

In today's digital landscape, the staggering growth of video resources has resulted in a
wealth of visual, auditory, and textual information readily available on the internet. To fully …

Xian Shi, Yanni Chen, Shiliang Zhang, and Zhijie Yan, Speech Lab, Alibaba Group, Hangzhou, China {shixian.shi, cyn244124, sly.zsl, zhijie.yzj}@alibaba-inc.com

ATP While, ASR End-to-End - Man-Machine Speech …, 2023 - books.google.com

Conventional ASR systems use frame-level phoneme posterior to conduct force-alignment
(FA) and provide timestamps, while end-to-end ASR systems especially AED based ones are …