Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com

Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Cited by 312 · All 14 versions

MAD: A scalable dataset for language grounding in videos from movie audio descriptions

M Soldan, A Pardo, JL Alcázar… - Proceedings of the …, 2022 - openaccess.thecvf.com

The recent and increasing interest in video-language research has driven the development
of large-scale datasets that enable data-intensive machine learning techniques. In …

Cited by 104 · All 8 versions

StepFormer: Self-supervised step discovery and localization in instructional videos

N Dvornik, I Hadji, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Instructional videos are an important resource to learn procedural tasks from human
demonstrations. However, the instruction steps in such videos are typically short and sparse …

Cited by 28 · All 5 versions

Punctuation restoration using transformer models for high-and low-resource languages

T Alam, A Khan, F Alam - Proceedings of the Sixth Workshop on …, 2020 - aclanthology.org

Punctuation restoration is a common post-processing problem for Automatic Speech
Recognition (ASR) systems. It is important to improve the readability of the transcribed text …

Cited by 88 · All 8 versions

Attention-based parallel networks (APNet) for PM2.5 spatiotemporal prediction

J Zhu, F Deng, J Zhao, H Zheng - Science of The Total Environment, 2021 - Elsevier

Urban particulate matter forecast is an important part of air pollution early warning and
control management, especially the forecast of fine particulate matter (PM2.5). However, the …

Cited by 74 · All 7 versions

Learning to answer visual questions from web videos

A Yang, A Miech, J Sivic, I Laptev, C Schmid - arXiv preprint arXiv …, 2022 - arxiv.org

Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Cited by 42 · All 10 versions

Learning to segment actions from visual and language instructions via differentiable weak sequence alignment

Y Shen, L Wang, E Elhamifar - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We address the problem of unsupervised localization of key-steps and feature learning in
instructional videos using both visual and language instructions. Our key observation is that …

Cited by 53 · All 9 versions

Advanced rich transcription system for Estonian speech

T Alumäe, O Tilk - Human language technologies–the Baltic …, 2018 - ebooks.iospress.nl

This paper describes the current TTÜ speech transcription system for Estonian speech. The
system is designed to handle semi-spontaneous speech, such as broadcast conversations …

Cited by 95 · All 4 versions

Towards automatic detection of misinformation in online medical videos

R Hou, V Pérez-Rosas, S Loeb… - … International conference on …, 2019 - dl.acm.org

Recent years have witnessed a significant increase in the online sharing of medical
information, with videos representing a large fraction of such online sources. Previous …

Cited by 75 · All 3 versions

Capitalization and punctuation restoration: a survey

V Păiş, D Tufiş - Artificial Intelligence Review, 2022 - Springer

Ensuring proper punctuation and letter casing is a key pre-processing step towards applying
complex natural language processing algorithms. This is especially significant for textual …

Cited by 33 · All 8 versions