A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Google usm: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Lit: Zero-shot transfer with locked-image text tuning

X Zhai, X Wang, B Mustafa, A Steiner… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper presents contrastive-tuning, a simple method employing contrastive training to
align image and text models while still taking advantage of their pre-training. In our empirical …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has been seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

[BOOK] Natural language processing with transformers

L Tunstall, L Von Werra, T Wolf - 2022 - books.google.com
Since their introduction in 2017, transformers have quickly become the dominant
architecture for achieving state-of-the-art results on a variety of natural language processing …

W2v-bert: Combining contrastive learning and masked language modeling for self-supervised speech pre-training

YA Chung, Y Zhang, W Han, CC Chiu… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Motivated by the success of masked language modeling (MLM) in pre-training natural
language processing models, we propose w2v-BERT that explores MLM for self-supervised …

Grit: A generative region-to-text transformer for object understanding

J Wu, J Wang, Z Yang, Z Gan, Z Liu, J Yuan… - European Conference on …, 2025 - Springer
This paper presents a Generative RegIon-to-Text transformer, GRiT, for object
understanding. The spirit of GRiT is to formulate object understanding as <region, text> …