Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation
Much recent effort has been invested in non-autoregressive neural machine translation,
which appears to be an efficient alternative to state-of-the-art autoregressive machine …
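The core observation lends itself to a short sketch: encoder depth is paid once per sentence, while decoder depth is paid again at every generated token, so shifting layers from the decoder to the encoder speeds up autoregressive decoding. A minimal PyTorch illustration of a deep-encoder/shallow-decoder configuration follows; the layer counts and dimensions are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Deep-encoder/shallow-decoder sketch: encoder cost is amortized over the
# sentence, decoder cost is paid at every decoding step, so moving layers
# into the encoder cuts per-token latency. Sizes are illustrative.
deep_shallow = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=12,  # deep encoder: run once per sentence
    num_decoder_layers=1,   # shallow decoder: run once per output token
    batch_first=True,
)

src = torch.randn(2, 20, 512)  # (batch, source length, d_model)
tgt = torch.randn(2, 15, 512)  # (batch, target length, d_model)
out = deep_shallow(src, tgt)   # (2, 15, 512)
```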
BLADE: combining vocabulary pruning and intermediate pretraining for scaleable neural CLIR
Learning sparse representations using pretrained language models enhances monolingual
ranking effectiveness. Such representations are sparse vectors in the …
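To make "sparse vectors in the vocabulary space" concrete, here is a minimal SPLADE-style encoder sketch, assuming the common recipe of max-pooling log(1 + ReLU(logits)) over masked-LM token logits; the checkpoint name and pooling choice are illustrative assumptions, not BLADE's released code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# SPLADE-style sparse encoding sketch: project each token through the
# masked-LM head, then max-pool non-negative term weights over the sequence
# to get one weight per vocabulary entry. Model choice is an example only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def sparse_encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))  # non-negative term weights
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (weights * mask).max(dim=1).values.squeeze(0)  # (vocab_size,)

vec = sparse_encode("cross-language information retrieval")
print((vec > 0).sum().item(), "active vocabulary dimensions")
```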
Hard non-monotonic attention for character-level transduction
Character-level string-to-string transduction is an important component of various NLP tasks.
The goal is to map an input string to an output string, where the strings may be of different …
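The underlying idea admits a short sketch: treat the alignment as a latent variable, so each output character is emitted from exactly one encoder position, and marginalize over all positions with a logsumexp instead of sampling. Parameter names and shapes below are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Exact marginalization over a hard, non-monotonic alignment:
# log p(y) = logsumexp_j [ log p(a = j) + log p(y | h_j, s_t) ].
def hard_attention_log_prob(s_t, enc, W_att, W_out):
    # s_t: (d,) decoder state; enc: (n, d) encoder states
    att_logits = enc @ W_att @ s_t                # (n,) alignment scores
    log_align = F.log_softmax(att_logits, dim=0)  # log p(a = j)
    # per-position output distributions: (n, vocab)
    log_vocab = F.log_softmax(
        torch.cat([enc, s_t.expand_as(enc)], dim=-1) @ W_out, dim=-1)
    return torch.logsumexp(log_align.unsqueeze(1) + log_vocab, dim=0)

d, n, vocab = 64, 10, 30
log_p = hard_attention_log_prob(
    torch.randn(d), torch.randn(n, d),
    torch.randn(d, d), torch.randn(2 * d, vocab))
print(log_p.shape)  # torch.Size([30]): a full distribution over outputs
```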
Multilingual neural machine translation with deep encoder and multiple shallow decoders
Recent work in multilingual translation advances translation quality beyond bilingual
baselines by using deep Transformer models with increased capacity. However, the extra …
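A minimal sketch of this architecture family, assuming routing by a language code and illustrative layer counts: one deep shared encoder feeds a separate shallow decoder per target language.

```python
import torch
import torch.nn as nn

# Shared deep encoder, one shallow decoder per target language (or language
# group). Routing by language code and all sizes are illustrative.
enc_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=12)

def shallow_decoder():
    layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers=1)

decoders = nn.ModuleDict({lang: shallow_decoder() for lang in ["de", "fr", "ja"]})

src = torch.randn(2, 20, 512)
tgt = torch.randn(2, 15, 512)
memory = encoder(src)               # encoded once, shared across languages
out = decoders["de"](tgt, memory)   # route to the target language's decoder
print(out.shape)                    # torch.Size([2, 15, 512])
```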
Game-theoretic vocabulary selection via the Shapley value and Banzhaf index
The input vocabulary and the representations learned are crucial to the performance of
neural NLP models. Using the full vocabulary results in less explainable and more memory …
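As a concrete illustration of game-theoretic attribution, here is the standard Monte Carlo permutation estimator of Shapley values: each item's value is its average marginal contribution across random orderings. The toy additive utility below is a stand-in for whatever dev-set metric the method would actually score subsets with.

```python
import random

# Monte Carlo Shapley estimation: sample permutations of the vocabulary and
# average each item's marginal contribution to the utility function v.
def shapley_monte_carlo(items, v, num_permutations=1000):
    phi = {i: 0.0 for i in items}
    for _ in range(num_permutations):
        perm = random.sample(items, len(items))
        prefix, prev = set(), v(set())
        for i in perm:
            prefix.add(i)
            cur = v(prefix)
            phi[i] += cur - prev  # marginal contribution of i
            prev = cur
    return {i: s / num_permutations for i, s in phi.items()}

vocab = ["the", "cat", "sat", "zxqv"]
weights = {"the": 3.0, "cat": 2.0, "sat": 1.5, "zxqv": 0.1}
value = lambda subset: sum(weights[w] for w in subset)  # toy utility
print(shapley_monte_carlo(vocab, value))  # recovers the weights exactly
```

For an additive utility like this one the Shapley value equals each item's own weight, which makes the estimator easy to sanity-check.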
Efficient inference for multilingual neural machine translation
A Bérard, D Lee, S Clinchant, K Jung… - arXiv preprint arXiv …, 2021 - arxiv.org
Multilingual NMT has become an attractive solution for MT deployment in production. But to
match bilingual quality, it comes at the cost of larger and slower models. In this work, we …
OpenNMT system description for WNMT 2018: 800 words/sec on a single-core CPU
J Senellart, D Zhang, B Wang, G Klein… - Proceedings of the …, 2018 - aclanthology.org
We present a system description of the OpenNMT Neural Machine Translation entry for the
WNMT 2018 evaluation. In this work, we developed a heavily optimized NMT inference …
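As a hedged illustration of one common ingredient of CPU-optimized inference, here is dynamic int8 quantization of linear layers in PyTorch; this is a generic technique, not the system described in the paper.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Dynamic int8 quantization: weights are stored in int8 and matmuls run in
# integer arithmetic on CPU, while activations stay in fp32. A generic
# speed/memory optimization, shown on a toy feed-forward block.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 512])
```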
The devil is in the details: on the pitfalls of vocabulary selection in neural machine translation
Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of
Neural Machine Translation models by constraining the set of allowed output words during …
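The mechanism itself is simple to sketch: before the output softmax, mask every vocabulary entry outside a (typically source-dependent) shortlist, so the decoder only scores and can only emit allowed words. The shortlist below is arbitrary; how it is chosen is exactly where the paper's pitfalls lie.

```python
import torch

# Lexical shortlisting sketch: add -inf to the logits of all vocabulary
# entries outside the shortlist, so the softmax puts zero mass on them.
vocab_size = 32000
logits = torch.randn(1, vocab_size)            # decoder output for one step

shortlist = torch.tensor([5, 17, 203, 999])    # allowed output ids (example)
mask = torch.full((vocab_size,), float("-inf"))
mask[shortlist] = 0.0

probs = torch.softmax(logits + mask, dim=-1)   # support only on the shortlist
print(probs.nonzero().shape[0])                # 4 nonzero entries
```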
Locality-sensitive hashing for long context neural machine translation
After its introduction, the Transformer architecture quickly became the gold standard for
neural machine translation. A major advantage of the Transformer compared to …
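A minimal sketch of the hashing step behind LSH attention (in the spirit of Reformer): random-hyperplane hashing groups similar query/key vectors into buckets, and attention is then computed within buckets rather than over all n² pairs. The bucket construction and sizes below are illustrative assumptions.

```python
import torch

# Random-hyperplane LSH: the sign pattern of x against a few random
# hyperplanes gives an integer bucket id; nearby vectors tend to collide.
def lsh_buckets(x, num_hashes=4, seed=0):
    torch.manual_seed(seed)
    planes = torch.randn(x.shape[-1], num_hashes)  # random hyperplanes
    bits = (x @ planes > 0).long()                 # (n, num_hashes) sign bits
    powers = 2 ** torch.arange(num_hashes)
    return (bits * powers).sum(-1)                 # bucket id per vector

keys = torch.randn(128, 64)                        # 128 key vectors
buckets = lsh_buckets(keys)
for b in buckets.unique():
    members = (buckets == b).nonzero().squeeze(-1)
    # attention would be restricted to positions sharing a bucket
    print(f"bucket {b.item()}: {members.numel()} positions")
```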