ESRL: Efficient sampling-based reinforcement learning for sequence generation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …
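The snippet points at sequence-level reward optimization. As a rough illustration of that general objective, a minimal REINFORCE-style sketch in PyTorch follows; this is not ESRL's efficient sampling scheme, and `model.encode`, `model.decode_step`, `model.bos_id`, `model.eos_id`, and `reward_fn` are assumed placeholders, not the paper's interface.

```python
import torch

def reinforce_loss(model, src, reward_fn, max_len=64):
    # Generic REINFORCE objective for sequence generation: sample a
    # sequence token by token, score the whole sequence with a
    # sequence-level reward (e.g., sentence BLEU), and weight the
    # sampled sequence's log-likelihood by that reward.
    # All model attributes here are assumed placeholders.
    tokens, log_probs = [], []
    state = model.encode(src)
    tok = model.bos_id
    for _ in range(max_len):
        logits, state = model.decode_step(tok, state)  # next-token logits
        dist = torch.distributions.Categorical(logits=logits)
        sample = dist.sample()
        log_probs.append(dist.log_prob(sample))
        tok = int(sample)
        tokens.append(tok)
        if tok == model.eos_id:
            break
    reward = reward_fn(tokens)  # scalar sequence-level reward
    # Minimizing -reward * log p(sequence) ascends the expected reward.
    return -reward * torch.stack(log_probs).sum()
```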

Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …
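Since this entry is a tutorial on Transformer basics, a compact sketch of the scaled dot-product attention at the core of the architecture may help anchor the concepts; this is the standard textbook formulation in plain NumPy, not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Standard Transformer attention: softmax(Q K^T / sqrt(d_k)) V,
    # with Q, K of shape (seq_len, d_k) and V of shape (seq_len, d_v).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 8)
```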

TranSFormer: Slow-fast transformer for machine translation

B Li, Y Jing, X Tan, Z Xing, T Xiao, J Zhu - arXiv preprint arXiv:2305.16982, 2023 - arxiv.org
Learning multiscale Transformer models has been shown to be a viable approach to
augmenting machine translation systems. Prior research has primarily focused on treating …

Learning Evaluation Models from Large Language Models for Sequence Generation

C Wang, H Zhou, K Chang, T Liu, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models achieve state-of-the-art performance on sequence generation
evaluation, but typically have a large number of parameters. This is a computational …

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

Y Xu, S Wang, P Li, X Liu, X Wang, W Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Although neural machine translation (NMT) models perform well in the general domain, it
remains rather challenging to control their generation behavior to satisfy the requirement of …

RSMformer: an efficient multiscale transformer-based framework for long sequence time-series forecasting

G Tong, Z Ge, D Peng - Applied Intelligence, 2024 - Springer
Long sequence time-series forecasting (LSTF) is a significant and challenging task. Many
real-world applications require long-term forecasting of time series. In recent years …

Enhancing Neural Machine Translation with Semantic Units

L Huang, S Gu, Z Zhang, Y Feng - arXiv preprint arXiv:2310.11360, 2023 - arxiv.org
Conventional neural machine translation (NMT) models typically use subwords and words
as the basic units for model input and comprehension. However, complete words and …

End-to-end Planner Training for Language Modeling

N Cornille, F Mai, J Sun, MF Moens - arXiv preprint arXiv:2410.12492, 2024 - arxiv.org
Through end-to-end training to predict the next token, LLMs have become valuable tools for
various tasks. Enhancing their core training in language modeling can improve numerous …

EIT: Enhanced interactive transformer

T Zheng, B Li, H Bao, T Xiao, J Zhu - arXiv preprint arXiv:2212.10197, 2022 - arxiv.org
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT),
to address the issue of head degradation in self-attention mechanisms. Our approach …

Compressive Strength Prediction of Fly Ash-Based Concrete Using Single and Hybrid Machine Learning Models

H Li, H Chung, Z Li, W Li - Buildings, 2024 - mdpi.com
The compressive strength of concrete is a crucial parameter in structural design, yet its
determination in a laboratory setting is both time-consuming and expensive. The prediction …
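As an illustration of the kind of single-model baseline such studies compare against, here is a minimal gradient-boosting regression sketch; the mix-design features and targets below are synthetic stand-ins, not the paper's dataset (real work trains on lab measurements such as cement, fly ash, water content, and curing age).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for mix-design features and measured strength.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
y = X @ np.array([0.5, 0.3, -0.4, 0.2, 0.1]) + rng.normal(scale=0.05, size=200)

# Fit a single (non-hybrid) gradient-boosting model and report R^2.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))
```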