ESRL: Efficient sampling-based reinforcement learning for sequence generation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …
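The snippet points at sequence-level reward optimization. As a rough illustration of that general objective, a minimal REINFORCE-style sketch in PyTorch follows; this is not ESRL's efficient sampling scheme, and `model.encode`, `model.decode_step`, `model.bos_id`, `model.eos_id`, and `reward_fn` are assumed placeholders, not the paper's interface.

```python
import torch

def reinforce_loss(model, src, reward_fn, max_len=64):
    # Generic REINFORCE objective for sequence generation: sample a
    # sequence token by token, score the whole sequence with a
    # sequence-level reward (e.g., sentence BLEU), and weight the
    # sampled sequence's log-likelihood by that reward.
    # All model attributes here are assumed placeholders.
    tokens, log_probs = [], []
    state = model.encode(src)
    tok = model.bos_id
    for _ in range(max_len):
        logits, state = model.decode_step(tok, state)  # next-token logits
        dist = torch.distributions.Categorical(logits=logits)
        sample = dist.sample()
        log_probs.append(dist.log_prob(sample))
        tok = int(sample)
        tokens.append(tok)
        if tok == model.eos_id:
            break
    reward = reward_fn(tokens)  # scalar sequence-level reward
    # Minimizing -reward * log p(sequence) ascends the expected reward.
    return -reward * torch.stack(log_probs).sum()
```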

Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …
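Since this entry is a tutorial on Transformer basics, a compact sketch of the scaled dot-product attention at the core of the architecture may help anchor the concepts; this is the standard textbook formulation in plain NumPy, not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Standard Transformer attention: softmax(Q K^T / sqrt(d_k)) V,
    # with Q, K of shape (seq_len, d_k) and V of shape (seq_len, d_v).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 8)
```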

TranSFormer: Slow-fast transformer for machine translation

B Li, Y Jing, X Tan, Z Xing, T Xiao, J Zhu - arXiv preprint arXiv:2305.16982, 2023 - arxiv.org
Learning multiscale Transformer models has been shown to be a viable approach to
augmenting machine translation systems. Prior research has primarily focused on treating …

Learning Evaluation Models from Large Language Models for Sequence Generation

C Wang, H Zhou, K Chang, T Liu, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models achieve state-of-the-art performance on sequence generation
evaluation, but typically have a large number of parameters. This is a computational …

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

Y Xu, S Wang, P Li, X Liu, X Wang, W Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Although neural machine translation (NMT) models perform well in the general domain, it
remains rather challenging to control their generation behavior to satisfy the requirement of …

RSMformer: an efficient multiscale transformer-based framework for long sequence time-series forecasting

G Tong, Z Ge, D Peng - Applied Intelligence, 2024 - Springer
Long sequence time-series forecasting (LSTF) is a significant and challenging task. Many
real-world applications require long-term forecasting of time series. In recent years …

Enhancing Neural Machine Translation with Semantic Units

L Huang, S Gu, Z Zhang, Y Feng - arXiv preprint arXiv:2310.11360, 2023 - arxiv.org
Conventional neural machine translation (NMT) models typically use subwords and words
as the basic units for model input and comprehension. However, complete words and …

End-to-end Planner Training for Language Modeling

N Cornille, F Mai, J Sun, MF Moens - arXiv preprint arXiv:2410.12492, 2024 - arxiv.org
Through end-to-end training to predict the next token, LLMs have become valuable tools for
various tasks. Enhancing their core training in language modeling can improve numerous …

EIT: Enhanced interactive transformer

T Zheng, B Li, H Bao, T Xiao, J Zhu - arXiv preprint arXiv:2212.10197, 2022 - arxiv.org
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT),
to address the issue of head degradation in self-attention mechanisms. Our approach …

Compressive Strength Prediction of Fly Ash-Based Concrete Using Single and Hybrid Machine Learning Models

H Li, H Chung, Z Li, W Li - Buildings, 2024 - mdpi.com
The compressive strength of concrete is a crucial parameter in structural design, yet its
determination in a laboratory setting is both time-consuming and expensive. The prediction …
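As an illustration of the kind of single-model baseline such studies compare against, here is a minimal gradient-boosting regression sketch; the mix-design features and targets below are synthetic stand-ins, not the paper's dataset (real work trains on lab measurements such as cement, fly ash, water content, and curing age).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for mix-design features and measured strength.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
y = X @ np.array([0.5, 0.3, -0.4, 0.2, 0.1]) + rng.normal(scale=0.05, size=200)

# Fit a single (non-hybrid) gradient-boosting model and report R^2.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))
```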