Clustering and ranking: Diversity-preserved instruction selection through expert-aligned quality estimation

Y Ge, Y Liu, C Hu, W Meng, S Tao, X Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
With contributions from the open-source community, a vast amount of instruction tuning (IT)
data has emerged. Given the significant resource allocation required for training and …

Know where to go: Make LLM a relevant, responsible, and trustworthy searchers

X Shi, J Liu, Y Liu, Q Cheng, W Lu - Decision Support Systems, 2025 - Elsevier
The advent of Large Language Models (LLMs) has shown the potential to improve
relevance and provide direct answers in web searches. However, challenges arise in …

Conversational SimulMT: Efficient simultaneous translation with large language models

M Wang, TT Vu, Y Wang, E Shareghi… - arXiv preprint arXiv …, 2024 - arxiv.org
Simultaneous machine translation (SimulMT) presents a challenging trade-off between
translation quality and latency. Recent studies have shown that LLMs can achieve good …

Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications

D Oniani, J Hilsman, C Zang, J Wang, L Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
A drug molecule is a substance that changes the organism's mental or physical state. Every
approved drug has an indication, which refers to the therapeutic use of that drug for treating …

Emerging opportunities of using large language models for translation between drug molecules and indications

D Oniani, J Hilsman, C Zang, J Wang, L Cai… - Scientific Reports, 2024 - nature.com
A drug molecule is a substance that changes an organism's mental or physical state. Every
approved drug has an indication, which refers to the therapeutic use of that drug for treating …

A Context-aware Framework for Translation-mediated Conversations

J Pombal, S Agrawal, P Fernandes, E Zaranis… - arXiv preprint arXiv …, 2024 - arxiv.org
Effective communication is fundamental to any interaction, yet challenges arise when
participants do not share a common language. Automatic translation systems offer a …

Creative and Context-Aware Translation of East Asian Idioms with GPT-4

K Tang, P Song, Y Qin, X Yan - arXiv preprint arXiv:2410.00988, 2024 - arxiv.org
As a type of figurative language, an East Asian idiom condenses rich cultural background
into only a few characters. Translating such idioms is challenging for human translators, who …

Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

S Koneru, M Exel, M Huck, J Niehues - arXiv preprint arXiv:2310.14855, 2023 - arxiv.org
Large Language Models (LLM's) have demonstrated considerable success in various
Natural Language Processing tasks, but they have yet to attain state-of-the-art performance …

Optimizing example selection for retrieval-augmented machine translation with translation memories

M Bouthors, J Crego, F Yvon - arXiv preprint arXiv:2405.15070, 2024 - arxiv.org
Retrieval-augmented machine translation leverages examples from a translation memory by
retrieving similar instances. These examples are used to condition the predictions of a …

How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes

I Vieira, W Allred, S Lankford, S Castilho… - arXiv preprint arXiv …, 2024 - arxiv.org
Decoder-only LLMs have shown impressive performance in MT due to their ability to learn
from extensive datasets and generate high-quality translations. However, LLMs often …