A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

Cost-effective in-context learning for entity resolution: A design space exploration

M Fan, X Han, J Fan, C Chai, N Tang… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Entity resolution (ER) is an important data integration task with a wide spectrum of
applications. The state-of-the-art solutions on ER rely on pre-trained language models …

Learning or self-aligning? Rethinking instruction fine-tuning

M Ren, B Cao, H Lin, L Cao, X Han, K Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs).
Previous works mainly focus on the IFT's role in the transfer of behavioral norms and the …

Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

Z Wu, X Lin, Z Dai, W Hu, Y Shu, SK Ng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown impressive capabilities in real-world
applications. The capability of in-context learning (ICL) allows us to adapt an LLM to …

Understanding the Role of User Profile in the Personalization of Large Language Models

B Wu, Z Shi, HA Rahmani, V Ramineni… - arXiv preprint arXiv …, 2024 - arxiv.org
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to
enhance the performance on a wide range of tasks. However, the precise role of user …

Improving In-Context Learning via Sequentially Selection and Preference Alignment for Few-Shot Aspect-Based Sentiment Analysis

Q Wang, K Ding, X Luo, R Xu - … of the 47th International ACM SIGIR …, 2024 - dl.acm.org
In this paper, we leverage in-context learning (ICL) paradigm to handle few-shot
aspect-based sentiment analysis (ABSA). Previous works first rank candidate examples by some …

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

W He, S Liu, J Zhao, Y Ding, Y Lu, Z Xi, T Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown promising abilities of in-context learning (ICL),
adapting swiftly to new tasks with only few-shot demonstrations. However, current few-shot …

Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark

M Wu, T Zhu, H Han, C Tan, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct
API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances …