C-Pack: Packed resources for general Chinese embeddings

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - Proceedings of the 47th …, 2024 - dl.acm.org
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …

Improving text embeddings with large language models

L Wang, N Yang, X Huang, L Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we introduce a novel and simple method for obtaining high-quality text
embeddings using only synthetic data and less than 1k training steps. Unlike existing …

SAILER: Structure-aware pre-trained language model for legal case retrieval

H Li, Q Ai, J Chen, Q Dong, Y Wu, Y Liu… - Proceedings of the 46th …, 2023 - dl.acm.org
Legal case retrieval, which aims to find relevant cases for a query case, plays a core role in
the intelligent legal system. Despite the success that pre-training has achieved in ad-hoc …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

Scaling laws for dense retrieval

Y Fang, J Zhan, Q Ai, J Mao, W Su, J Chen… - Proceedings of the 47th …, 2024 - dl.acm.org
Scaling laws have been observed in a wide range of tasks, particularly in language
generation. Previous studies have found that the performance of large language models …

LeCaRDv2: A large-scale Chinese legal case retrieval dataset

H Li, Y Shao, Y Wu, Q Ai, Y Ma, Y Liu - Proceedings of the 47th …, 2024 - dl.acm.org
As an important component of intelligent legal systems, legal case retrieval plays a critical
role in ensuring judicial justice and fairness. However, the development of legal case …

Relevance Feedback with Brain Signals

Z Ye, X Xie, Q Ai, Y Liu, Z Wang, W Su… - ACM Transactions on …, 2024 - dl.acm.org
The Relevance Feedback (RF) process relies on accurate and real-time relevance
estimation of feedback documents to improve retrieval performance. Since collecting explicit …

Constructing tree-based index for efficient and effective dense retrieval

H Li, Q Ai, J Zhan, J Mao, Y Liu, Z Liu… - Proceedings of the 46th …, 2023 - dl.acm.org
Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve
the performance of first-stage retrieval in IR systems. Despite its empirical effectiveness, the …

mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval

X Zhang, Y Zhang, D Long, W Xie, Z Dai, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present systematic efforts in building long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …

Unsupervised large language model alignment for information retrieval via contrastive feedback

Q Dong, Y Liu, Q Ai, Z Wu, H Li, Y Liu, S Wang… - Proceedings of the 47th …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated remarkable capabilities across various
research domains, including the field of Information Retrieval (IR). However, the responses …