C-Pack: Packed resources for general Chinese embeddings
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …
Improving text embeddings with large language models
In this paper, we introduce a novel and simple method for obtaining high-quality text
embeddings using only synthetic data and less than 1k training steps. Unlike existing …
SAILER: structure-aware pre-trained language model for legal case retrieval
Legal case retrieval, which aims to find relevant cases for a query case, plays a core role in
the intelligent legal system. Despite the success that pre-training has achieved in ad-hoc …
BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …
Scaling laws for dense retrieval
Scaling laws have been observed in a wide range of tasks, particularly in language
generation. Previous studies have found that the performance of large language models …
LeCaRDv2: A large-scale Chinese legal case retrieval dataset
As an important component of intelligent legal systems, legal case retrieval plays a critical
role in ensuring judicial justice and fairness. However, the development of legal case …
Relevance Feedback with Brain Signals
The Relevance Feedback (RF) process relies on accurate and real-time relevance
estimation of feedback documents to improve retrieval performance. Since collecting explicit …
Constructing tree-based index for efficient and effective dense retrieval
Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve
the performance of first-stage retrieval in IR systems. Despite its empirical effectiveness, the …
mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval
We present systematic efforts in building long-context multilingual text representation model
(TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base …
Unsupervised large language model alignment for information retrieval via contrastive feedback
Large language models (LLMs) have demonstrated remarkable capabilities across various
research domains, including the field of Information Retrieval (IR). However, the responses …