LAPCA: Language-Agnostic Pretraining with Cross-Lingual Alignment
Data collection and mining is a crucial bottleneck for cross-lingual information retrieval
(CLIR). While previous works used machine translation and iterative training, we present a …
Retrieval-augmented generation in multilingual settings
Retrieval-augmented generation (RAG) has recently emerged as a promising solution for
incorporating up-to-date or domain-specific knowledge into large language models (LLMs) …
Steering large language models for cross-lingual information retrieval
In today's digital age, accessing information across language barriers poses a significant
challenge, with conventional search systems often struggling to interpret and retrieve …
Augmenting passage representations with query generation for enhanced cross-lingual dense retrieval
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language
models (PLMs) need to be trained to encompass both the relevance matching task and the …
Empowering dual-encoder with query generator for cross-lingual dense retrieval
In monolingual dense retrieval, many works focus on distilling knowledge from a cross-encoder
re-ranker into a dual-encoder retriever, and these methods achieve better performance …
Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience
In the contemporary digital landscape, search engines play an invaluable role in information
access, yet they often face challenges in Cross-Lingual Information Retrieval (CLIR) …
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Translating major language resources to build minor language resources has become a widely
used approach. Particularly in translating complex data points composed of multiple …
Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets
Code search is an important task that has seen many developments in recent years.
However, previous attempts have mostly considered the problem of searching for code by a …
Few-shot Multilingual Open-domain QA from 5 Examples
F Jiang, T Drummond, T Cohn - fantabulous-j.github.io
Recent approaches to multilingual open-domain question answering (MLODQA) have
achieved promising results given abundant language-specific training data. However, the …
CCT: Cross-consistency training for Clone Detection and Code Search Tasks
Clone detection is a well-known task that can be formulated for any programming
language. Although, to the best of our knowledge, there is no cross-lingual clone detection …