LAPCA: Language-Agnostic Pretraining with Cross-Lingual Alignment

D Abulkhanov, N Sorokin, S Nikolenko… - Proceedings of the 46th …, 2023 - dl.acm.org
Data collection and mining is a crucial bottleneck for cross-lingual information retrieval
(CLIR). While previous works used machine translation and iterative training, we present a …

Retrieval-augmented generation in multilingual settings

N Chirkova, D Rau, H Déjean, T Formal… - arXiv preprint arXiv …, 2024 - arxiv.org
Retrieval-augmented generation (RAG) has recently emerged as a promising solution for
incorporating up-to-date or domain-specific knowledge into large language models (LLMs) …

Steering large language models for cross-lingual information retrieval

P Guo, Y Ren, Y Hu, Y Cao, Y Li, H Huang - Proceedings of the 47th …, 2024 - dl.acm.org
In today's digital age, accessing information across language barriers poses a significant
challenge, with conventional search systems often struggling to interpret and retrieve …

Augmenting passage representations with query generation for enhanced cross-lingual dense retrieval

S Zhuang, L Shou, G Zuccon - Proceedings of the 46th International ACM …, 2023 - dl.acm.org
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language
models (PLMs) need to be trained to encompass both the relevance matching task and the …

Empowering dual-encoder with query generator for cross-lingual dense retrieval

H Ren, L Shou, N Wu, M Gong, D Jiang - arXiv preprint arXiv:2303.14991, 2023 - arxiv.org
In monolingual dense retrieval, lots of works focus on how to distill knowledge from
cross-encoder re-ranker to dual-encoder retriever and these methods achieve better performance …

Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience

P Guo, Y Hu, Y Cao, Y Ren, Y Li, H Huang - Proceedings of the ACM on …, 2024 - dl.acm.org
In the contemporary digital landscape, search engines play an invaluable role in information
access, yet they often face challenges in Cross-Lingual Information Retrieval (CLIR) …

Translation of Multifaceted Data without Re-Training of Machine Translation Systems

H Moon, S Lee, S Hong, S Lee, C Park… - arXiv preprint arXiv …, 2024 - arxiv.org
Translating major language resources to build minor language resources becomes a
widely-used approach. Particularly in translating complex data points composed of multiple …

Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets

I Sedykh, D Abulkhanov, N Sorokin… - arXiv preprint arXiv …, 2023 - arxiv.org
Code search is an important task that has seen many developments in recent years.
However, previous attempts have mostly considered the problem of searching for code by a …

Few-shot Multilingual Open-domain QA from 5 Examples

F Jiang, T Drummond, T Cohn - fantabulous-j.github.io
Recent approaches to multilingual open-domain question answering (MLODQA) have
achieved promising results given abundant language-specific training data. However, the …

CCT: Cross-consistency training for Clone Detection and Code Search Tasks

N Sorokin, D Abulkhanov, V Malykh - openreview.net
Clone detection is a well known task, which could be formulated on any programming
language. Although to the best of our knowledge there is no cross-lingual clone detection …