BERT, mBERT, or BiBERT? A study on contextualized embeddings for neural machine translation

H Xu, B Van Durme, K Murray - arXiv preprint arXiv:2109.04588, 2021 - arxiv.org
The success of bidirectional encoders using masked language models, such as BERT, on
numerous natural language processing tasks has prompted researchers to attempt to …

Cross-lingual few-shot learning on unseen languages

G Winata, S Wu, M Kulkarni, T Solorio… - Proceedings of the …, 2022 - aclanthology.org
Large pre-trained language models (LMs) have demonstrated the ability to obtain good
performance on downstream tasks with limited examples in cross-lingual settings. However …

Hybrid knowledge transfer for improved cross-lingual event detection via hierarchical sample selection

LG Nateras, F Dernoncourt… - Proceedings of the 61st …, 2023 - aclanthology.org
In this paper, we address the Event Detection task under a zero-shot cross-lingual setting
where a model is trained on a source language but evaluated on a distinct target language …

Frustratingly easy label projection for cross-lingual transfer

Y Chen, C Jiang, A Ritter, W Xu - arXiv preprint arXiv:2211.15613, 2022 - arxiv.org
Translating training data into many languages has emerged as a practical solution for
improving cross-lingual transfer. For tasks that involve span-level annotations, such as …

Lost in translation, found in spans: Identifying claims in multilingual social media

S Mittal, M Sundriyal, P Nakov - arXiv preprint arXiv:2310.18205, 2023 - arxiv.org
Claim span identification (CSI) is an important step in fact-checking pipelines, aiming to
identify text segments that contain a checkworthy claim or assertion in a social media post …

DuReader_retrieval: A large-scale Chinese benchmark for passage retrieval from web search engine

Y Qiu, H Li, Y Qu, Y Chen, Q She, J Liu, H Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present DuReader_retrieval, a large-scale Chinese dataset for passage
retrieval. DuReader_retrieval contains more than 90K queries and over 8M unique …

Multilingual Clinical NER: Translation or Cross-lingual Transfer?

F Gaschi, X Fontaine, P Rastin… - 5th Clinical Natural …, 2023 - hal.science
Natural language tasks like Named Entity Recognition (NER) in the clinical domain on non-English
texts can be very time-consuming and expensive due to the lack of annotated data …

Iterative document-level information extraction via imitation learning

Y Chen, W Gantt, W Gu, T Chen, AS White… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a novel iterative extraction model, IterX, for extracting complex relations, or
templates (i.e., N-tuples representing a mapping from named slots to spans of text) within a …

Contextual label projection for cross-lingual structure extraction

T Parekh, I Hsu, KH Huang, KW Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Translating training data into target languages has proven beneficial for cross-lingual
transfer. However, for structure extraction tasks, translating data requires a label projection …

MultiTACRED: A multilingual version of the TAC relation extraction dataset

L Hennig, P Thomas, S Möller - arXiv preprint arXiv:2305.04582, 2023 - arxiv.org
Relation extraction (RE) is a fundamental task in information extraction, whose extension to
multilingual settings has been hindered by the lack of supervised resources comparable in …