MultiVitaminBooster at PARSEME shared task 2020: Combining window- and dependency-based features with multilingual contextualised word embeddings for …

S Gombert, S Bartsch - Proceedings of the Joint Workshop on Multiword Expressions and …, 2020 - aclanthology.org
Abstract
In this paper, we present MultiVitaminBooster, a system implemented for edition 1.2 of the PARSEME shared task on semi-supervised identification of verbal multiword expressions. We treat the detection of verbal multiword expressions as a token classification task: for each token, the system decides whether it is part of a verbal multiword expression. For this purpose, we train gradient boosting-based models. We encode tokens as feature vectors that combine multilingual contextualized word embeddings provided by the XLM-RoBERTa language model with a more traditional linguistic feature set relying on context windows and dependency relations. Our system was ranked 7th in the official open track ranking of the shared task evaluations, but an encoding-related bug distorted these results. For this reason, we carry out further unofficial evaluations, in which unofficial versions of our system would have achieved higher ranks.
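To make the token encoding described in the abstract more concrete, the following is a minimal sketch of how an embedding could be concatenated with window- and dependency-based features. Everything specific here is an assumption made for illustration: the `Token` fields, the `encode_token` helper, the window size of 1, the choice of POS tags and dependency labels as features, and the two-dimensional placeholder embeddings (real XLM-RoBERTa vectors are much larger). The abstract does not list the actual feature set, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence

PAD = "<pad>"  # hypothetical symbol for positions outside the sentence


@dataclass
class Token:
    form: str
    upos: str    # part-of-speech tag
    head: int    # 0-based index of the syntactic head, -1 for the root
    deprel: str  # dependency relation to the head


def one_hot(value: str, vocab: Sequence[str]) -> List[float]:
    """Return a one-hot vector over vocab (all zeros if value is unknown)."""
    return [1.0 if value == v else 0.0 for v in vocab]


def encode_token(sentence: Sequence[Token], i: int,
                 embeddings: Sequence[Sequence[float]],
                 upos_vocab: Sequence[str], deprel_vocab: Sequence[str],
                 window: int = 1) -> List[float]:
    """Concatenate embedding, context-window and dependency features for token i."""
    features = list(embeddings[i])  # contextual embedding (placeholder values here)
    # Window features: POS tags of the token and its neighbours, padded at the edges.
    for offset in range(-window, window + 1):
        j = i + offset
        tag = sentence[j].upos if 0 <= j < len(sentence) else PAD
        features.extend(one_hot(tag, upos_vocab))
    # Dependency features: the token's relation label and its head's POS tag.
    tok = sentence[i]
    features.extend(one_hot(tok.deprel, deprel_vocab))
    head_tag = sentence[tok.head].upos if tok.head >= 0 else PAD
    features.extend(one_hot(head_tag, upos_vocab))
    return features


# Toy sentence "She gave up", where "gave up" is a verbal multiword expression.
sentence = [
    Token("She", "PRON", 1, "nsubj"),
    Token("gave", "VERB", -1, "root"),
    Token("up", "ADP", 1, "compound:prt"),
]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # made-up 2-d vectors
upos_vocab = [PAD, "PRON", "VERB", "ADP"]
deprel_vocab = ["nsubj", "root", "compound:prt"]

vectors = [encode_token(sentence, i, embeddings, upos_vocab, deprel_vocab)
           for i in range(len(sentence))]
labels = [0, 1, 1]  # 1 = token is part of a verbal multiword expression
```

Vectors and binary labels of this shape are what a gradient boosting classifier would be trained on; the classifier itself is left out of the sketch.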