Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

ID10M: Idiom identification in 10 languages

S Tedeschi, F Martelli, R Navigli - Findings of the Association for …, 2022 - aclanthology.org
Idioms are phrases which present a figurative meaning that cannot be (completely) derived
by looking at the meaning of their individual components. Identifying and understanding …

Idioms and phraseology

MT Espinal, J Mateu - Oxford research encyclopedia of linguistics, 2019 - oxfordre.com
Idioms, conceived as fixed multi-word expressions that conceptually encode non-
compositional meaning, are linguistic units that raise a number of questions relevant in the …

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

U Inurrieta, I Aduriz, A Díaz de Ilarraza, G Labaka… - Plos one, 2020 - journals.plos.org
Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose
important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal …

Multiword expressions–a tough typological nut for Swedish FrameNet++

L Borin - The Swedish FrameNet++, 2021 - jbe-platform.com
Multiword expressions have attracted much attention in language technology over the last
two decades or so, and in general linguistics, the interest in phraseology–which includes the …

Overview of MWE history, challenges, and horizons: standing at the 20th anniversary of the MWE workshop series via MWE-UD2024

L Han, K Evang, A Bhatia, G Bouma… - arXiv preprint arXiv …, 2024 - arxiv.org
Starting in 2003 when the first MWE workshop was held with ACL in Sapporo, Japan, this
year, the joint workshop of MWE-UD co-located with the LREC-COLING 2024 conference …

The framework of multiword expression in Indonesian language

T Suhardijanto, R Mahendra, Z Nuriah… - Proceedings of the …, 2020 - aclanthology.org
This paper presents our attempt to develop an Indonesian multi-word expression (MWE)
identification framework. The framework consists of three different steps. In the first step, we …

Encoding polylexical units with TEI Lex-o: A case study

T Tasovac, A Salgado, R Costa - Slovenščina 2.0: empirične …, 2020 - journals.uni-lj.si
The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are
perceived as independent lexical units, is a topic that has not been covered adequately and …

Collocations in Parsing and Translation

E Wehrli - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
Proper identification of collocations (and more generally of multiword expressions (MWEs))
is an important qualitative step for several NLP applications and particularly so for …

Phraseology as an object of multi-aspect study

ЮГ Полєжаєв - Нова філологія, 2021 - novafilolohiia.zp.ua
Abstract: The integrative character of modern science and the mastery of new ways of cognition
have made it possible to introduce interdisciplinary methods into linguistic research. In …