The tagged Icelandic corpus (MÍM)

S Helgadóttir, Á Svavarsdóttir… - Proceedings of the …, 2012 - academia.edu
In this paper, we describe the development of a morphosyntactically tagged corpus of
Icelandic, the MÍM corpus. The corpus consists of about 25 million tokens of contemporary …

Almannaromur: An open Icelandic speech corpus

J Guðnason, O Kjartansson, J Jóhannsson… - … for Under-Resourced …, 2012 - isca-archive.org
The purpose of the Almannarómur project is collecting data for a speech corpus (database)
for Icelandic. Its main aim is creating an open source speech project to enable research and …

Dealing with ambiguity in NLP: finding the best tree in the parse forest

RB Baldursson - 2023 - skemman.is
Context-free grammars (CFGs) are not typically used to parse natural languages, whereas
they are commonly used to parse programming languages. In the latter case, the CFG …

Lexicon Acquisition through Noun Clustering

AB Nikulásdóttir, M Whelpton - LexicoNordica, 2010 - tidsskrift.dk
This paper describes an experiment with clustering of Icelandic nouns based on semantic
relatedness. This work is part of a larger project aiming at semi-automatically constructing a …

From human-oriented dictionaries to computer-oriented lexical resources - trying to pin down words

M Whelpton - Orð og tunga, 2012 - ordogtunga.arnastofnun.is
Dictionaries are designed for the human user; electronic lexical resources are often
designed with computers in mind: to represent information about the form, use and meaning …

Icelandic language technology: an overview

E Rögnvaldsson - Language, Languages and New Technologies: ICT …, 2010 - efnil.nytud.hu
We describe the establishment and development of Icelandic language technology since its
very beginning ten years ago. The ground was laid with a report from an Expert Group …

Samba: Automatic identification of verbal expressions in Icelandic

K Rúnarsson - 2017 - skemman.is
This thesis discusses the development of Samba, a software solution designed to identify
known verbal expressions in PoS-tagged and lemmatized text. Samba uses a database of …