INEL corpora general transcription and annotation principles
A Arkhipov - 2020 - researchgate.net
INEL (“Grammatical Descriptions, Corpora and Language Technology for Indigenous
Northern Eurasian Languages”) is a long-term research project (2016–2033), whose …
The nature of Icelandic as a second language: An insight from the learner error corpus for Icelandic
I Glisic, AK Ingason - CLARIN Annual Conference, 2022 - ecp.ep.liu.se
Abstract The Icelandic L2 Error Corpus is an expanding collection of texts written by users of
Icelandic as a second language, published on CLARIN. It currently consists of 22,705 …
WebAnno-MM: EXMARaLDA meets WebAnno
In this paper, we present WebAnno-MM, an extension of the popular web-based annotation
tool WebAnno, which is designed for the linguistic annotation of transcribed spoken data …
Towards comprehensive definitions of data quality for audiovisual annotated language resources
H Hedeland - CLARIN Annual Conference, 2020 - ecp.ep.liu.se
Though digital infrastructures such as CLARIN have been successfully established and now
provide large collections of digital resources, the lack of widely accepted standards for data …
The TEI-based ISO Standard 'Transcription of spoken language' as an Exchange Format within CLARIN and beyond
H Hedeland, T Schmidt - CLARIN Annual Conference, 2022 - ecp.ep.liu.se
This paper describes the TEI-based ISO standard 24624:2016 'Transcription of spoken
language' and other formats used within CLARIN for spoken language resources. It assesses …
Development of the Siberian Ingrian Finnish Speech Corpus
I Ubaleht, TK Raudalainen - Proceedings of the Fifth Workshop on …, 2022 - aclanthology.org
In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The
speech corpus includes audio data, annotations, software tools for data-processing, two …
Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data
D Jettka, T Lehmberg - … of the Twelfth Language Resources and …, 2020 - aclanthology.org
This paper reports on challenges and solution approaches in the development of methods
for language resource overarching data analysis in the field of language documentation. It is …
The INEL Dolgan corpus: Insights into an endangered language of Northern Eurasia
CL Däbritz - Finno-Ugric Languages and Linguistics, 2021 - full.btk.ppke.hu
The paper at hand presents a description of the INEL Dolgan Corpus that has been created
from 2016 to 2019 within the INEL project, located at the Institute for Finno-Ugric/Uralic …
Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt
H Hedeland - Publications, 2020 - mdpi.com
This article describes the development of the digital infrastructure at a research data centre
for audio-visual linguistic research data, the Hamburg Centre for Language Corpora (HZSK) …