Uralic multimedia corpora: ISO/TEI corpus data in the project INEL

A Arkhipov - 2020 - researchgate.net

INEL (“Grammatical Descriptions, Corpora and Language Technology for Indigenous
Northern Eurasian Languages”) is a long-term research project (2016–2033), whose …

Cited by 6 - Related articles - All 2 versions

The nature of Icelandic as a second language: An insight from the learner error corpus for Icelandic

I Glisic, AK Ingason - CLARIN Annual Conference, 2022 - ecp.ep.liu.se

The Icelandic L2 Error Corpus is an expanding collection of texts written by users of
Icelandic as a second language, published on CLARIN. It currently consists of 22,705 …

Cited by 3 - Related articles - All 12 versions

WebAnno-MM: EXMARaLDA meets WebAnno

S Remus, H Hedeland, A Ferger… - Selected papers from …, 2018 - ids-pub.bsz-bw.de

In this paper, we present WebAnno-MM, an extension of the popular web-based annotation
tool WebAnno, which is designed for the linguistic annotation of transcribed spoken data …

Cited by 7 - Related articles - All 6 versions

Towards comprehensive definitions of data quality for audiovisual annotated language resources

H Hedeland - CLARIN Annual Conference, 2020 - ecp.ep.liu.se

Though digital infrastructures such as CLARIN have been successfully established and now
provide large collections of digital resources, the lack of widely accepted standards for data …

Cited by 5 - Related articles - All 8 versions

The TEI-based ISO Standard 'Transcription of spoken language' as an Exchange Format within CLARIN and beyond

H Hedeland, T Schmidt - CLARIN Annual Conference, 2022 - ecp.ep.liu.se

This paper describes the TEI-based ISO standard 24624:2016 'Transcription of spoken
language' and other formats used within CLARIN for spoken language resources. It assesses …

Cited by 2 - Related articles - All 13 versions

Development of the Siberian Ingrian Finnish Speech Corpus

I Ubaleht, TK Raudalainen - Proceedings of the Fifth Workshop on …, 2022 - aclanthology.org

In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The
speech corpus includes audio data, annotations, software tools for data-processing, two …

Cited by 1 - Related articles - All 3 versions

Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data

D Jettka, T Lehmberg - … of the Twelfth Language Resources and …, 2020 - aclanthology.org

This paper reports on challenges and solution approaches in the development of methods
for language resource overarching data analysis in the field of language documentation. It is …

Cited by 2 - Related articles - All 3 versions

The INEL Dolgan corpus: Insights into an endangered language of Northern Eurasia

CL Däbritz - Finno-Ugric Languages and Linguistics, 2021 - full.btk.ppke.hu

This paper describes the INEL Dolgan Corpus, which was created
between 2016 and 2019 within the INEL project, located at the Institute for Finno-Ugric/Uralic …

Cited by 1 - Related articles - All 4 versions

Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt

H Hedeland - Publications, 2020 - mdpi.com

This article describes the development of the digital infrastructure at a research data centre
for audio-visual linguistic research data, the Hamburg Centre for Language Corpora (HZSK) …

Cited by 1 - Related articles - All 8 versions