[PDF] Collecting highly parallel data for paraphrase evaluation
A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing
from making the kind of rapid progress enjoyed by the machine translation community over …
[BOOK] Second language research: Methodology and design
In this second edition of the best-selling Second Language Research, Alison Mackey and
Sue Gass continue to guide students step-by-step through conducting the second language …
[PDF] Crowdsourcing translation: Professional quality from non-professionals
O Zaidan, C Callison-Burch - … of the 49th annual meeting of the …, 2011 - aclanthology.org
Naively collecting translations by crowdsourcing the task to non-professional translators
yields disfluent, low-quality results if no quality control is exercised. We demonstrate a …
[PDF] Creating speech and language data with Amazon's Mechanical Turk
C Callison-Burch, M Dredze - … of the NAACL HLT 2010 workshop …, 2010 - aclanthology.org
In this paper we give an introduction to using Amazon's Mechanical Turk crowdsourcing
platform for the purpose of collecting data for human language technologies. We survey the …
[PDF] Corpus annotation through crowdsourcing: Towards best practice guidelines
Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of
annotated corpora and a wide range of other linguistic resources. Although the use of this …
[PDF] The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content
O Zaidan, C Callison-Burch - … of the 49th Annual Meeting of the …, 2011 - aclanthology.org
The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken
dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life …
Making sense of social media streams through semantics: a survey
K Bontcheva, D Rout - Semantic Web, 2014 - content.iospress.com
Using semantic technologies for mining and intelligent information access to social media is
a challenging, emerging research area. Traditional search methods are no longer able to …
[PDF] Language identification for creating language-specific Twitter collections
Social media services such as Twitter offer an immense volume of real-world linguistic data.
We explore the use of Twitter to obtain authentic user-generated text in low-resource …
Crowdsourcing research opportunities: lessons from natural language processing
Although the field has led to promising early results, the use of crowdsourcing as an integral
part of science projects is still regarded with skepticism by some, largely due to a lack of …
The language demographics of Amazon Mechanical Turk
We present a large scale study of the languages spoken by bilingual workers on Mechanical
Turk (MTurk). We establish a methodology for determining the language skills of anonymous …