CATS: A tool for customized alignment of text simplification corpora
S Štajner, M Franco-Salvador, P Rosso… - Proceedings of the …, 2018 - aclanthology.org
Proceedings of the Eleventh International Conference on Language …, 2018•aclanthology.org
Abstract
In text simplification (TS), parallel corpora consisting of original sentences and their manually simplified counterparts are scarce and small, which impedes building supervised automated TS systems with sufficient coverage. Furthermore, existing corpora usually do not distinguish between sentence pairs that are full matches (both sentences contain the same information) and those that are only partial matches (the two sentences share only part of their meaning), which prevents building customized automated TS systems that model different simplification transformations separately. In this paper, we present our freely available, language-independent tool for sentence alignment of parallel/comparable TS resources (document-aligned resources), which additionally offers the option of filtering sentences according to their level of semantic overlap. We perform an in-depth human evaluation of the tool’s performance on English and Spanish corpora, and explore its capacity for classifying sentence pairs according to the simplification operation they model.
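The idea of aligning sentences within a document pair and then filtering them by degree of overlap can be illustrated with a minimal sketch. This is not the tool's implementation: cosine similarity over character trigrams is swapped in here as a simple stand-in similarity measure, and the function names (`char_ngrams`, `cosine`, `align`) and the `full` and `partial` thresholds are assumptions chosen purely for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(original, simplified, full=0.8, partial=0.4):
    """Pair each simplified sentence with its most similar original sentence.

    Both thresholds are hypothetical values, not taken from the paper.
    Returns tuples of (original sentence, simplified sentence, score, label).
    """
    if not original:
        return []
    orig_vecs = [char_ngrams(s) for s in original]
    pairs = []
    for simp in simplified:
        vec = char_ngrams(simp)
        scores = [cosine(vec, o) for o in orig_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        score = scores[best]
        if score >= full:
            label = "full"
        elif score >= partial:
            label = "partial"
        else:
            label = "none"
        pairs.append((original[best], simp, round(score, 3), label))
    return pairs


if __name__ == "__main__":
    original_doc = [
        "The committee postponed the vote owing to procedural objections.",
        "Heavy rainfall caused widespread flooding across the region.",
    ]
    simplified_doc = [
        "Heavy rain caused floods across the region.",
        "The committee delayed the vote.",
    ]
    for row in align(original_doc, simplified_doc):
        print(row)
```

Filtering by label (for example, keeping only pairs labelled `full`) would then yield a subset with a chosen level of overlap; any real threshold would have to be tuned against human judgments.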