Controllable paraphrase generation with a syntactic exemplar

M Chen, Q Tang, S Wiseman, K Gimpel - arXiv preprint arXiv:1906.00565, 2019 - arxiv.org
Prior work on controllable text generation usually assumes that the controlled attribute can
take on one of a small set of values known a priori. In this work, we propose a novel task …

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

A multi-task approach for disentangling syntax and semantics in sentence representations

M Chen, Q Tang, S Wiseman, K Gimpel - arXiv preprint arXiv:1904.01173, 2019 - arxiv.org
We propose a generative model for a sentence that uses two latent variables, with one
intended to represent the syntax of the sentence and the other to represent its semantics. We …

Towards realistic practices in low-resource natural language processing: The development set

K Kann, K Cho, SR Bowman - arXiv preprint arXiv:1909.01522, 2019 - arxiv.org
Development sets are impractical to obtain for real low-resource languages, since using all
available data for training is often more effective. However, development sets are widely …

ChroniclingAmericaQA: A large-scale question answering dataset based on historical American newspaper pages

B Piryani, J Mozafari, A Jatowt - … of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Question answering (QA) and Machine Reading Comprehension (MRC) tasks have
significantly advanced in recent years due to the rapid development of deep learning …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

PHD: Pixel-Based Language Modeling of Historical Documents

N Borenstein, P Rust, D Elliott, I Augenstein - arXiv preprint arXiv …, 2023 - arxiv.org
The digitisation of historical documents has provided historians with unprecedented
research opportunities. Yet, the conventional approach to analysing historical documents …

Annotations matter: Leveraging multi-task learning to parse UD and SUD

ZA Sayyed, D Dakota - Findings of the Association for …, 2021 - aclanthology.org
Using multiple treebanks to improve parsing performance has shown positive results.
However, to what extent similar, yet competing annotation decisions play in parser behavior …

Semi-supervised contextual historical text normalization

P Makarov, S Clematide - 2020 - zora.uzh.ch
Historical text normalization, the task of mapping historical word forms to their modern
counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; …

When is Multi-task Learning Beneficial for Low-Resource Noisy Code-switched User-generated Algerian Texts?

W Adouane, JP Bernardy - Proceedings of the 4th Workshop on …, 2020 - aclanthology.org
We investigate when is it beneficial to simultaneously learn representations for several
tasks, in low-resource settings. For this, we work with noisy user-generated texts in Algerian …