Computational methods for uncovering reprinted texts in antebellum newspapers
The Viral Texts Project (http://viraltexts.org) is an interdisciplinary and collaborative effort
among the authors listed here, with contributions from project alumni Elizabeth Maddock …
Infectious texts: Modeling text reuse in nineteenth-century newspapers
Texts propagate through many social networks and provide evidence for their structure. We
present efficient algorithms for detecting clusters of reused passages embedded within …
Detecting and modeling local text reuse
Texts propagate through many social networks and provide evidence for their structure. We
describe and evaluate efficient algorithms for detecting clusters of reused passages …
Near duplicate detection in an academic digital library
K Williams, CL Giles - Proceedings of the 2013 ACM symposium on …, 2013 - dl.acm.org
The detection and potential removal of duplicates is desirable for a number of reasons, such
as to reduce the need for unnecessary storage and computation, and to provide users with …
Estimating the date of first publication in a large-scale digital library
One prerequisite for cultural analysis in large-scale digital libraries is an accurate estimate of
the date of composition of the text (as distinct from the date of publication of an edition) for the …
Cataloging for a billion word library of Greek and Latin
G Crane, B Almas, A Babeu, L Cerrato… - Proceedings of the First …, 2014 - dl.acm.org
This paper reports work on a catalog that includes not only standard metadata but also a
complete reference transcription for each work so that users can explicitly cite not only every …
Poetry: Identification, entity recognition, and retrieval
JJ Foley IV - 2019 - scholarworks.umass.edu
Modern advances in natural language processing (NLP) and information retrieval (IR)
provide for the ability to automatically analyze, categorize, process and search textual …
The lowest form of flattery: characterising text re-use and plagiarism patterns in a digital library corpus
G Buchanan, D McKay - 2017 ACM/IEEE Joint Conference on …, 2017 - ieeexplore.ieee.org
The re-use of text (particularly misuse, or plagiarism) is a contentious issue. Technological
approaches to identifying student plagiarism are now widespread, but academic …
Massively-parallel similarity join, edge-isoperimetry, and distance correlations on the hypercube
P Beame, C Rashtchian - Proceedings of the Twenty-Eighth Annual ACM …, 2017 - SIAM
We study distributed protocols for finding all pairs of similar vectors in a large dataset. Our
results pertain to a variety of discrete metrics, and we give concrete instantiations for …
Sequential word spotting in historical handwritten documents
D Fernández-Mota, R Manmatha… - 2014 11th IAPR …, 2014 - ieeexplore.ieee.org
In this work we present a handwritten word spotting approach that takes advantage of the a
priori known order of appearance of the query words. Given an ordered sequence of query …