Research directions for principles of data management (Dagstuhl Perspectives Workshop 16151)

S Abiteboul, M Arenas, P Barceló, M Bienvenu… - 2018 - drops.dagstuhl.de
The area of Principles of Data Management (PDM) has made crucial contributions to the
development of formal frameworks for understanding and managing data and knowledge …

Pytheas: pattern-based table discovery in CSV files

C Christodoulakis, EB Munson, M Gabel… - Proceedings of the …, 2020 - dl.acm.org
CSV is a popular Open Data format widely used in a variety of domains for its simplicity and
effectiveness in storing and disseminating data. Unfortunately, data published in this format …

Pollock: A Data Loading Benchmark

G Vitagliano, M Hameed, L Jiang, L Reisener… - Proceedings of the …, 2023 - dl.acm.org
Any system at play in a data-driven project has a fundamental requirement: the ability to load
data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain …

Constant-delay enumeration for nondeterministic document spanners

A Amarilli, P Bourhis, S Mengel… - ACM Transactions on …, 2021 - dl.acm.org
We consider the information extraction framework known as document spanners and study
the problem of efficiently computing the results of the extraction from an input document …

Face recognition based automated attendance management system

A Trivedi, CM Tripathi, Y Perwej… - Int. J. Sci. Res. Sci …, 2022 - academia.edu
At the beginning and end of each session, attendance is an important aspect of the daily
classroom evaluation. When using traditional methods such as calling out roll calls or taking …

Constant delay algorithms for regular document spanners

F Florenzano, C Riveros, M Ugarte… - Proceedings of the 37th …, 2018 - dl.acm.org
Regular expressions and automata models with capture variables are core tools in rule-
based information extraction. These formalisms, also called regular document spanners, use …

Efficient enumeration algorithms for regular document spanners

F Florenzano, C Riveros, M Ugarte… - ACM Transactions on …, 2020 - dl.acm.org
Regular expressions and automata models with capture variables are core tools in rule-
based information extraction. These formalisms, also called regular document spanners, use …

Document spanners for extracting incomplete information: Expressiveness and complexity

F Maturana, C Riveros, D Vrgoc - Proceedings of the 37th ACM SIGMOD …, 2018 - dl.acm.org
Rule-based information extraction has lately received a fair amount of attention from the
database community, with several languages appearing in the last few years. Although …

Multi-hypothesis CSV parsing

T Döhmen, H Mühleisen, P Boncz - Proceedings of the 29th International …, 2017 - dl.acm.org
Comma Separated Value (CSV) files are commonly used to represent data. CSV is a very
simple format, yet we show that it gives rise to a surprisingly large amount of ambiguities in …

Recursive programs for document spanners

L Peterfreund, B ten Cate, R Fagin, B Kimelfeld - arXiv preprint arXiv …, 2017 - arxiv.org
A document spanner models a program for Information Extraction (IE) as a function that
takes as input a text document (string over a finite alphabet) and produces a relation of …