Tegra: Table extraction by global record alignment

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org

Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Cited by 1164 · Related articles · All 6 versions

[PDF] arxiv.org

Table-GPT: Table-tuned GPT for diverse table tasks

P Li, Y He, D Yashar, W Cui, S Ge, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to
follow diverse human instructions and perform a wide range of tasks. However, when …

Cited by 50 · Related articles · All 2 versions

[PDF] github.io

Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org

Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

Cited by 92 · Related articles · All 4 versions

[PDF] psu.edu

BlinkFill: Semi-supervised programming by example for syntactic string transformations

R Singh - Proceedings of the VLDB Endowment, 2016 - dl.acm.org

The recent Programming By Example (PBE) techniques such as FlashFill have shown great
promise for enabling end-users to perform data transformation tasks using input-output …

Cited by 137 · Related articles · All 7 versions

[PDF] vldb.org

Ten years of WebTables

M Cafarella, A Halevy, H Lee, J Madhavan… - Proceedings of the …, 2018 - dl.acm.org

In 2008, we wrote about WebTables, an effort to exploit the large and diverse set of
structured databases casually published online in the form of HTML tables. The past decade …

Cited by 88 · Related articles · All 5 versions

[PDF] sfu.ca

Uni-detect: A unified approach to automated error detection in tables

P Wang, Y He - Proceedings of the 2019 International Conference on …, 2019 - dl.acm.org

Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich
variety of techniques, each often targeting a specific type of errors, e.g., numeric outliers …

Cited by 77 · Related articles · All 3 versions

[PDF] vldb.org

Pytheas: pattern-based table discovery in CSV files

C Christodoulakis, EB Munson, M Gabel… - Proceedings of the …, 2020 - dl.acm.org

CSV is a popular Open Data format widely used in a variety of domains for its simplicity and
effectiveness in storing and disseminating data. Unfortunately, data published in this format …

Cited by 50 · Related articles · All 6 versions

[PDF] vldb.org

Auto-join: Joining tables by leveraging transformations

E Zhu, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2017 - dl.acm.org

Traditional equi-join relies solely on string equality comparisons to perform joins. However,
in scenarios such as ad-hoc data analysis in spreadsheets, users increasingly need to join …

Cited by 81 · Related articles · All 5 versions

[PDF] github.io

Auto-detect: Data-driven error detection in tables

Z Huang, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org

Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …

Cited by 61 · Related articles · All 3 versions

TEXUS: A unified framework for extracting and understanding tables in PDF documents

R Rastan, HY Paik, J Shepherd - Information Processing & Management, 2019 - Elsevier

Tables in documents are a widely-available and rich source of information, but not yet
well-utilised computationally because of the difficulty in automatically extracting their structure …

Cited by 46 · Related articles · All 2 versions