Sema-join: joining semantically-related tables using big table corpora

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org

Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

Cited by 92 Related articles All 4 versions

Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach

Y Dong, K Takeoka, C Xiao… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org

Finding joinable tables in data lakes is a key procedure in many applications such as data
integration, data augmentation, data analysis, and data market. Traditional approaches that …

Cited by 91 Related articles All 6 versions

Blinkfill: Semi-supervised programming by example for syntactic string transformations

R Singh - Proceedings of the VLDB Endowment, 2016 - dl.acm.org

The recent Programming By Example (PBE) techniques such as FlashFill have shown great
promise for enabling end-users to perform data transformation tasks using input-output …

Cited by 137 Related articles All 7 versions

Deepjoin: Joinable table discovery with pre-trained language models

Y Dong, C Xiao, T Nozawa, M Enomoto… - arXiv preprint arXiv …, 2022 - arxiv.org

Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery
has become an important operation in data lake management. Existing approaches target …

Cited by 29 Related articles All 4 versions

Design and analysis of a processing-in-DIMM join algorithm: A case study with UPMEM DIMMs

C Lim, S Lee, J Choi, J Lee, S Park, H Kim… - Proceedings of the …, 2023 - dl.acm.org

Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by
implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly …

Cited by 20 Related articles

Uni-detect: A unified approach to automated error detection in tables

P Wang, Y He - Proceedings of the 2019 International Conference on …, 2019 - dl.acm.org

Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich
variety of techniques, each often targeting a specific type of error, e.g., numeric outliers …

Cited by 77 Related articles All 3 versions

Auto-join: Joining tables by leveraging transformations

E Zhu, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2017 - dl.acm.org

Traditional equi-join relies solely on string equality comparisons to perform joins. However,
in scenarios such as ad-hoc data analysis in spreadsheets, users increasingly need to join …

Cited by 81 Related articles All 5 versions

Auto-fuzzyjoin: Auto-program fuzzy similarity joins without labeled examples

P Li, X Cheng, X Chu, Y He, S Chaudhuri - Proceedings of the 2021 …, 2021 - dl.acm.org

Fuzzy similarity join is an important database operator widely used in practice. So far the
research community has focused exclusively on optimizing fuzzy join scalability. However …

Cited by 35 Related articles All 4 versions

Auto-transform: learning-to-transform by patterns

Z Jin, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2020 - dl.acm.org

Data transformation is a long-standing problem in data management. Recent work adopts
a "transform-by-example" (TBE) paradigm to infer transformation programs based on user …

Cited by 39 Related articles All 3 versions

Auto-detect: Data-driven error detection in tables

Z Huang, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org

Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …

Cited by 61 Related articles All 3 versions