Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach

Y Dong, K Takeoka, C Xiao… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Finding joinable tables in data lakes is a key procedure in many applications such as data
integration, data augmentation, data analysis, and data market. Traditional approaches that …

BlinkFill: Semi-supervised programming by example for syntactic string transformations

R Singh - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
The recent Programming By Example (PBE) techniques such as FlashFill have shown great
promise for enabling end-users to perform data transformation tasks using input-output …

DeepJoin: Joinable table discovery with pre-trained language models

Y Dong, C Xiao, T Nozawa, M Enomoto… - arXiv preprint arXiv …, 2022 - arxiv.org
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery
has become an important operation in data lake management. Existing approaches target …

Design and analysis of a processing-in-DIMM join algorithm: A case study with UPMEM DIMMs

C Lim, S Lee, J Choi, J Lee, S Park, H Kim… - Proceedings of the …, 2023 - dl.acm.org
Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by
implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly …

Uni-detect: A unified approach to automated error detection in tables

P Wang, Y He - Proceedings of the 2019 International Conference on …, 2019 - dl.acm.org
Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich
variety of techniques, each often targeting a specific type of error, e.g., numeric outliers …

Auto-join: Joining tables by leveraging transformations

E Zhu, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Traditional equi-join relies solely on string equality comparisons to perform joins. However,
in scenarios such as ad-hoc data analysis in spreadsheets, users increasingly need to join …

Auto-fuzzyjoin: Auto-program fuzzy similarity joins without labeled examples

P Li, X Cheng, X Chu, Y He, S Chaudhuri - Proceedings of the 2021 …, 2021 - dl.acm.org
Fuzzy similarity join is an important database operator widely used in practice. So far the
research community has focused exclusively on optimizing fuzzy join scalability. However …

Auto-transform: learning-to-transform by patterns

Z Jin, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Data Transformation is a long-standing problem in data management. Recent work adopts
a "transform-by-example" (TBE) paradigm to infer transformation programs based on user …

Auto-detect: Data-driven error detection in tables

Z Huang, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org
Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …