Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks
C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …
Efficient joinable table discovery in data lakes: A high-dimensional similarity-based approach
Finding joinable tables in data lakes is a key procedure in many applications such as data
integration, data augmentation, data analysis, and data market. Traditional approaches that …
Blinkfill: Semi-supervised programming by example for syntactic string transformations
R Singh - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
The recent Programming By Example (PBE) techniques such as FlashFill have shown great
promise for enabling end-users to perform data transformation tasks using input-output …
Deepjoin: Joinable table discovery with pre-trained language models
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery
has become an important operation in data lake management. Existing approaches target …
Design and analysis of a processing-in-dimm join algorithm: A case study with upmem dimms
Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by
implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly …
Uni-detect: A unified approach to automated error detection in tables
P Wang, Y He - Proceedings of the 2019 International Conference on …, 2019 - dl.acm.org
Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich
variety of techniques, each often targeting a specific type of errors, e.g., numeric outliers …
Auto-join: Joining tables by leveraging transformations
E Zhu, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Traditional equi-join relies solely on string equality comparisons to perform joins. However,
in scenarios such as ad-hoc data analysis in spreadsheets, users increasingly need to join …
Auto-fuzzyjoin: Auto-program fuzzy similarity joins without labeled examples
Fuzzy similarity join is an important database operator widely used in practice. So far the
research community has focused exclusively on optimizing fuzzy join scalability. However …
Auto-transform: learning-to-transform by patterns
Z Jin, Y He, S Chaudhuri - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Data Transformation is a long-standing problem in data management. Recent work adopts
a "transform-by-example" (TBE) paradigm to infer transformation programs based on user …
Auto-detect: Data-driven error detection in tables
Z Huang, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org
Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …