Auto-differentiation of relational computations for very large scale machine learning

Y Tang, Z Ding, D Jankov, B Yuan… - International …, 2023 - proceedings.mlr.press
The relational data model was designed to facilitate large-scale data management and
analytics. We consider the problem of how to differentiate computations expressed …

A Comparison of End-to-End Decision Forest Inference Pipelines

H Guan, S Masood, M Dwarampudi, V Gunda… - Proceedings of the …, 2023 - dl.acm.org
Decision forests, including RandomForest, XGBoost, and LightGBM, dominate machine
learning tasks over tabular data. Recently, several frameworks were developed for decision …

Serving deep learning models with deduplication from relational databases

L Zhou, J Chen, A Das, H Min, L Yu, M Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org
There are significant benefits to serving deep learning models from relational databases. First,
features extracted from databases do not need to be transferred to any decoupled deep …

Evolving exact decompilation

E Schulte, J Ruchti, M Noonan, D Ciarletta… - Workshop on Binary …, 2018 - cs.unm.edu
We introduce a novel technique for C decompilation that provides the correctness
guarantees and readability properties essential for accurate and efficient binary analysis …

Automatic optimization of matrix implementations for distributed machine learning and linear algebra

S Luo, D Jankov, B Yuan, C Jermaine - Proceedings of the 2021 …, 2021 - dl.acm.org
Machine learning (ML) computations are often expressed using vectors, matrices, or
higher-dimensional tensors. Such data structures can have many different implementations …

Distributed numerical and machine learning computations via two-phase execution of aggregated join trees

D Jankov, B Yuan, S Luo, C Jermaine - Proceedings of the VLDB …, 2021 - par.nsf.gov
When numerical and machine learning (ML) computations are expressed relationally,
classical query execution strategies (hash-based joins and aggregations) can do a poor job …

Lachesis: automatic partitioning for UDF-centric analytics

J Zou, A Das, P Barhate, A Iyengar, B Yuan… - arXiv preprint arXiv …, 2020 - arxiv.org
Persistent partitioning is effective in avoiding expensive shuffling operations. However, it
remains a significant challenge to automate this process for Big Data analytics workloads …

Architecture of a distributed storage that combines file system, memory and computation in a single layer

J Zou, A Iyengar, C Jermaine - The VLDB Journal, 2020 - Springer
Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …

Pangea: monolithic distributed storage for data analytics

J Zou, A Iyengar, C Jermaine - arXiv preprint arXiv:1808.06094, 2018 - arxiv.org
Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …

Monsoon: Multi-step optimization and execution of queries with partially obscured predicates

S Sikdar, C Jermaine - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org
User-defined functions (UDFs) in modern SQL database systems and Big Data processing
systems such as Spark, which offer API bindings in high-level languages such as Python or …