PlinyCompute: A platform for high-performance, distributed, data-intensive tool development

Y Tang, Z Ding, D Jankov, B Yuan… - International …, 2023 - proceedings.mlr.press

The relational data model was designed to facilitate large-scale data management and
analytics. We consider the problem of how to differentiate computations expressed …

Cited by 6 Related articles All 8 versions

A Comparison of End-to-End Decision Forest Inference Pipelines

H Guan, S Masood, M Dwarampudi, V Gunda… - Proceedings of the …, 2023 - dl.acm.org

Decision forest, including RandomForest, XGBoost, and LightGBM, dominates the machine
learning tasks over tabular data. Recently, several frameworks were developed for decision …

Cited by 2 Related articles All 4 versions

Serving deep learning models with deduplication from relational databases

L Zhou, J Chen, A Das, H Min, L Yu, M Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

There are significant benefits to serve deep learning models from relational databases. First,
features extracted from databases do not need to be transferred to any decoupled deep …

Cited by 14 Related articles All 9 versions

Evolving exact decompilation

E Schulte, J Ruchti, M Noonan, D Ciarletta… - Workshop on Binary …, 2018 - cs.unm.edu

We introduce a novel technique for C decompilation that provides the correctness
guarantees and readability properties essential for accurate and efficient binary analysis …

Cited by 46 Related articles All 4 versions

Automatic optimization of matrix implementations for distributed machine learning and linear algebra

S Luo, D Jankov, B Yuan, C Jermaine - Proceedings of the 2021 …, 2021 - dl.acm.org

Machine learning (ML) computations are often expressed using vectors, matrices, or higher-
dimensional tensors. Such data structures can have many different implementations …

Cited by 13 Related articles All 3 versions

Distributed numerical and machine learning computations via two-phase execution of aggregated join trees

D Jankov, B Yuan, S Luo, C Jermaine - Proceedings of the VLDB …, 2021 - par.nsf.gov

When numerical and machine learning (ML) computations are expressed relationally,
classical query execution strategies (hash-based joins and aggregations) can do a poor job …

Cited by 11 Related articles All 5 versions

Lachesis: automatic partitioning for UDF-centric analytics

J Zou, A Das, P Barhate, A Iyengar, B Yuan… - arXiv preprint arXiv …, 2020 - arxiv.org

Persistent partitioning is effective in avoiding expensive shuffling operations. However it
remains a significant challenge to automate this process for Big Data analytics workloads …

Cited by 11 Related articles All 9 versions

Architecture of a distributed storage that combines file system, memory and computation in a single layer

J Zou, A Iyengar, C Jermaine - The VLDB Journal, 2020 - Springer

Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …

Cited by 13 Related articles All 5 versions

Pangea: monolithic distributed storage for data analytics

J Zou, A Iyengar, C Jermaine - arXiv preprint arXiv:1808.06094, 2018 - arxiv.org

Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …

Cited by 16 Related articles All 8 versions

Monsoon: Multi-step optimization and execution of queries with partially obscured predicates

S Sikdar, C Jermaine - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org

User-defined functions (UDFs) in modern SQL database systems and Big Data processing
systems such as Spark---that offer API bindings in high-level languages such as Python or …

Cited by 13 Related articles All 2 versions