Auto-differentiation of relational computations for very large scale machine learning
The relational data model was designed to facilitate large-scale data management and
analytics. We consider the problem of how to differentiate computations expressed …
A Comparison of End-to-End Decision Forest Inference Pipelines
Decision forest, including RandomForest, XGBoost, and LightGBM, dominates the machine
learning tasks over tabular data. Recently, several frameworks were developed for decision …
Serving deep learning models with deduplication from relational databases
There are significant benefits to serve deep learning models from relational databases. First,
features extracted from databases do not need to be transferred to any decoupled deep …
Evolving exact decompilation
E Schulte, J Ruchti, M Noonan, D Ciarletta… - Workshop on Binary …, 2018 - cs.unm.edu
We introduce a novel technique for C decompilation that provides the correctness
guarantees and readability properties essential for accurate and efficient binary analysis …
Automatic optimization of matrix implementations for distributed machine learning and linear algebra
Machine learning (ML) computations are often expressed using vectors, matrices, or higher-
dimensional tensors. Such data structures can have many different implementations …
Distributed numerical and machine learning computations via two-phase execution of aggregated join trees
When numerical and machine learning (ML) computations are expressed relationally,
classical query execution strategies (hash-based joins and aggregations) can do a poor job …
Lachesis: automatic partitioning for UDF-centric analytics
Persistent partitioning is effective in avoiding expensive shuffling operations. However it
remains a significant challenge to automate this process for Big Data analytics workloads …
Architecture of a distributed storage that combines file system, memory and computation in a single layer
Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …
Pangea: monolithic distributed storage for data analytics
Storage and memory systems for modern data analytics are heavily layered, managing
shared persistent data, cached data, and non-shared execution data in separate systems …
Monsoon: Multi-step optimization and execution of queries with partially obscured predicates
S Sikdar, C Jermaine - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org
User-defined functions (UDFs) in modern SQL database systems and Big Data processing
systems such as Spark---that offer API bindings in high-level languages such as Python or …