RobustPeriod: Robust time-frequency mining for multiple periodicity detection
Periodicity detection is a crucial step in time series tasks, including monitoring and
forecasting of metrics in many areas, such as IoT applications and self-driving database …
Steering query optimizers: A practical take on big data workloads
In recent years, there has been tremendous interest in research that applies machine
learning to database systems. Being one of the most complex components of a DBMS, query …
Deploying a steered query optimizer in production at Microsoft
Modern analytical workloads are highly heterogeneous and massively complex, making
generic out-of-the-box query optimizers untenable for many customers and scenarios. As a …
RobustScaler: QoS-aware autoscaling for complex workloads
Autoscaling is a critical component for efficient resource utilization with satisfactory quality of
service (QoS) in cloud computing. This paper investigates proactive autoscaling for widely …
Unearthing inter-job dependencies for better cluster scheduling
Inter-job dependencies pervade shared data analytics infrastructures (so-called "data
lakes"), as jobs read output files written by previous jobs, yet are often invisible to current …
Kea: Tuning an exabyte-scale data infrastructure
Microsoft's internal big-data infrastructure is one of the largest in the world, with over 300k
machines running billions of tasks from over 0.6 M daily jobs. Operating this infrastructure is …
MicroLearner: A fine-grained learning optimizer for big data workloads at Microsoft
Big data systems have become increasingly complex, making the job of a query optimizer
incredibly difficult. This is due to more complicated decision making, more complex query …
AutoToken: Predicting peak parallelism for big data analytics at Microsoft
Right-sizing resource allocation for big-data queries, particularly in serverless environments,
is critical for improving infrastructure operational efficiency, capacity availability, query …
DISTILL: low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning
Many database systems offer index tuning tools that help automatically select appropriate
indexes for improving the performance of an input workload. Index tuning is a resource …
Optimal resource allocation for serverless queries
Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-
data services. At the same time, it is incredibly hard for users to allocate resources per query …