RobustPeriod: Robust time-frequency mining for multiple periodicity detection

Q Wen, K He, L Sun, Y Zhang, M Ke, H Xu - Proceedings of the 2021 …, 2021 - dl.acm.org
Periodicity detection is a crucial step in time series tasks, including monitoring and
forecasting of metrics in many areas, such as IoT applications and self-driving database …

Steering query optimizers: A practical take on big data workloads

P Negi, M Interlandi, R Marcus, M Alizadeh… - Proceedings of the …, 2021 - dl.acm.org
In recent years, there has been tremendous interest in research that applies machine
learning to database systems. Being one of the most complex components of a DBMS, query …

Deploying a steered query optimizer in production at Microsoft

W Zhang, M Interlandi, P Mineiro, S Qiao… - Proceedings of the …, 2022 - dl.acm.org
Modern analytical workloads are highly heterogeneous and massively complex, making
generic out of the box query optimizers untenable for many customers and scenarios. As a …

RobustScaler: QoS-aware autoscaling for complex workloads

H Qian, Q Wen, L Sun, J Gu, Q Niu… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Autoscaling is a critical component for efficient resource utilization with satisfactory quality of
service (QoS) in cloud computing. This paper investigates proactive autoscaling for widely …

Unearthing inter-job dependencies for better cluster scheduling

A Chung, S Krishnan, K Karanasos, C Curino… - … USENIX Symposium on …, 2020 - usenix.org
Inter-job dependencies pervade shared data analytics infrastructures (so-called "data
lakes"), as jobs read output files written by previous jobs, yet are often invisible to current …

Kea: Tuning an exabyte-scale data infrastructure

Y Zhu, S Krishnan, K Karanasos, I Tarte… - Proceedings of the …, 2021 - dl.acm.org
Microsoft's internal big-data infrastructure is one of the largest in the world, with over 300k
machines running billions of tasks from over 0.6 M daily jobs. Operating this infrastructure is …

MicroLearner: A fine-grained learning optimizer for big data workloads at Microsoft

A Jindal, S Qiao, R Sen, H Patel - 2021 IEEE 37th International …, 2021 - ieeexplore.ieee.org
Big data systems have become increasingly complex making the job of a query optimizer
incredibly difficult. This is due to more complicated decision making, more complex query …

AutoToken: Predicting peak parallelism for big data analytics at Microsoft

R Sen, A Jindal, H Patel, S Qiao - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Right-sizing resource allocation for big-data queries, particularly in serverless environments,
is critical for improving infrastructure operational efficiency, capacity availability, query …

DISTILL: low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning

T Siddiqui, W Wu, V Narasayya… - Proceedings of the VLDB …, 2022 - dl.acm.org
Many database systems offer index tuning tools that help automatically select appropriate
indexes for improving the performance of an input workload. Index tuning is a resource …

Optimal resource allocation for serverless queries

A Pimpley, S Li, A Srivastava, V Rohra, Y Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org
Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-
data services. At the same time, it is incredibly hard for users to allocate resources per query …