AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges

Q Cheng, D Sahoo, A Saha, W Yang, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …

A survey of binary code fingerprinting approaches: taxonomy, methodologies, and features

S Alrabaee, M Debbabi, L Wang - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
Binary code fingerprinting is crucial in many security applications. Examples include
malware detection, software infringement, vulnerability analysis, and digital forensics. It is …

24/7 characterization of petascale I/O workloads

P Carns, R Latham, R Ross, K Iskra… - … on Cluster Computing …, 2009 - ieeexplore.ieee.org
Developing and tuning computational science applications to run on extreme scale systems
are increasingly complicated processes. Challenges such as managing memory access and …

Binary code is not easy

X Meng, BP Miller - Proceedings of the 25th International Symposium on …, 2016 - dl.acm.org
Binary code analysis is an enabling technique for many applications. Modern compilers and
run-time libraries have introduced significant complexities to binary code, which negatively …

D3S: Debugging deployed distributed systems

X Liu, Z Guo, X Wang, F Chen, X Lian, J Tang, M Wu… - NSDI, 2008 - usenix.org
Testing large-scale distributed systems is a challenge, because some errors manifest
themselves only after a distributed sequence of events that involves machine and network …

DCD—disk caching disk: a new approach for boosting I/O performance

Y Hu, Q Yang - ACM SIGARCH Computer Architecture News, 1996 - dl.acm.org
This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the
purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk …

ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

M Noeth, P Ratn, F Mueller, M Schulz… - Journal of Parallel and …, 2009 - Elsevier
Characterizing the communication behavior of large-scale applications is a difficult and
costly task due to code/system complexity and long execution times. While many tools to …

Hardware transactional memory for GPU architectures

WWL Fung, I Singh, A Brownsword… - Proceedings of the 44th …, 2011 - dl.acm.org
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism
(TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of …

Crash graphs: An aggregated view of multiple crashes to improve crash triage

S Kim, T Zimmermann… - 2011 IEEE/IFIP 41st …, 2011 - ieeexplore.ieee.org
Crash reporting systems play an important role in the overall reliability and dependability of
the system helping in identifying and debugging crashes in software systems deployed in …

Flux: A next-generation resource management framework for large HPC centers

DH Ahn, J Garlick, M Grondona, D Lipari… - 2014 43rd …, 2014 - ieeexplore.ieee.org
Resource and job management software is crucial to High Performance Computing (HPC)
for efficient application execution. However, current systems and approaches can no longer …