FlexMem: Adaptive page profiling and migration for tiered memory

D Xu, J Ryu, K Shin, P Su, D Li - 2024 USENIX Annual Technical …, 2024 - usenix.org
Tiered memory, combining multiple memory components with different performance and
capacity, provides a cost-effective solution to increase memory capacity and improve …

Watching for software inefficiencies with Witch

S Wen, X Liu, J Byrne, M Chabbi - Proceedings of the Twenty-Third …, 2018 - dl.acm.org
Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as
wasteful memory operations. Many existing tools instrument every load and store instruction …

Slowcoach: Mutating code to simulate performance bugs

Y Chen, O Schwahn, R Natella… - 2022 IEEE 33rd …, 2022 - ieeexplore.ieee.org
Performance bugs are unnecessarily inefficient code chunks in software codebases that
cause prolonged execution times and degraded computational resource utilization. For …

ComDetective: a lightweight communication detection tool for threads

MA Sasongko, M Chabbi, P Akhtar, D Unat - Proceedings of the …, 2019 - dl.acm.org
Inter-thread communication is a vital performance indicator in shared-memory systems. Prior
works on identifying inter-thread communication employed hardware simulators or binary …

MemPerf: Profiling Allocator-Induced Performance Slowdowns

J Zhou, S Silvestro, S Tang, H Yang, H Liu… - Proceedings of the …, 2023 - dl.acm.org
The memory allocator plays a key role in the performance of applications, but none of the
existing profilers can pinpoint performance slowdowns caused by a memory allocator …

Parallelism-centric what-if and differential analyses

A Yoga, S Nagarakatte - Proceedings of the 40th ACM SIGPLAN …, 2019 - dl.acm.org
This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel
programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization …

Precise event sampling on AMD versus Intel: Quantitative and qualitative comparison

MA Sasongko, M Chabbi, PHJ Kelly… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Precise event sampling is a profiling feature in commodity processors that can sample
hardware events and accurately locate the instructions that trigger the events. This feature …

Huron: hybrid false sharing detection and repair

TA Khan, Y Zhao, G Pokam, B Mozafari… - Proceedings of the 40th …, 2019 - dl.acm.org
Writing efficient multithreaded code that can leverage the full parallelism of underlying
hardware is difficult. A key impediment is insidious cache contention issues, such as false …

Parallelization of particle-mass-transfer algorithms on shared-memory, multi-core CPUs

DA Benson, I Pribec, NB Engdahl, S Pankavich… - Advances in Water …, 2024 - Elsevier
Simulating the transfer of mass between particles is not straightforwardly parallelized
because it involves the calculation of the influence of many particles on each other. Engdahl …

Pinpointing performance inefficiencies via lightweight variance profiling

P Su, S Jiao, M Chabbi, X Liu - … of the International Conference for High …, 2019 - dl.acm.org
Execution variance among different invocation instances of the same procedure is often an
indicator of performance losses. On the one hand, instrumentation-based tools can insert …