FlexMem: Adaptive page profiling and migration for tiered memory
Tiered memory, combining multiple memory components with different performance and
capacity, provides a cost-effective solution to increase memory capacity and improve …
Watching for software inefficiencies with Witch
Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as
wasteful memory operations. Many existing tools instrument every load and store instruction …
Slowcoach: Mutating code to simulate performance bugs
Performance bugs are unnecessarily inefficient code chunks in software codebases that
cause prolonged execution times and degraded computational resource utilization. For …
ComDetective: a lightweight communication detection tool for threads
Inter-thread communication is a vital performance indicator in shared-memory systems. Prior
works on identifying inter-thread communication employed hardware simulators or binary …
MemPerf: Profiling Allocator-Induced Performance Slowdowns
The memory allocator plays a key role in the performance of applications, but none of the
existing profilers can pinpoint performance slowdowns caused by a memory allocator …
Parallelism-centric what-if and differential analyses
A Yoga, S Nagarakatte - Proceedings of the 40th ACM SIGPLAN …, 2019 - dl.acm.org
This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel
programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization …
Precise event sampling on AMD versus Intel: Quantitative and qualitative comparison
Precise event sampling is a profiling feature in commodity processors that can sample
hardware events and accurately locate the instructions that trigger the events. This feature …
Huron: hybrid false sharing detection and repair
Writing efficient multithreaded code that can leverage the full parallelism of underlying
hardware is difficult. A key impediment is insidious cache contention issues, such as false …
Parallelization of particle-mass-transfer algorithms on shared-memory, multi-core CPUs
Simulating the transfer of mass between particles is not straightforwardly parallelized
because it involves the calculation of the influence of many particles on each other. Engdahl …
Pinpointing performance inefficiencies via lightweight variance profiling
Execution variance among different invocation instances of the same procedure is often an
indicator of performance losses. On the one hand, instrumentation-based tools can insert …