How much parallelism is there in irregular applications?
Irregular programs are programs organized around pointer-based data structures such as
trees and graphs. Recent investigations by the Galois project have shown that many …
Kremlin: Rethinking and rebooting gprof for the multicore age
Many recent parallelization tools lower the barrier for parallelizing a program, but overlook
one of the first questions that a programmer needs to answer: which parts of the program …
Practical parallelization of scientific applications with OpenMP, OpenACC and MPI
This work aims at distilling a systematic methodology to modernize existing sequential
scientific codes with a little redesign effort, turning an old codebase into modern code, i.e. …
Speculative parallelization using software multi-threaded transactions
With the right techniques, multicore architectures may be able to continue the exponential
performance trend that elevated the performance of applications of all types for decades …
HELIX: Automatic parallelization of irregular programs for chip multiprocessing
We describe and evaluate HELIX, a new technique for automatic loop parallelization that
assigns successive iterations of a loop to separate threads. We show that the inter-thread …
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
Multicore designs have emerged as the mainstream design paradigm for the microprocessor
industry. Unfortunately, providing multiple cores does not directly translate into performance …
HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs
Data dependences in sequential programs limit parallelization because extracted threads
cannot run independently. Although thread-level speculation can avoid the need for precise …
HELIX-UP: Relaxing program semantics to unleash parallelization
S Campanoni, G Holloway, GY Wei… - 2015 IEEE/ACM …, 2015 - ieeexplore.ieee.org
Automatic generation of parallel code for general-purpose commodity processors is a
challenging computational problem. Nevertheless, there is a lot of latent thread-level …
Dynamic trace-based analysis of vectorization potential of applications
J Holewinski, R Ramamurthi, M Ravishankar… - Proceedings of the 33rd …, 2012 - dl.acm.org
Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA
extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical …
Kismet: parallel speedup estimates for serial programs
Software engineers now face the difficult task of refactoring serial programs for parallel
execution on multicore processors. Currently, they are offered little guidance as to how much …