Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
Characterizing communication and page usage of parallel applications for thread and data mapping
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …
Adapt burstable containers to variable CPU resources
In the cloud-native age, container technology, referred to as OS-level virtualization, is
increasingly adopted to deploy cloud applications. Compared with virtual machines …
ComDetective: a lightweight communication detection tool for threads
Inter-thread communication is a vital performance indicator in shared-memory systems. Prior
works on identifying inter-thread communication employed hardware simulators or binary …
Using the translation lookaside buffer to map threads in parallel applications based on shared memory
EHM Cruz, M Diener… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The communication latency between the cores in multiprocessor architectures differs
depending on the memory hierarchy and the interconnections. With the increase of the …
SIMD parallel MCMC sampling with applications for big-data Bayesian analytics
AS Mahani, MTA Sharabiani - Computational Statistics & Data Analysis, 2015 - Elsevier
Computational intensity and sequential nature of estimation techniques for Bayesian
methods in statistics and machine learning, combined with their increasing applications for …
Optimizing thread-to-core mapping on manycore platforms with distributed tag directories
G Liu, T Schmidt, R Dömer, A Dingankar… - The 20th Asia and …, 2015 - ieeexplore.ieee.org
With the increasing demand for parallel computing power, manycore platforms are attracting
more and more attention due to their potential to improve performance and scalability of …
Adaptive thread mapping strategies for transactional memory applications
Transactional Memory (TM) is a programmer friendly alternative to traditional lock-based
concurrency. Although it intends to simplify concurrent programming, the performance of the …
Toward monetary cost effective content placement in cloud centric media network
In recent years, technical challenges have emerged on how to efficiently distribute the rapidly
growing user-generated contents (UGCs) with long-tailed nature. To address this issue, we …
Topology aware task stealing for on-chip NUMA multi-core processors
The On Chip NUMA Architectures (OCNA) introduce a new challenge, namely memory-latency,
to the scheduling methods. The language run-times and libraries try to explore the …