Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Characterizing communication and page usage of parallel applications for thread and data mapping

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

Adapt burstable containers to variable CPU resources

H Huang, Y Zhao, J Rao, S Wu, H Jin… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
In the cloud-native era, container technology, also referred to as OS-level virtualization, is
increasingly adopted to deploy cloud applications. Compared with virtual machines …

ComDetective: a lightweight communication detection tool for threads

MA Sasongko, M Chabbi, P Akhtar, D Unat - Proceedings of the …, 2019 - dl.acm.org
Inter-thread communication is a vital performance indicator in shared-memory systems. Prior
works on identifying inter-thread communication employed hardware simulators or binary …

Using the translation lookaside buffer to map threads in parallel applications based on shared memory

EHM Cruz, M Diener… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The communication latency between the cores in multiprocessor architectures differs
depending on the memory hierarchy and the interconnections. With the increase of the …

SIMD parallel MCMC sampling with applications for big-data Bayesian analytics

AS Mahani, MTA Sharabiani - Computational Statistics & Data Analysis, 2015 - Elsevier
The computational intensity and sequential nature of estimation techniques for Bayesian
methods in statistics and machine learning, combined with their increasing applications for …

Optimizing thread-to-core mapping on manycore platforms with distributed tag directories

G Liu, T Schmidt, R Dömer, A Dingankar… - The 20th Asia and …, 2015 - ieeexplore.ieee.org
With the increasing demand for parallel computing power, manycore platforms are attracting
increasing attention due to their potential to improve performance and scalability of …

Adaptive thread mapping strategies for transactional memory applications

M Castro, LFW Góes, JF Méhaut - Journal of Parallel and Distributed …, 2014 - Elsevier
Transactional Memory (TM) is a programmer-friendly alternative to traditional lock-based
concurrency. Although it intends to simplify concurrent programming, the performance of the …

Toward monetary cost effective content placement in cloud centric media network

Y Jin, Y Wen, K Guan, D Kilper… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
In recent years, technical challenges have emerged in how to efficiently distribute the rapidly
growing, long-tailed volume of user-generated content (UGC). To address this issue, we …

Topology aware task stealing for on-chip NUMA multi-core processors

B Vikranth, R Wankar, CR Rao - Procedia Computer Science, 2013 - Elsevier
The On-Chip NUMA Architectures (OCNA) introduce a new challenge, namely memory
latency, to scheduling methods. Language runtimes and libraries try to explore the …