Parallel distributed breadth first search on the Kepler architecture

M Bisson, M Bernaschi… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
We present the results obtained by using an evolution of our CUDA-based solution for the
exploration, via a breadth first search, of large graphs. This latest version exploits at its best …

Low-span parallel algorithms for the binary-forking model

Z Ahmad, R Chowdhury, R Das, P Ganapathi… - Proceedings of the 33rd …, 2021 - dl.acm.org
The binary-forking model is a parallel computation model, formally defined by Blelloch et al.,
in which a thread can fork a concurrent child thread, recursively and asynchronously. The …

An OpenMP‐based breadth‐first search implementation using the bag data structure

SL Gonzaga de Oliveira, MI Santana… - Concurrency and …, 2024 - Wiley Online Library
The breadth‐first search procedure is an algorithm that traverses the vertices of a graph,
determining the distance from each vertex to the initial vertex. The distance is infinite for a …

Engineering high-performance parallel algorithms with applications to bioinformatics

JJ Tithi - 2015 - search.proquest.com
Since the beginning of the last decade, plateauing of the clock speed of computer
processors has forced us to invest more in parallelism—for both hardware and software …

面向众核密码处理器的高效负载均衡技术

戴紫彬, 尹安琪, 曲彤洲, 南龙梅 - 电子与信息学报, 2019 - jeit.ac.cn
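Uneven workload distribution is a major factor limiting the resource utilization of many-core crypto platforms. Dynamic load allocation can improve platform
resource utilization, but it incurs some overhead, so a higher load-balancing frequency does not necessarily yield a higher load-balancing gain …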

Low-Depth Parallel Algorithms for the Binary-Forking Model without Atomics

Z Ahmad, R Chowdhury, R Das, P Ganapathi… - arXiv preprint arXiv …, 2020 - arxiv.org
The binary-forking model is a parallel computation model, formally defined by Blelloch et al.
very recently, in which a thread can fork a concurrent child thread, recursively and …

Programming models for many-core architectures: a co-design approach

JH Rutgers - 2014 - research.utwente.nl
Common many-core processors contain tens of cores and distributed memory. Compared to
a multicore system, which only has a few tightly coupled cores sharing a single bus and …

Uma implementação da busca em largura com estrutura bag e OpenMP

SLG de Oliveira, MI Santana… - Simpósio em …, 2021 - proceedings-sol.sbc.org.br
This paper presents results from a re-implementation of breadth-first search in
C++ using the bag structure and the OpenMP interface. The implementation is based on …

Programming a Multicore Architecture without Coherency and Atomic Operations

JH Rutgers, MJG Bekooij, GJM Smit - Proceedings of Programming …, 2014 - dl.acm.org
It is hard to reason about the state of a multicore system-on-chip, because operations on
memory need multiple cycles to complete, since cores communicate via an interconnect like …

Efficient Workload Balance Technology on Many-core Crypto Processor

Z DAI, A YIN, T QU, L NAN - 电子与信息学报, 2019 - jeit.ac.cn
Imbalanced workload distribution results in low resource utilization of many-core crypto-
platform. Dynamic workload allocation can improve the resource utilization with some …