Hunting the overlap

Implementation and performance analysis of non-blocking collective operations for MPI

T Hoefler, A Lumsdaine, W Rehm - Proceedings of the 2007 ACM/IEEE …, 2007 - dl.acm.org

Collective operations and non-blocking point-to-point operations have always been part of
MPI. Although non-blocking collective operations are an obvious extension to MPI, there …

Cited by 280 · Related articles · All 37 versions

[PDF] ethz.ch

Message progression in parallel computing - to thread or not to thread?

T Hoefler, A Lumsdaine - 2008 IEEE International Conference …, 2008 - ieeexplore.ieee.org

Message progression schemes that enable communication and computation to be
overlapped have the potential to improve the performance of parallel applications. With …

Cited by 150 · Related articles · All 30 versions

[PDF] academia.edu

Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications

JC Sancho, KJ Barker, DJ Kerbyson… - Proceedings of the 2006 …, 2006 - dl.acm.org

The design and implementation of a high-performance communication network are critical
factors in determining the performance and cost-effectiveness of a large-scale computing …

Cited by 151 · Related articles · All 12 versions

[PDF] unixer.de

Netgauge: A network performance measurement framework

T Hoefler, T Mehlan, A Lumsdaine, W Rehm - … and Communications: Third …, 2007 - Springer

This paper introduces Netgauge, an extensible open-source framework for implementing
network benchmarks. The structure of Netgauge abstracts and explicitly separates …

Cited by 127 · Related articles · All 32 versions

[PDF] researchgate.net

An OpenCL framework for heterogeneous multicores with local memory

J Lee, J Kim, S Seo, S Kim, J Park, H Kim… - Proceedings of the 19th …, 2010 - dl.acm.org

In this paper, we present the design and implementation of an Open Computing Language
(OpenCL) framework that targets heterogeneous accelerator multicore architectures with …

Cited by 92 · Related articles · All 6 versions

[PDF] psu.edu

CAMP: fast and efficient IP lookup architecture

S Kumar, M Becchi, P Crowley, J Turner - Proceedings of the 2006 ACM …, 2006 - dl.acm.org

A large body of research literature has focused on improving the performance of longest
prefix match IP-lookup. More recently, embedded memory based architectures have been …

Cited by 88 · Related articles · All 20 versions

[PDF] psu.edu

Shared memory programming for large scale machines

C Barton, C Caşcaval, G Almási, Y Zheng… - ACM SIGPLAN …, 2006 - dl.acm.org

This paper describes the design and implementation of a scalable run-time system and an
optimizing compiler for Unified Parallel C (UPC). An experimental evaluation on …

Cited by 82 · Related articles · All 17 versions

[PDF] psu.edu

Optimizing the Use of Static Buffers for DMA on a CELL Chip

T Chen, Z Sura, K O'Brien, JK O'Brien - … Orleans, LA, USA, November 2-4 …, 2007 - Springer

The CELL architecture has one Power Processor Element (PPE) core, and eight Synergistic
Processor Element (SPE) cores that have a distinct instruction set architecture of their own …

Cited by 78 · Related articles · All 10 versions

[PDF] escholarship.org

Towards ultra-high resolution models of climate and weather

M Wehner, L Oliker, J Shalf - The International Journal of …, 2008 - journals.sagepub.com

We present a speculative extrapolation of the performance aspects of an atmospheric
general circulation model to ultra-high resolution and describe alternative technological …

Cited by 68 · Related articles · All 18 versions

[PDF] unixer.de

Optimizing non-blocking collective operations for InfiniBand

T Hoefler, A Lumsdaine - 2008 IEEE International Symposium …, 2008 - ieeexplore.ieee.org

Non-blocking collective operations have recently been shown to be a promising
complementary approach for overlapping communication and computation in parallel …

Cited by 43 · Related articles · All 32 versions
