A performance evaluation of the Nehalem quad-core processor for scientific computing

J Charles, P Jassi, NS Ananth, A Sadat… - 2009 IEEE …, 2009 - ieeexplore.ieee.org

The Intel® Core™ i7 processor code named Nehalem has a novel feature called Turbo
Boost which dynamically varies the frequencies of the processor's cores. The frequency of a …

Cited by 238 | Related articles | All 11 versions

Weak molecular interactions studied with parallel implementations of the local pair natural orbital coupled pair and coupled cluster methods

DG Liakos, A Hansen, F Neese - Journal of Chemical Theory and …, 2011 - ACS Publications

A parallel implementation of the recently developed local pair natural orbital coupled
electron pair approximation (LPNO-CEPA/n, n= Version 1, 2, or 3) and the corresponding …

Cited by 170 | Related articles | All 6 versions

DiDi: Mitigating the performance impact of TLB shootdowns using a shared TLB directory

C Villavieja, V Karakostas, L Vilanova… - 2011 International …, 2011 - ieeexplore.ieee.org

Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to
cache virtual-to-physical mappings and, as they are looked up on every memory access, are …

Cited by 156 | Related articles | All 12 versions

Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization

G Wellein, G Hager, T Zeiser… - 2009 33rd Annual …, 2009 - ieeexplore.ieee.org

We present a pipelined wavefront parallelization approach for stencil-based computations.
Within a fixed spatial domain successive wavefronts are executed by threads scheduled to a …

Cited by 172 | Related articles | All 7 versions

Graph coloring algorithms for multi-core and massively multithreaded architectures

ÜV Çatalyürek, J Feo, AH Gebremedhin… - Parallel Computing, 2012 - Elsevier

We explore the interplay between architectures and algorithm design in the context of
shared-memory platforms and a specific graph problem of central importance in scientific …

Cited by 110 | Related articles | All 11 versions

Massive streaming data analytics: A case study with clustering coefficients

D Ediger, K Jiang, J Riedy… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

We present a new approach for parallel massive graph analysis of streaming, temporal data
with a dynamic and extensible representation. Handling the constant stream of new data …

Cited by 102 | Related articles | All 15 versions

An analysis of core- and chip-level architectural features in four generations of Intel server processors

J Hofmann, G Hager, G Wellein, D Fey - High Performance Computing …, 2017 - Springer

This paper presents a survey of architectural features among four generations of Intel server
processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance …

Cited by 32 | Related articles | All 4 versions

Evaluating thread placement based on memory access patterns for multi-core processors

M Diener, F Madruga, E Rodrigues… - 2010 IEEE 12th …, 2010 - ieeexplore.ieee.org

Process placement is a technique widely used on parallel machines with heterogeneous
interconnects to reduce the overall communication time. For instance, two processes which …

Cited by 47 | Related articles | All 13 versions

Exploiting compression opportunities to improve SpMxV performance on shared memory systems

K Kourtis, G Goumas, N Koziris - ACM Transactions on Architecture and …, 2010 - dl.acm.org

The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared
memory systems, due to the streaming nature of its data access pattern. To decrease …

Cited by 34 | Related articles | All 25 versions

Benchmarking data and compute intensive applications on modern CPU and GPU architectures

M Ciżnicki, M Kierzynka, P Kopta, K Kurowski… - Procedia Computer …, 2012 - Elsevier

The use of graphics hardware for non-graphics applications has become popular among
many scientific programmers and researchers as we have observed a higher rate of …

Cited by 25 | Related articles | All 6 versions