Evaluation of the Intel® Core™ i7 Turbo Boost feature

J Charles, P Jassi, NS Ananth, A Sadat… - 2009 IEEE …, 2009 - ieeexplore.ieee.org
The Intel® Core™ i7 processor code named Nehalem has a novel feature called Turbo
Boost which dynamically varies the frequencies of the processor's cores. The frequency of a …

Weak molecular interactions studied with parallel implementations of the local pair natural orbital coupled pair and coupled cluster methods

DG Liakos, A Hansen, F Neese - Journal of Chemical Theory and …, 2011 - ACS Publications
A parallel implementation of the recently developed local pair natural orbital coupled
electron pair approximation (LPNO-CEPA/n, n = Version 1, 2, or 3) and the corresponding …

DiDi: Mitigating the performance impact of TLB shootdowns using a shared TLB directory

C Villavieja, V Karakostas, L Vilanova… - 2011 International …, 2011 - ieeexplore.ieee.org
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to
cache virtual-to-physical mappings and, as they are looked up on every memory access, are …

Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization

G Wellein, G Hager, T Zeiser… - 2009 33rd Annual …, 2009 - ieeexplore.ieee.org
We present a pipelined wavefront parallelization approach for stencil-based computations.
Within a fixed spatial domain successive wavefronts are executed by threads scheduled to a …

Graph coloring algorithms for multi-core and massively multithreaded architectures

ÜV Çatalyürek, J Feo, AH Gebremedhin… - Parallel Computing, 2012 - Elsevier
We explore the interplay between architectures and algorithm design in the context of
shared-memory platforms and a specific graph problem of central importance in scientific …

Massive streaming data analytics: A case study with clustering coefficients

D Ediger, K Jiang, J Riedy… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
We present a new approach for parallel massive graph analysis of streaming, temporal data
with a dynamic and extensible representation. Handling the constant stream of new data …

An analysis of core- and chip-level architectural features in four generations of Intel server processors

J Hofmann, G Hager, G Wellein, D Fey - High Performance Computing …, 2017 - Springer
This paper presents a survey of architectural features among four generations of Intel server
processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance …

Evaluating thread placement based on memory access patterns for multi-core processors

M Diener, F Madruga, E Rodrigues… - 2010 IEEE 12th …, 2010 - ieeexplore.ieee.org
Process placement is a technique widely used on parallel machines with heterogeneous
interconnects to reduce the overall communication time. For instance, two processes which …

Exploiting compression opportunities to improve SpMxV performance on shared memory systems

K Kourtis, G Goumas, N Koziris - ACM Transactions on Architecture and …, 2010 - dl.acm.org
The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared
memory systems, due to the streaming nature of its data access pattern. To decrease …

Benchmarking data and compute intensive applications on modern CPU and GPU architectures

M Ciżnicki, M Kierzynka, P Kopta, K Kurowski… - Procedia Computer …, 2012 - Elsevier
The use of graphics hardware for non-graphics applications has become popular among
many scientific programmers and researchers as we have observed a higher rate of …