Evaluation of the Intel® Core™ i7 Turbo Boost feature
J Charles, P Jassi, NS Ananth, A Sadat… - 2009 IEEE …, 2009 - ieeexplore.ieee.org
The Intel® Core™ i7 processor, code-named Nehalem, has a novel feature called Turbo
Boost which dynamically varies the frequencies of the processor's cores. The frequency of a …
Weak molecular interactions studied with parallel implementations of the local pair natural orbital coupled pair and coupled cluster methods
A parallel implementation of the recently developed local pair natural orbital coupled
electron pair approximation (LPNO-CEPA/n, n= Version 1, 2, or 3) and the corresponding …
DiDi: Mitigating the performance impact of TLB shootdowns using a shared TLB directory
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to
cache virtual-to-physical mappings and, as they are looked up on every memory access, are …
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
We present a pipelined wavefront parallelization approach for stencil-based computations.
Within a fixed spatial domain successive wavefronts are executed by threads scheduled to a …
Graph coloring algorithms for multi-core and massively multithreaded architectures
We explore the interplay between architectures and algorithm design in the context of
shared-memory platforms and a specific graph problem of central importance in scientific …
Massive streaming data analytics: A case study with clustering coefficients
D Ediger, K Jiang, J Riedy… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
We present a new approach for parallel massive graph analysis of streaming, temporal data
with a dynamic and extensible representation. Handling the constant stream of new data …
An analysis of core- and chip-level architectural features in four generations of Intel server processors
This paper presents a survey of architectural features among four generations of Intel server
processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance …
Evaluating thread placement based on memory access patterns for multi-core processors
M Diener, F Madruga, E Rodrigues… - 2010 IEEE 12th …, 2010 - ieeexplore.ieee.org
Process placement is a technique widely used on parallel machines with heterogeneous
interconnects to reduce the overall communication time. For instance, two processes which …
Exploiting compression opportunities to improve SpMxV performance on shared memory systems
The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared
memory systems, due to the streaming nature of its data access pattern. To decrease …
Benchmarking data and compute intensive applications on modern CPU and GPU architectures
The use of graphics hardware for non-graphics applications has become popular among
many scientific programmers and researchers as we have observed a higher rate of …