Issues and challenges in the performance analysis of real disk arrays

E Varki, A Merchant, J Xu, X Qiu - IEEE Transactions on …, 2004 - ieeexplore.ieee.org
The performance modeling and analysis of disk arrays is challenging due to the presence of
multiple disks, large array caches, and sophisticated array controllers. Moreover, storage …

Understanding and optimizing asynchronous low-precision stochastic gradient descent

C De Sa, M Feldman, C Ré, K Olukotun - Proceedings of the 44th annual …, 2017 - dl.acm.org
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in
machine learning and other domains. Since this is likely to continue for the foreseeable …

Synthetic traces for trace-driven simulation of cache memories

D Thiebaut, JL Wolf, HS Stone - IEEE Transactions on computers, 1992 - computer.org
Two techniques for producing synthetic address traces that produce good emulations of the
locality of reference of real programs are presented. The first algorithm generates synthetic …

Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks

R Bertran, A Buyuktosunoglu, MS Gupta… - 2012 45th Annual …, 2012 - ieeexplore.ieee.org
Microprocessor-based systems today are composed of multi-core, multi-threaded
processors with complex cache hierarchies and gigabytes of main memory. Accurate …

Improved automatic testcase synthesis for performance model validation

RH Bell Jr, LK John - Proceedings of the 19th annual international …, 2005 - dl.acm.org
Performance simulation tools must be validated during the design process as functional
models and early hardware are developed, so that designers can be sure of the …

Performance cloning: A technique for disseminating proprietary applications as benchmarks

A Joshi, L Eeckhout, RH Bell… - 2006 IEEE International …, 2006 - ieeexplore.ieee.org
Many embedded real world applications are intellectual property, and vendors hesitate to
share these proprietary applications with computer architects and designers. This poses a …

Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads

K Ganesan, J Jo, LK John - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
We generate and provide miniature synthetic benchmark clones for modern workloads to
solve two pre-silicon design challenges, namely: 1) huge simulation time (weeks to months) …

EMISSARY: Enhanced Miss Awareness Replacement Policy for L2 Instruction Caching

NP Nagendra, BR Godala, I Chaturvedi… - Proceedings of the 50th …, 2023 - dl.acm.org
For decades, architects have designed cache replacement policies to reduce cache misses.
Since not all cache misses affect processor performance equally, researchers have also …

Synchronizing namespaces with invertible bloom filters

W Fu, HB Abraham, P Crowley - 2015 ACM/IEEE Symposium …, 2015 - ieeexplore.ieee.org
Data synchronization, long a staple in file systems, is emerging as a significant communications
primitive. In a distributed system, data synchronization resolves differences among …

Fast and accurate exploration of multi-level caches using hierarchical reuse distance

RKV Maeda, Q Cai, J Xu, Z Wang… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Exploring the design space of the memory hierarchy requires the use of effective
methodologies, tools, and models to evaluate different parameter values. Reuse distance is …