Shasta: A low overhead, software-only approach for supporting fine-grain shared memory

DJ Scales, K Gharachorloo, CA Thekkath - Proceedings of the seventh …, 1996 - dl.acm.org
This paper describes Shasta, a system that supports a shared address space in software on
clusters of computers with physically distributed memory. A unique aspect of Shasta …

Effects of communication latency, overhead, and bandwidth in a cluster architecture

RP Martin, AM Vahdat, DE Culler… - ACM SIGARCH Computer …, 1997 - dl.acm.org
This work provides a systematic study of the impact of communication performance on
parallel applications in a high performance network of workstations. We develop an …

Advances in the dataflow computational model

WA Najjar, EA Lee, GR Gao - Parallel computing, 1999 - Elsevier
The dataflow program graph execution model, or dataflow for short, is an alternative to the
stored-program (von Neumann) execution model. Because it relies on a graph …

Asynchrony in parallel computing: From dataflow to multithreading

J Silc, B Robic, T Ungerer - Parallel and Distributed Computing Practices, 1998 - Citeseer
The paper presents an overview of the parallel computing models, architectures, and
research projects that are based on asynchronous instruction scheduling. It starts with pure …

Minnow: Lightweight offload engines for worklist management and worklist-directed prefetching

D Zhang, X Ma, M Thomson, D Chiou - ACM SIGPLAN Notices, 2018 - dl.acm.org
The importance of irregular applications such as graph analytics is rapidly growing with the
rise of Big Data. However, parallel graph workloads tend to perform poorly on general …

LoPC: modeling contention in parallel algorithms

MI Frank, A Agarwal, MK Vernon - ACM SIGPLAN Notices, 1997 - dl.acm.org
Parallel algorithm designers need computational models that take first order system costs
into account, but are also simple enough to use in practice. This paper introduces the LoPC …

Coherent network interfaces for fine-grain communication

SS Mukherjee, B Falsafi, MD Hill… - ACM SIGARCH Computer …, 1996 - dl.acm.org
Historically, processor accesses to memory-mapped device registers have been marked
uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence …

Decoupled hardware support for distributed shared memory

SK Reinhardt, RW Pfile, DA Wood - ACM SIGARCH Computer …, 1996 - dl.acm.org
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in
networks of workstations. To reduce design time and implementation cost relative to …

Polling watchdog: Combining polling and interrupts for efficient message handling

O Maquelin, GR Gao, HHJ Hum, KB Theobald… - ACM SIGARCH …, 1996 - dl.acm.org
Parallel systems supporting multithreading, or message passing in general, have typically
used either polling or interrupts to handle incoming messages. Neither approach is ideal; …

Compiling C for the EARTH multithreaded architecture

LJ Hendren, X Tang, Y Zhu, S Ghobrial, GR Gao… - International Journal of …, 1997 - Springer
Multithreaded architectures provide an opportunity for efficiently executing programs with
irregular parallelism and/or irregular locality. This paper presents a strategy that makes use …