Shasta: A low overhead, software-only approach for supporting fine-grain shared memory
DJ Scales, K Gharachorloo, CA Thekkath - Proceedings of the seventh …, 1996 - dl.acm.org
This paper describes Shasta, a system that supports a shared address space in software on
clusters of computers with physically distributed memory. A unique aspect of Shasta …
Effects of communication latency, overhead, and bandwidth in a cluster architecture
This work provides a systematic study of the impact of communication performance on
parallel applications in a high performance network of workstations. We develop an …
Advances in the dataflow computational model
The dataflow program graph execution model, or dataflow for short, is an alternative to the
stored-program (von Neumann) execution model. Because it relies on a graph …
Asynchrony in parallel computing: From dataflow to multithreading
J Silc, B Robic, T Ungerer - Parallel and Distributed Computing Practices, 1998 - Citeseer
The paper presents an overview of the parallel computing models, architectures, and
research projects that are based on asynchronous instruction scheduling. It starts with pure …
Minnow: Lightweight offload engines for worklist management and worklist-directed prefetching
The importance of irregular applications such as graph analytics is rapidly growing with the
rise of Big Data. However, parallel graph workloads tend to perform poorly on general …
LoPC: modeling contention in parallel algorithms
Parallel algorithm designers need computational models that take first order system costs
into account, but are also simple enough to use in practice. This paper introduces the LoPC …
Coherent network interfaces for fine-grain communication
Historically, processor accesses to memory-mapped device registers have been marked
uncachable to insure their visibility to the device. The ubiquity of snooping cache coherence …
Decoupled hardware support for distributed shared memory
SK Reinhardt, RW Pfile, DA Wood - ACM SIGARCH Computer …, 1996 - dl.acm.org
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in
networks of workstations. To reduce design time and implementation cost relative to …
Polling watchdog: Combining polling and interrupts for efficient message handling
O Maquelin, GR Gao, HHJ Hum, KB Theobald… - ACM SIGARCH …, 1996 - dl.acm.org
Parallel systems supporting multithreading, or message passing in general, have typically
used either polling or interrupts to handle incoming messages. Neither approach is ideal; …
Compiling C for the EARTH multithreaded architecture
LJ Hendren, X Tang, Y Zhu, S Ghobrial, GR Gao… - International Journal of …, 1997 - Springer
Multithreaded architectures provide an opportunity for efficiently executing programs with
irregular parallelism and/or irregular locality. This paper presents a strategy that makes use …