Ginex: SSD-enabled billion-scale graph neural network training on a single machine via provably optimal in-memory caching
Recently, Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool
that can effectively serve various inference tasks on graph structured data. As the size of real …
TDGraph: a topology-driven accelerator for high-performance streaming graph processing
Many solutions have been recently proposed to support the processing of streaming graphs.
However, for the processing of each graph snapshot of a streaming graph, the new states of …
Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement
BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …
InnerSP: A memory efficient sparse matrix multiplication accelerator with locality-aware inner product processing
Sparse matrix multiplication is one of the key computational kernels in large-scale data
analytics. However, a naive implementation suffers from the overheads of irregular memory …
Dedicated hardware accelerators for processing of sparse matrices and vectors: a survey
V Isaac-Chassande, A Evans, Y Durand… - ACM Transactions on …, 2024 - dl.acm.org
Performance in scientific and engineering applications such as computational physics,
algebraic graph problems or Convolutional Neural Networks (CNN), is dominated by the …
A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering
To alleviate the performance and energy overheads of contemporary applications with large
data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism …
TCOR: a tile cache with optimal replacement
Cache Replacement Policies are known to have an important impact on hit rates. The OPT
replacement policy [27] has been formally proven as optimal for minimizing misses. Due to …
CARE: A concurrency-aware enhanced lightweight cache management framework
Improving cache performance is a lasting research topic. While utilizing data locality to
enhance cache performance becomes more and more difficult, data access concurrency …
LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing
In modern data centers, massive concurrent graph processing jobs are being processed on
large graphs. However, existing hardware/software solutions suffer from irregular graph …
ECG: Expressing Locality and Prefetching for Optimal Caching in Graph Structures
Despite state-of-the-art caching strategies, graph analytics pose a significant challenge for
prefetching and replacement policies, as their access patterns are often random with low …