Ginex: SSD-enabled billion-scale graph neural network training on a single machine via provably optimal in-memory caching

Y Park, S Min, JW Lee - arXiv preprint arXiv:2208.09151, 2022 - arxiv.org
Recently, Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool
that can effectively serve various inference tasks on graph structured data. As the size of real …

TDGraph: a topology-driven accelerator for high-performance streaming graph processing

J Zhao, Y Yang, Y Zhang, X Liao, L Gu, L He… - Proceedings of the 49th …, 2022 - dl.acm.org
Many solutions have been recently proposed to support the processing of streaming graphs.
However, for the processing of each graph snapshot of a streaming graph, the new states of …

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …

InnerSP: A memory efficient sparse matrix multiplication accelerator with locality-aware inner product processing

D Baek, S Hwang, T Heo, D Kim… - 2021 30th International …, 2021 - ieeexplore.ieee.org
Sparse matrix multiplication is one of the key computational kernels in large-scale data
analytics. However, a naive implementation suffers from the overheads of irregular memory …

Dedicated hardware accelerators for processing of sparse matrices and vectors: a survey

V Isaac–Chassande, A Evans, Y Durand… - ACM Transactions on …, 2024 - dl.acm.org
Performance in scientific and engineering applications such as computational physics,
algebraic graph problems or Convolutional Neural Networks (CNN), is dominated by the …

A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

AV Jamet, G Vavouliotis, DA Jiménez… - … Symposium on High …, 2024 - ieeexplore.ieee.org
To alleviate the performance and energy overheads of contemporary applications with large
data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism …

TCOR: a tile cache with optimal replacement

D Joseph, JL Aragón, JM Parcerisa… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Cache Replacement Policies are known to have an important impact on hit rates. The OPT
replacement policy [27] has been formally proven as optimal for minimizing misses. Due to …

CARE: A concurrency-aware enhanced lightweight cache management framework

X Lu, R Wang, XH Sun - 2023 IEEE International Symposium …, 2023 - ieeexplore.ieee.org
Improving cache performance is a lasting research topic. While utilizing data locality to
enhance cache performance becomes more and more difficult, data access concurrency …

LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing

J Zhao, Y Zhang, X Liao, L He, B He, H Jin… - Proceedings of the …, 2021 - dl.acm.org
In modern data centers, massive concurrent graph processing jobs are being processed on
large graphs. However, existing hardware/software solutions suffer from irregular graph …

ECG: Expressing Locality and Prefetching for Optimal Caching in Graph Structures

AT Mughrabi, M Baradaran, A Samara… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Despite state-of-the-art caching strategies, graph analytics pose a significant challenge for
prefetching and replacement policies, as their access patterns are often random with low …