Performance of SSE and AVX instruction sets

W Bertiger, Y Bar-Sever, A Dorsey, B Haines… - Advances in space …, 2020 - Elsevier

GipsyX/RTGx is the Jet Propulsion Laboratory's (JPL) next generation software
package for positioning, navigation, timing, and Earth science using measurements from …

Cited by 320 Related articles All 8 versions

[HTML] diva-portal.org

Parallel computing of support vector machines: a survey

S Tavara - ACM Computing Surveys (CSUR), 2019 - dl.acm.org

The immense amount of data created by digitalization requires parallel computing for
machine-learning methods. While there are many parallel implementations for support …

Cited by 61 Related articles All 3 versions

[PDF] anu.edu.au

Use of SIMD vector operations to accelerate application code performance on low-powered ARM and Intel platforms

G Mitra, B Johnston, AP Rendell… - … on Parallel & …, 2013 - ieeexplore.ieee.org

Augmenting a processor with special hardware that is able to apply a Single Instruction to
Multiple Data (SIMD) at the same time is a cost effective way of improving processor …

Cited by 134 Related articles All 15 versions

[PDF] github.io

BitFlow: Exploiting vector parallelism for binary neural networks on CPU

Y Hu, J Zhai, D Li, Y Gong, Y Zhu… - 2018 IEEE …, 2018 - ieeexplore.ieee.org

Deep learning has revolutionized computer vision and other fields since its big bang in
2012. However, it is challenging to deploy Deep Neural Networks (DNNs) into real-world …

Cited by 50 Related articles All 4 versions

[PDF] acm.org Full View

COX: Exposing CUDA warp-level functions to CPUs

R Han, J Lee, J Sim, H Kim - ACM Transactions on Architecture and …, 2022 - dl.acm.org

As CUDA becomes the de facto programming language among data parallel applications
such as high-performance computing or machine learning applications, running CUDA on …

Cited by 9 Related articles All 2 versions

[PDF] acm.org Full View

Highly Efficient Self-Checking Matrix Multiplication on Tiled AMX Accelerators

CS Mummidi, VC Ferreira, S Srinivasan… - ACM Transactions on …, 2024 - dl.acm.org

General Matrix Multiplication (GEMM) is a computationally expensive operation that is used
in many applications such as machine learning. Hardware accelerators are increasingly …

Cited by 4 Related articles

[PDF] siam.org

Multi-core k-means

C Böhm, M Perdacher, C Plant - Proceedings of the 2017 SIAM International …, 2017 - SIAM

Today's microprocessors consist of multiple cores each of which can perform multiple
additions, multiplications, or other operations simultaneously in one clock cycle. To …

Cited by 25 Related articles All 2 versions

[HTML] sciencedirect.com

Optimizing matrix-matrix multiplication on Intel's advanced vector extensions multicore processor

AM Hemeida, SA Hassan, S Alkhalaf… - Ain Shams Engineering …, 2020 - Elsevier

This paper is focused on Intel Advanced Vector Extension (AVX) which has been borne of
the modern developments in AMD processors and Intel itself. Said prescript processes a …

Cited by 18 Related articles All 2 versions

ENIGMA: Low-latency and privacy-preserving edge inference on heterogeneous neural network accelerators

Q Li, J Ren, X Pan, Y Zhou… - 2022 IEEE 42nd …, 2022 - ieeexplore.ieee.org

Time-efficient artificial intelligence (AI) service has recently witnessed increasing interest
from academia and industry due to the urgent needs in massive smart applications such as …

Cited by 7 Related articles All 2 versions

Performance evaluation of matrix-matrix multiplications using Intel's advanced vector extensions (AVX)

SA Hassan, AM Hemeida, MMM Mahmoud - Microprocessors and …, 2016 - Elsevier

Intel's Advanced Vector Extensions is known as single instruction multiple data
streams (SIMD), and the instruction sets is introduced in the second-generation Intel Core …

Cited by 23 Related articles All 3 versions