GipsyX/RTGx, a new tool set for space geodetic operations and research

W Bertiger, Y Bar-Sever, A Dorsey, B Haines… - Advances in space …, 2020 - Elsevier
GipsyX/RTGx is the Jet Propulsion Laboratory's (JPL) next generation software
package for positioning, navigation, timing, and Earth science using measurements from …

Parallel computing of support vector machines: a survey

S Tavara - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
The immense amount of data created by digitalization requires parallel computing for
machine-learning methods. While there are many parallel implementations for support …

Use of SIMD vector operations to accelerate application code performance on low-powered ARM and Intel platforms

G Mitra, B Johnston, AP Rendell… - … on Parallel & …, 2013 - ieeexplore.ieee.org
Augmenting a processor with special hardware that is able to apply a Single Instruction to
Multiple Data (SIMD) at the same time is a cost effective way of improving processor …

Bitflow: Exploiting vector parallelism for binary neural networks on CPU

Y Hu, J Zhai, D Li, Y Gong, Y Zhu… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
Deep learning has revolutionized computer vision and other fields since its big bang in
2012. However, it is challenging to deploy Deep Neural Networks (DNNs) into real-world …

COX: Exposing CUDA warp-level functions to CPUs

R Han, J Lee, J Sim, H Kim - ACM Transactions on Architecture and …, 2022 - dl.acm.org
As CUDA becomes the de facto programming language among data parallel applications
such as high-performance computing or machine learning applications, running CUDA on …

Highly Efficient Self-Checking Matrix Multiplication on Tiled AMX Accelerators

CS Mummidi, VC Ferreira, S Srinivasan… - ACM Transactions on …, 2024 - dl.acm.org
General Matrix Multiplication (GEMM) is a computationally expensive operation that is used
in many applications such as machine learning. Hardware accelerators are increasingly …

Multi-core k-means

C Böhm, M Perdacher, C Plant - Proceedings of the 2017 SIAM International …, 2017 - SIAM
Today's microprocessors consist of multiple cores each of which can perform multiple
additions, multiplications, or other operations simultaneously in one clock cycle. To …

Optimizing matrix-matrix multiplication on Intel's advanced vector extensions multicore processor

AM Hemeida, SA Hassan, S Alkhalaf… - Ain Shams Engineering …, 2020 - Elsevier
This paper is focused on Intel Advanced Vector Extension (AVX) which has been borne of
the modern developments in AMD processors and Intel itself. Said prescript processes a …

ENIGMA: Low-latency and privacy-preserving edge inference on heterogeneous neural network accelerators

Q Li, J Ren, X Pan, Y Zhou… - 2022 IEEE 42nd …, 2022 - ieeexplore.ieee.org
Time-efficient artificial intelligence (AI) service has recently witnessed increasing interest
from academia and industry due to the urgent needs in massive smart applications such as …

Performance evaluation of matrix-matrix multiplications using Intel's advanced vector extensions (AVX)

SA Hassan, AM Hemeida, MMM Mahmoud - Microprocessors and …, 2016 - Elsevier
Intel's Advanced Vector Extensions is known as single instruction multiple data
streams (SIMD), and the instruction sets is introduced in the second-generation Intel Core …