Coalescing adjacent gather/scatter operations
AT Forsyth, BJ Hickmann, JC Hall… - US Patent 9,348,601, 2016 - Google Patents
According to one embodiment, a processor includes an instruction decoder to decode a first
instruction to gather data elements from memory, the first instruction having a first operand …
No-locality hint vector memory access processors, methods, systems, and instructions
CJ Hughes - US Patent 9,600,442, 2017 - Google Patents
(57) ABSTRACT A processor of an aspect includes a plurality of packed data registers, and
a decode unit to decode a no-locality hint vector memory access instruction. The no-locality …
Scatter using index array and finite state machine
Z Sperber, R Valentine, S Raikin… - US Patent …, 2017 - Google Patents
Methods and apparatus are disclosed using an index array and finite state machine for
scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode …
Scatter/gather accessing multiple cache lines in a single cache port
JC Hall, S Kottapalli, AT Forsyth - US Patent App. 13/250,223, 2012 - Google Patents
[0002] Modern processors often include instructions to provide operations that are
computationally intensive, but offer a high level of data parallelism that can be exploited …
Transposition operation device, integrated circuit for the same, and transposition method
T Nishimura, H Morishita - US Patent 9,201,899, 2015 - Google Patents
(57) ABSTRACT A transposition operation device includes: a register group storing a matrix
of data such that elements are readable one at a time; an output data rearrangement unit …
Hardware prefetcher for indirect access patterns
Two techniques address bottlenecking in processors. The first is indirect prefetching. The
technique can be especially useful for graph analytics and sparse matrix applications. For …
Facilitating efficient prefetching for scatter/gather operations
S Kapil, DJ Gove - US Patent 9,817,762, 2017 - Google Patents
The disclosed embodiments relate to a computing system that facilitates performing
prefetching for scatter/gather operations. During operation, the system receives a …
Gathering and scattering multiple data elements
According to a first aspect, efficient data transfer operations can be achieved by: decoding
by a processor device, a single instruction specifying a transfer operation for a plurality of …
No-locality hint vector memory access processors, methods, systems, and instructions
CJ Hughes - US Patent 10,210,091, 2019 - Google Patents
(57) ABSTRACT A processor of an aspect includes a plurality of packed data registers, and
a decode unit to decode a no-locality hint vector memory access instruction. The no-locality …
Systems, apparatuses, and methods for performing a conversion of a writemask register to a list of index values in a vector register
E Ould-Ahmed-Vall, T Willhalm, GT Drysdale - US Patent 9,454,507, 2016 - Google Patents
Embodiments of systems, apparatuses, and methods for performing in a computer processor
conversion of a mask register into a list of index values in response to a single vector packed …