On optimizing distributed non-negative tucker decomposition
Proceedings of the ACM International Conference on Supercomputing, 2019•dl.acm.org
The Tucker decomposition generalizes singular value decomposition (SVD) to high
dimensional tensors. It factorizes a given N-dimensional tensor as the product of a small
core tensor and a set of N factor matrices. Non-negative Tucker Decomposition (NTD) is a
variant that imposes the constraint that the entries of the core and the factor matrices must be
non-negative. Generalizing a classical algorithm from the domain of non-negative matrix
factorization, Mørup et al.[19] designed a procedure for NTD via the multiplicative weight …
dimensional tensors. It factorizes a given N-dimensional tensor as the product of a small
core tensor and a set of N factor matrices. Non-negative Tucker Decomposition (NTD) is a
variant that imposes the constraint that the entries of the core and the factor matrices must be
non-negative. Generalizing a classical algorithm from the domain of non-negative matrix
factorization, Mørup et al.[19] designed a procedure for NTD via the multiplicative weight …
The Tucker decomposition generalizes singular value decomposition (SVD) to high dimensional tensors. It factorizes a given N-dimensional tensor as the product of a small core tensor and a set of N factor matrices. Non-negative Tucker Decomposition (NTD) is a variant that imposes the constraint that the entries of the core and the factor matrices must be non-negative. Generalizing a classical algorithm from the domain of non-negative matrix factorization, Mørup et al. [19] designed a procedure for NTD via the multiplicative weight update paradigm. Based on the above procedure, we present a distributed implementation of NTD for sparse tensors. We develop three algorithms for efficiently executing the procedure. The first is a baseline algorithm that adapts strategies from prior work on the Tucker decomposition. The other two are improved algorithms that are optimized based on properties unique to the NTD procedure. We present an experimental evaluation on a benchmark of large real-life tensors on a system with 32 to 512 MPI ranks. The study shows that the optimized algorithms outperform the baseline by a factor of up to 6x in execution time. The distributed implementation scales well with speedup up to 12x (as against an ideal factor of 16x).
ACM Digital Library
以上显示的是最相近的搜索结果。 查看全部搜索结果