Strategies and principles of distributed machine learning on big data

EP Xing, Q Ho, P Xie, D Wei - Engineering, 2016 - Elsevier
The rise of big data has led to new demands for machine learning (ML) systems to learn
complex models, with millions to billions of parameters, that promise adequate capacity to …

Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey

H Jelodar, Y Wang, C Yuan, X Feng, X Jiang… - Multimedia tools and …, 2019 - Springer
Topic modeling is one of the most powerful techniques in text mining for data mining, latent
data discovery, and finding relationships among data and text documents. Researchers …

Distributionally robust language modeling

Y Oren, S Sagawa, TB Hashimoto, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org
Language models are generally trained on data spanning a wide range of topics (e.g., news,
reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g. …

PipeDream: Fast and efficient pipeline parallel DNN training

A Harlap, D Narayanan, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …

Petuum: A new platform for distributed machine learning on big data

EP Xing, Q Ho, W Dai, JK Kim, J Wei, S Lee… - Proceedings of the 21th …, 2015 - dl.acm.org
How can one build a distributed framework that allows efficient deployment of a wide
spectrum of modern advanced machine learning (ML) programs for industrial-scale …

Federated latent Dirichlet allocation: A local differential privacy based framework

Y Wang, Y Tong, D Shi - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
Latent Dirichlet Allocation (LDA) is a widely adopted topic model for industrial-grade
text mining applications. However, its performance heavily relies on the collection of …

DocChat: An information retrieval approach for chatbot engines using unstructured documents

Z Yan, N Duan, J Bao, P Chen, M Zhou… - Proceedings of the …, 2016 - aclanthology.org
Most current chatbot engines are designed to reply to user utterances based on existing
utterance-response (or QR) pairs. In this paper, we present DocChat, a novel information …

Heterogeneous latent topic discovery for semantic text mining

Y Li, D Jiang, R Lian, X Wu, C Tan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In order to mine latent semantics from text data, word embedding and topic modeling are two
major methodologies in the industry. From a pragmatic perspective, each of these two lines …

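A survey of distributed training systems and their optimization algorithms

王恩东, 闫瑞栋, 郭振华, 赵雅倩 - Chinese Journal of Computers (计算机学报), 2024 - cjc.ict.ac.cn
Artificial intelligence uses various optimization techniques to learn key features or knowledge from massive training samples in order to improve solution quality,
which places higher demands on training methods. However, traditional single-machine training cannot meet requirements such as storage and computing performance …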

Toward understanding the impact of staleness in distributed machine learning

W Dai, Y Zhou, N Dong, H Zhang, EP Xing - arXiv preprint arXiv …, 2018 - arxiv.org
Many distributed machine learning (ML) systems adopt the non-synchronous execution in
order to alleviate the network communication bottleneck, resulting in stale parameters that …