Topic discovery from heterogeneous texts
2016 IEEE 28th International Conference on Tools with Artificial …, 2016•ieeexplore.ieee.org
Recently, many topic models such as Latent Dirichlet Allocation (LDA) have made important progress towards generating high-level knowledge from a large corpus. They assume that a text consists of a mixture of topics, which is usually the case for regular articles but may not hold for a short text, which usually contains only one topic. In practice, a corpus may include both short and long texts; in this case, neither methods developed only for long texts nor methods developed only for short texts can generate satisfactory results. In this paper, we present an innovative method to discover latent topics from a heterogeneous corpus that includes both long and short texts. A new topic model based on a collapsed Gibbs sampling algorithm is developed for modeling such heterogeneous texts. Experiments on real-world datasets validate the effectiveness of the proposed model in comparison with other state-of-the-art models.
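The abstract does not give the model's details, so the following is only a rough sketch of the general idea it describes, not the paper's method. It assumes one plausible combination: long texts get a topic per token (as in LDA), short texts get a single topic per document (as in a Dirichlet multinomial mixture), and both share the same topic-word counts inside one collapsed Gibbs sampler. The function name `gibbs_mixed`, the length threshold `short_len`, and the hyperparameter values are all hypothetical.

```python
import random
from collections import Counter

def gibbs_mixed(docs, V, K, short_len=5, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sketch: one topic per short doc, one topic per token in long docs.

    docs: list of documents, each a list of word ids in range(V).
    short_len: hypothetical cutoff; docs with at most this many tokens count as short.
    """
    rng = random.Random(seed)
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts, shared by all documents
    n_k = [0] * K                       # total tokens assigned to each topic
    m_k = [0] * K                       # number of short documents per topic
    n_dk = {}                           # topic counts for each long document
    z = []                              # int for a short doc, list of ints for a long doc

    def add(k, w, delta):
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initialisation of all assignments.
    for d, doc in enumerate(docs):
        if len(doc) <= short_len:
            k = rng.randrange(K)
            z.append(k)
            m_k[k] += 1
            for w in doc:
                add(k, w, 1)
        else:
            n_dk[d] = [0] * K
            zs = []
            for w in doc:
                k = rng.randrange(K)
                zs.append(k)
                n_dk[d][k] += 1
                add(k, w, 1)
            z.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            if d not in n_dk:
                # Short document: remove it entirely, then resample its single topic.
                k_old = z[d]
                m_k[k_old] -= 1
                for w in doc:
                    add(k_old, w, -1)
                weights = []
                for k in range(K):
                    p = m_k[k] + alpha
                    seen = Counter()  # repeated words raise the count within the doc
                    for i, w in enumerate(doc):
                        p *= (n_kw[k][w] + beta + seen[w]) / (n_k[k] + V * beta + i)
                        seen[w] += 1
                    weights.append(p)
                k_new = rng.choices(range(K), weights=weights)[0]
                z[d] = k_new
                m_k[k_new] += 1
                for w in doc:
                    add(k_new, w, 1)
            else:
                # Long document: standard LDA-style per-token resampling.
                for i, w in enumerate(doc):
                    k_old = z[d][i]
                    n_dk[d][k_old] -= 1
                    add(k_old, w, -1)
                    weights = [
                        (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                        for k in range(K)
                    ]
                    k_new = rng.choices(range(K), weights=weights)[0]
                    z[d][i] = k_new
                    n_dk[d][k_new] += 1
                    add(k_new, w, 1)
    return z, n_kw

if __name__ == "__main__":
    # Toy corpus: two long and two short documents over a 4-word vocabulary.
    toy = [[0, 1, 0, 1, 0, 1, 2, 3], [2, 3], [0, 1], [2, 3, 2, 3, 2, 3, 0, 1]]
    assignments, topic_word = gibbs_mixed(toy, V=4, K=2, short_len=3, iters=20)
    print(assignments)
    print(topic_word)
```

Sharing `n_kw` between the two branches is the design choice that lets short texts borrow word co-occurrence statistics from long ones; whether the published model couples the two kinds of text this way cannot be determined from the abstract.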