Authors
Girish Maskeri, Santonu Sarkar, Kenneth Heafield
Publication date
2008/2/19
Book
Proceedings of the 1st India software engineering conference
Pages
113-120
Description
One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and the source code. Without such a correlation, people without any prior application knowledge would find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large corpora of text documents. But its applicability in extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human-assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics …
Total citations
[Citations-per-year chart, 2008 to 2024; per-year counts not recoverable from the extracted text]
Scholar articles
G Maskeri, S Sarkar, K Heafield - Proceedings of the 1st India software engineering …, 2008