作者
Shivani Rao, Avinash Kak
发表日期
2011/5/21
图书
Proceedings of the 8th Working Conference on Mining Software Repositories
页码范围
43-52
简介
From the standpoint of retrieval from large software libraries for the purpose of bug localization, we compare five generic text models and certain composite variations thereof. The generic models are: the Unigram Model (UM), the Vector Space Model (VSM), the Latent Semantic Analysis Model (LSA), the Latent Dirichlet Allocation Model (LDA), and the Cluster Based Document Model (CBDM). The task is to locate the files that are relevant to a bug reported in the form of a textual description by a software developer. We use for our study iBUGS, a benchmarked bug localization dataset with 75 KLOC and a large number of bugs (291). A major conclusion of our comparative study is that simple text models such as UM and VSM are more effective at correctly retrieving the relevant files from a library as compared to the more sophisticated models such as LDA. The retrieval effectiveness for the various models was …
引用总数
2011201220132014201520162017201820192020202120222023202421227402837374230262622246