A Novel Two-Step Classification Approach for Runtime Performance Improvement of Duplicate Bug Report Detection

Authors: Behzad Soleimani Neysiani, Seyed Morteza Babamir
Journal: The journal of Computer and Knowledge Engineering (CKE)
Impact factor (IF): Not recorded
Article type: Full Paper
Publication date: 2022-11-22
Journal rank: Scientific-Research
Journal type: Electronic
Country of publication: Iran
Journal index: ISC

Abstract

Duplicate Bug Report Detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. There are two main approaches to this problem, (1) Information Retrieval and (2) Machine Learning, of which the second achieves better validation performance. Duplicate detection requires feature extraction, which is a time-consuming process. Both approaches suffer from runtime issues because they must compare each new bug report against all bug reports in the repository, so feature extraction and duplicate detection take a long time. This study proposes a new two-step classification approach that reduces the search space of the bug repository in the first step and then checks for duplicates using textual features in the second step. The Mozilla and Eclipse datasets are used for experimental evaluation. The results show an average validation performance of 87.70% accuracy and 89.01% F1-measure. In addition, 95.85% and 87.65% of bug reports can be classified very quickly in step one for the Eclipse and Mozilla datasets, respectively; the remaining reports require textual feature extraction before they can be checked by the traditional DBRD approach. The proposed method also achieves an average runtime improvement of 90%.

tags: Duplicate Detection, Bug Report, Machine Learning, Runtime Performance, Search Space Reduction
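
The two-step idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features step one uses or how textual similarity is computed, so everything below is an assumption made for illustration: the `BugReport` fields, the product/component filter in `step_one`, the Jaccard token overlap standing in for textual feature extraction, and the `threshold` value. It is not the authors' classifier, only a picture of why narrowing the candidate set first avoids most of the expensive textual comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class BugReport:
    report_id: int
    product: str    # hypothetical categorical field (assumption)
    component: str  # hypothetical categorical field (assumption)
    summary: str


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a stand-in for costly textual feature extraction."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def step_one(new: BugReport, repository: List[BugReport]) -> List[BugReport]:
    """Cheap filter (assumed): keep reports with the same product and component."""
    return [r for r in repository
            if r.product == new.product and r.component == new.component]


def detect_duplicate(new: BugReport,
                     repository: List[BugReport],
                     threshold: float = 0.5) -> Optional[int]:
    """Return the id of the most similar candidate, or None if not a duplicate."""
    candidates = step_one(new, repository)
    if not candidates:
        # Decided in step one: no textual features are computed at all.
        return None
    # Step two: textual comparison only against the reduced candidate set.
    new_tokens = tokens(new.summary)
    scored = [(jaccard(new_tokens, tokens(r.summary)), r) for r in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best.report_id if best_score >= threshold else None


repository = [
    BugReport(1, "Firefox", "Tabs", "browser crashes when closing tab"),
    BugReport(2, "Firefox", "Bookmarks", "cannot import bookmarks from file"),
    BugReport(3, "Thunderbird", "Mail", "crash on sending mail"),
]
new_report = BugReport(4, "Firefox", "Tabs", "browser crashes when closing a tab")
print(detect_duplicate(new_report, repository))
```

In this sketch a report whose product and component match nothing in the repository is resolved in step one without any text processing, while the remaining reports are compared only against the filtered candidates rather than the whole repository.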