Automatic Duplicate Bug Report Detection using Information Retrieval-based versus Machine Learning-based Approaches

Authors: Behzad Soleimani Neysiani
Conference Title: IEEE 6th International Conference on Web Research (ICWR)
Holding Date of Conference: 2020-04-22 - 2020-04-23
Event Place: 1 - Tehran
Presented by: Science and Culture
Presentation: Speech
Conference Level: International Conferences

Abstract

Many software repositories, especially web-based ones, still involve tasks that are difficult to automate. Duplicate bug report detection (DBRD) has been an important problem for software triage systems such as Bugzilla, an essential online software repository, since 2004. There are two main approaches to automatic DBRD: information retrieval (IR)-based and machine learning (ML)-based. Many related works use both approaches, but it is not clear which one is more useful and performs better. This study introduces a methodology for comparing the validation performance of the two approaches under a particular condition. The Android dataset is used for evaluation: about 2 million pairs of bug reports are analyzed for 59 duplicate bug reports. The results show that the ML-based approach has better validation performance, by about 40%. In addition, the ML-based approach can be evaluated with more reliable criteria such as accuracy, precision, and recall, whereas the IR-based approach relies only on mean average precision (MAP) or rank metrics.
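
The difference between the two families of evaluation criteria named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy labels, and the bug identifiers below are all hypothetical, chosen only to show that accuracy, precision, and recall score per-pair duplicate/non-duplicate decisions, while MAP scores the position of true duplicates in a ranked candidate list.

```python
from typing import Dict, List, Set, Tuple


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Accuracy, precision, and recall for binary pair labels (1 = duplicate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a true duplicate appears."""
    hits = 0
    total = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           relevant: Dict[str, Set[str]]) -> float:
    """Mean of average precision over all query bug reports."""
    scores = [average_precision(ranked, relevant.get(query, set()))
              for query, ranked in rankings.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: six labeled pairs for the ML-style evaluation.
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
accuracy, precision, recall = classification_metrics(y_true, y_pred)

# Hypothetical toy data: two query reports with ranked candidates for the IR-style evaluation.
rankings = {
    "bug_A": ["bug_7", "bug_3", "bug_9"],
    "bug_B": ["bug_2", "bug_5", "bug_8"],
}
relevant = {
    "bug_A": {"bug_3"},
    "bug_B": {"bug_2", "bug_8"},
}
map_score = mean_average_precision(rankings, relevant)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"MAP={map_score:.3f}")
```

In this sketch the classification metrics depend on an explicit decision for every pair, whereas MAP only reflects how high the known duplicates are ranked for each query report.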

tags: Duplicate Detection; Bug Report; Information Retrieval; Machine Learning;