Automatic Duplicate Bug Report Detection using Information Retrieval-based versus Machine Learning-based Approaches

Authors: Behzad Soleimani Neysiani
Conference: IEEE 6th International Conference on Web Research (ICWR)
Conference dates: 2020-04-22 - 2020-04-23
Conference location: Tehran
Presented on behalf of: University of Science and Culture
Presentation type: Oral presentation
Conference level: International

Abstract

Many software repositories, especially web-based ones, still pose challenges for automation. Duplicate bug report detection (DBRD) has been an important problem for software triage systems such as Bugzilla, an essential online software repository, since 2004. There are two main approaches to automatic DBRD: information retrieval (IR)-based and machine learning (ML)-based. Many related works use both approaches, but it is not clear which one is more useful or performs better. This study introduces a methodology for comparing the validation performance of the two approaches under a particular condition. The Android dataset is used for evaluation, and about 2 million pairs of bug reports are analyzed for 59 duplicate bug reports. The results show that the ML-based approach has better validation performance, by about 40%. In addition, the ML-based approach can be evaluated with more reliable criteria, such as accuracy, precision, and recall, whereas the IR-based approach is evaluated only with mean average precision (MAP) or rank metrics.


Keywords: Duplicate Detection; Bug Report; Information Retrieval; Machine Learning