Automatic Duplicate Bug Report Detection using Information Retrieval-based versus Machine Learning-based Approaches

Behzad Soleimani Neysiani

Authors	Behzad Soleimani Neysiani
Conference	IEEE 6th International Conference on Web Research (ICWR)
Conference dates	2020-04-22 - 2020-04-23
Conference location	Tehran
Presented on behalf of	University of Science and Culture
Presentation type	Oral (lecture)
Conference level	International

Abstract

Nowadays, there are many software repositories, especially on the web, and automating their workflows poses many challenges. Duplicate bug report detection (DBRD) has been an important problem for software triage systems such as Bugzilla, an essential online software repository, since 2004. There are two main approaches to automatic DBRD: information retrieval (IR)-based and machine learning (ML)-based. Many related works employ these two approaches, but it is not clear which one is more useful or performs better. This study introduces a methodology for comparing the validation performance of the two approaches under a particular set of conditions. The Android dataset is used for evaluation: about 2 million bug report pairs are analyzed for 59 duplicate bug reports. The results show that the ML-based approach achieves better validation performance, by a notable margin of about 40%. In addition, the ML-based approach supports more reliable evaluation criteria, such as accuracy, precision, and recall, whereas the IR-based approach is evaluated only with mean average precision (MAP) or rank metrics.
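The two approaches are scored differently: an IR-based detector returns a ranked list of candidate duplicates for each new report (scored with MAP), while an ML-based detector labels each report pair as duplicate or not (scored with accuracy, precision, and recall). The sketch below illustrates only that difference in evaluation. It is not the paper's method: the abstract does not say which features, similarity measures, or classifiers were used, so the bag-of-words cosine ranker, the precomputed pair predictions, and every function name and toy report here are hypothetical.

```python
import math
from collections import Counter


# Hypothetical IR-side similarity: cosine over raw term-frequency vectors.
def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(
        sum(v * v for v in tb.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(query: str, corpus: dict[str, str]) -> list[str]:
    """Return report ids ordered from most to least similar to the query."""
    return sorted(corpus, key=lambda rid: cosine(query, corpus[rid]), reverse=True)


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average of precision@k taken at each rank k where a true duplicate appears."""
    hits, total = 0, 0.0
    for k, rid in enumerate(ranked, start=1):
        if rid in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(results: list[tuple[list[str], set[str]]]) -> float:
    """MAP: mean of average precision over all query reports."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


# Hypothetical ML side: pairs already labelled 1 (duplicate) or 0 by some classifier.
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, and recall over pair-level binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Toy reports, invented for illustration only.
    corpus = {
        "r1": "app crashes when rotating screen",
        "r2": "battery drains fast in background",
        "r3": "crash on screen rotation in camera app",
    }
    ranked = rank_candidates("application crashes after screen rotation", corpus)
    print(ranked)  # ['r1', 'r3', 'r2']
    print(mean_average_precision([(ranked, {"r1", "r3"})]))  # 1.0
    print(classification_metrics([1, 0, 1, 0], [1, 0, 0, 1]))
```

Because MAP depends only on where true duplicates land in each ranked list, it carries no notion of a false positive. Pair-level classification, by contrast, yields a full confusion matrix, which is the basis for the accuracy, precision, and recall criteria the abstract mentions.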


Professor Seyed Morteza Babamir
