Fast Duplicate Bug Reports Detector Training using Sampling for Dimension Reduction

Authors: Behzad Soleimani Neysiani, Saeed Doostali, Zahra Aminoroayaie (زهرا امین الرعایایی)
Conference: 11th International (Virtual) Conference on Information and Knowledge Technology (IKT2020)
Conference date: 2020-12-22 - 2020-12-23
Conference location: 1 - Tehran
Presented on behalf of: Shahid Beheshti University
Presentation type: Oral presentation
Conference level: International

Abstract

Duplicate bug report detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. For real-world usage, where new bug reports are queried continuously, the internal machine learning (ML) models of a DBRD system must be updated regularly. The training phase of ML algorithms is time-consuming, and its cost depends on the volume of the training dataset. Instance-based learning (IbL) is an ML technique that reduces the number of samples in the training dataset so that learning stays fast as the database grows incrementally. This research introduces a hybrid approach that combines clustering with straightforward sampling to improve both the runtime and the validation performance of DBRD. Two bug report datasets, from Android and Mozilla Firefox, are used to evaluate the proposed approach. The experimental evaluation shows acceptable results and an improvement in both runtime and validation performance of DBRD compared with the traditional approach without IbL.
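
The idea of reducing a training set by clustering and then sampling can be illustrated with a minimal sketch. The abstract names neither the clustering algorithm nor the sampling scheme, so this example assumes k-means and uniform random sampling within each cluster. The function names (`kmeans`, `reduce_training_set`) and the parameters (`k`, `keep_ratio`) are hypothetical and are not taken from the paper. Each point stands for a numeric feature vector of a bug report pair, and each label marks the pair as duplicate or not.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (assumed here; the paper's clustering may differ).

    Returns a list with the cluster index assigned to each point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k initial centroids
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignment


def reduce_training_set(points, labels, k=3, keep_ratio=0.5, seed=0):
    """Keep a random fraction of the samples from every cluster.

    Sampling per cluster (rather than globally) keeps each region of
    the feature space represented in the reduced training set.
    """
    rng = random.Random(seed)
    assignment = kmeans(points, k, seed=seed)
    kept = []
    for c in range(k):
        indices = [i for i, a in enumerate(assignment) if a == c]
        if not indices:
            continue  # a cluster can end up empty; skip it
        n_keep = max(1, round(len(indices) * keep_ratio))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()
    return [points[i] for i in kept], [labels[i] for i in kept]
```

A classifier would then be trained on the reduced points and labels instead of the full dataset. How much accuracy is retained depends on the data and on `keep_ratio`; the sketch makes no claim about the figures reported in the paper.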

Keywords: Information Retrieval, Natural Language Processing, Duplicate Detection, Bug Reports, Instance-based Learning, Online Query, Continuous Query, Incremental Learning