Fast Duplicate Bug Reports Detector Training using Sampling for Dimension Reduction

Authors: Behzad Soleimani Neysiani, Saeed Doostali, زهرا امین الرعایایی
Conference Title: 11th International (Virtual) Conference on Information and Knowledge Technology (IKT2020)
Holding Date of Conference: 2020-12-22 - 2020-12-23
Event Place: 1 - Tehran
Presented by: Shahid Beheshti University
Presentation: SPEECH
Conference Level: International Conferences

Abstract

Duplicate bug report detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. For real-world use, where new bug reports are queried continuously, the internal machine learning (ML) models of a DBRD system must be kept up to date. The training phase of ML algorithms is time-consuming, and its cost depends on the volume of the training dataset. Instance-based learning (IbL) is an ML technique that reduces the number of samples in the training dataset to achieve fast learning on an incrementally growing database. This research introduces a hybrid approach that combines clustering with straightforward sampling to improve both the runtime and the validation performance of DBRD. Two bug report datasets, from Android and Mozilla Firefox, are used to evaluate the proposed approach. The experimental evaluation shows acceptable results, with improvements in both runtime and validation performance compared with the traditional approach without IbL.
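
The idea of shrinking a training set by clustering and then sampling can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the sampling rule, or the features, so everything here is an assumption made for illustration: plain k-means stands in for the clustering step, uniform random sampling per cluster stands in for the sampling step, and the names `kmeans`, `reduce_training_set`, `k`, and `per_cluster` are hypothetical rather than taken from the paper.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over equal-length float tuples.

    Returns one cluster label per point. k-means is an assumed stand-in;
    the abstract does not say which clustering algorithm is used.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        groups = defaultdict(list)
        for p, lab in zip(points, labels):
            groups[lab].append(p)
        centers = [
            tuple(sum(col) / len(groups[c]) for col in zip(*groups[c]))
            if groups[c] else centers[c]
            for c in range(k)
        ]
    return labels


def reduce_training_set(points, targets, k, per_cluster, seed=0):
    """Cluster the samples, then keep at most `per_cluster` random samples per cluster."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    kept = []
    for lab in sorted(members):
        idxs = members[lab]
        kept.extend(rng.sample(idxs, min(per_cluster, len(idxs))))
    kept.sort()  # preserve the original sample order
    return [points[i] for i in kept], [targets[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical feature vectors for bug-report pairs (1 = duplicate, 0 = not).
    features = [(0.9, 0.8), (0.85, 0.9), (0.1, 0.2), (0.15, 0.1), (0.2, 0.25)]
    is_duplicate = [1, 1, 0, 0, 0]
    small_x, small_y = reduce_training_set(features, is_duplicate, k=2, per_cluster=1)
    print(len(features), "->", len(small_x), "training samples")
```

A classifier trained on the reduced set sees at most `k * per_cluster` samples, which is where any runtime saving would come from; whether validation performance also improves depends on the data and is not something this sketch demonstrates.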

tags: Information Retrieval, Natural Language Processing, Duplicate Detection, Bug Reports, Instance-based Learning, Online Query, Continuous Query, Incremental Learning