Authors | Behzad Soleimani Neysiani,Saeed Doostali,زهرا امین الرعایایی |
---|---|
Conference Title | 11th International (Virtual) Conference on Information and Knowledge Technology (IKT2020) |
Holding Date of Conference | 2020-12-22 - 2020-12-23 |
Event Place | 1 - Tehran |
Presented by | Shahid Beheshti University |
Presentation | SPEECH |
Conference Level | International Conferences |
Abstract
Duplicate bug report detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. For real-world use, where new bug reports are queried continuously, the internal machine learning (ML) models of DBRD must be kept up to date. The training phase of ML algorithms is time-consuming and depends on the volume of the training dataset. Instance-based learning (IbL) is an ML technique that reduces the number of samples in the training dataset to achieve fast learning on an incrementally growing database. This research introduces a hybrid approach that uses clustering and straightforward sampling to improve the runtime and validation performance of DBRD. Two bug report datasets, from Android and Mozilla Firefox, are used to evaluate the proposed approach. The experimental evaluation shows acceptable results and improvements in both runtime and validation performance of DBRD compared with the traditional approach without IbL.
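The general idea of reducing a training set by clustering and then sampling can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the sampling rule, or the feature representation, so everything here is an assumption made for illustration: a tiny k-means stands in for the clustering step, uniform random selection stands in for the sampling step, and the names `kmeans`, `reduce_training_set`, `k`, and `per_cluster` are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (assumed stand-in): returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def reduce_training_set(points, k, per_cluster, seed=0):
    """Return indices of at most `per_cluster` randomly kept points per cluster."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    kept = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        kept.extend(members[:per_cluster])
    return sorted(kept)


# Hypothetical feature vectors for eight bug report pairs.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
kept = reduce_training_set(points, k=2, per_cluster=2)
reduced = [points[i] for i in kept]  # smaller training set for the DBRD model
```

A model retrained on `reduced` sees fewer instances, which is where the runtime saving would come from. Whether validation performance holds up depends on how well the kept samples represent each cluster.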
tags: Information Retrieval, Natural Language Processing, Duplicate Detection, Bug Reports, Instance-based Learning, Online Query, Continuous Query, Incremental Learning