IGICA: A Hybrid Feature Selection Approach in Text Categorization

Authors: Mohammad Mojaverian Kermani, Hossein Ebrahimpour Komleh, Seyed Jalaleddin Mousavirad
Publication date: 0-0-01

Article abstract

Abstract: Feature selection is one of the most important problems in machine learning and statistical pattern recognition. It matters in many applications, such as text categorization, because these applications involve many redundant and irrelevant features that may reduce classification performance. Feature selection aims to choose an appropriate subset of features in order to improve the performance of learning algorithms. In text categorization there are many features, most of which are redundant. In this paper, a two-stage feature selection method, IGICA, based on the imperialist competitive algorithm (ICA) is proposed. ICA is a recent metaheuristic inspired by imperialist competition among countries. In the first stage of the proposed algorithm, a filtering technique based on information gain is applied: features are ranked by their information gain values, and the top-ranking features are selected. In the second stage, ICA is applied to select the efficient features. The presented method is evaluated on the Reuters-21578 dataset. The experimental results show that the proposed method selects efficient features well compared to other methods.
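
The first stage described above, ranking features by information gain and keeping the top ones, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_terms`), the binary "term present / term absent" representation, the toy corpus, and the cutoff `k` are all assumptions made for illustration. The second stage (ICA) is not sketched, since the abstract does not give its details.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present / absent' feature (assumed representation)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting the documents on the term.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Rank every vocabulary term by information gain and keep the k best."""
    vocabulary = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"oil", "price", "market"},
    {"oil", "barrel", "export"},
    {"wheat", "harvest", "market"},
    {"wheat", "grain", "export"},
]
labels = ["crude", "crude", "grain", "grain"]

selected = top_k_terms(docs, labels, k=2)
print(selected)
```

In this toy corpus, terms such as "market" and "export" occur equally often in both classes and so carry no information gain, which is the kind of feature a filter stage would discard before a search-based second stage is run on the remaining candidates.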