Authors
Prachuabsupakij Wanthanee, Nuanwan Soonthornphisaj
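Publisher
The Japanese Society for Artificial Intelligence (JSAI)
Journal
Proceedings of the Annual Conference of JSAI (ISSN:13479881)
Volume, issue, pages, and publication date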
vol.26, 2012

The aim of this paper is to improve classification performance on multiclass imbalanced datasets. We introduce a new classification technique, the Clustering approach for Imbalanced Multiclass datasets (CIM), which uses clustering to create a new training set for each cluster and applies two re-sampling techniques to re-balance the class distribution. CIM works in three steps. First, k-means is used to split the set of instances into two clusters. Then, for each cluster, two re-sampling techniques (oversampling and undersampling) are applied to the training set in order to balance the class distribution. Finally, ensemble approaches combine the models obtained with our method through a majority vote. We have conducted experiments on many multiclass datasets from the UCI Machine Learning Repository; these datasets include both balanced and imbalanced class distributions. We use different classifiers in order to observe the performance and suitability of the proposed method with each classifier, carrying out the experimental study with several well-known algorithms such as Decision Trees, Naïve Bayes, and K-Nearest Neighbors. Performance is measured with the G-mean and the F-measure. The experimental results show that, for many classifiers, the proposed method achieves higher performance than the baseline algorithms (One-Against-One, One-Against-All, and Error-Correcting Output Coding (ECOC)) and than the baselines combined with oversampling. Moreover, the empirical results show that CIM is a practical algorithm, since it can be applied to both balanced and imbalanced datasets. The proposed method was successfully applied to many datasets because CIM creates new training sets that consist of instances with similar characteristics, and these instances are relabeled.
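The procedure described above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: all function names (`kmeans`, `resample`, `one_nn`, `fit_cim_sketch`, `predict_cim_sketch`) are illustrative; the farthest-first k-means initialisation is our choice; oversampling is assumed to grow every class to the size of the largest class and undersampling to shrink every class to the size of the smallest; a 1-nearest-neighbour rule stands in for the base classifier; and a test instance is assumed to be routed to its nearest cluster before that cluster's two models vote. The relabeling step mentioned in the abstract is not modelled.

```python
import random
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        assign = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


def resample(X, y, mode, rng):
    """Random re-sampling so that every class ends up with the same size.

    Assumed scheme: mode "over" grows each class to the largest class size
    (extra rows drawn with replacement); mode "under" shrinks each class to
    the smallest class size (rows drawn without replacement).
    """
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    sizes = [len(rows) for rows in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    Xr, yr = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            picked = rng.sample(rows, target)
        else:
            picked = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        Xr.extend(picked)
        yr.extend([label] * target)
    return Xr, yr


def one_nn(train_X, train_y):
    """Stand-in base classifier: 1-nearest neighbour."""
    def predict(x):
        i = min(range(len(train_X)), key=lambda j: sqdist(x, train_X[j]))
        return train_y[i]
    return predict


def fit_cim_sketch(X, y, seed=0):
    """Hypothetical CIM-style training: per cluster, one model per re-sampling mode."""
    rng = random.Random(seed)
    assign, centers = kmeans(X, k=2)
    groups = []  # (cluster center, [oversampled model, undersampled model])
    for c, center in enumerate(centers):
        Xc = [x for x, a in zip(X, assign) if a == c]
        yc = [t for t, a in zip(y, assign) if a == c]
        if Xc:
            models = [one_nn(*resample(Xc, yc, mode, rng)) for mode in ("over", "under")]
            groups.append((center, models))
    return groups


def predict_cim_sketch(groups, x):
    """Route x to its nearest cluster, then take a majority vote of that cluster's models."""
    _, models = min(groups, key=lambda g: sqdist(x, g[0]))
    votes = Counter(m(x) for m in models)
    # most_common keeps first-seen order on ties, so the oversampled model wins a tie.
    return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 2.5), (10, 10), (10, 11), (11, 10)]
y = ["a"] * 4 + ["b"] + ["c"] * 3
groups = fit_cim_sketch(X, y)
print(predict_cim_sketch(groups, (0.2, 0.1)), predict_cim_sketch(groups, (0.5, 2.4)))  # prints: a b
```

With two clusters and two re-sampling modes, each vote in this sketch is between only two models, so ties are possible; they are broken in favour of the oversampled model because `Counter.most_common` orders equal counts by first occurrence.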
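The two evaluation measures can be computed as below. The abstract does not state how G-mean and F-measure are extended to more than two classes, so this sketch assumes common choices: G-mean as the geometric mean of the per-class recalls, and F-measure as the unweighted (macro) average of the per-class F1 scores. The function names are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall (true-positive rate) of every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed multiclass definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "c"]
print(round(g_mean(y_true, y_pred), 4), round(macro_f_measure(y_true, y_pred), 4))  # prints: 0.9086 0.8413
```

Under this definition the G-mean drops to zero as soon as one class is never predicted correctly, which is why it is a stricter measure than plain accuracy on imbalanced data.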