Xangô: a framework for attribute selection robust to class imbalance in text classification tasks
Attribute selection is a widely used dimensionality reduction technique for dealing with the difficulties associated with the "curse of dimensionality" in text classification tasks. The most common attribute selection approach for textual databases is to score the relevance of each attribute to the learning process and to select the N highest-scoring attributes, where N is generally defined empirically. Although this strategy is widely applied, it can lead to the partial or complete exclusion of attributes that are essential for learning. The research presented in this paper therefore aims to foster the generation of more reliable text classifiers through the improved Xangô, a framework for attribute selection in text classification tasks whose selection process seeks to construct a reduced dimensional space in which all classes are represented in a balanced way by their most discriminative terms. Experimental results indicate that Xangô is adaptable to different state-of-the-art attribute selection methods and surpasses their individual performance in learning tasks performed under varied conditions, including multi-class problems, imbalance between classes, different classification algorithms, and drastic dimensionality reductions.
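To make the contrast between the two selection strategies concrete, the following is a minimal sketch and not the Xangô procedure itself, which the abstract does not specify. It assumes that a per-class relevance score is already available for each term; the function names `top_n_global` and `top_n_balanced`, the round-robin policy, and the toy scores are all hypothetical choices made for illustration.

```python
from itertools import zip_longest


def top_n_global(scores, n):
    """Common approach: rank terms by their best score over all classes, keep n."""
    best = {}
    for per_class in scores.values():
        for term, score in per_class.items():
            best[term] = max(score, best.get(term, float("-inf")))
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:n]


def top_n_balanced(scores, n):
    """Illustrative balanced selection (an assumption, not Xangô's algorithm):
    visit the classes in turn and take each class's next best term."""
    rankings = [
        sorted(per_class, key=lambda t: (-per_class[t], t))
        for per_class in scores.values()
    ]
    selected = []
    for tier in zip_longest(*rankings):
        for term in tier:
            if term is not None and term not in selected:
                selected.append(term)
                if len(selected) == n:
                    return selected
    return selected


# Hypothetical per-class relevance scores: terms of the first class score
# much higher than terms of the second.
scores = {
    "sports": {"goal": 0.9, "match": 0.8, "team": 0.7, "vote": 0.1},
    "politics": {"vote": 0.3, "senate": 0.25, "goal": 0.05},
}

print(top_n_global(scores, 2))    # both terms come from "sports"
print(top_n_balanced(scores, 2))  # one term per class
```

In this toy setting the global ranking keeps only terms that are discriminative for one class, which is the kind of exclusion the abstract warns about, while the round-robin variant keeps the best term of each class.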