Nikiforov
Vladimir O.
D.Sc., Prof.
doi: 10.17586/2226-1494-2017-17-3-498-505
GENERATING DATASETS FOR THE BINARY CLASSIFICATION TASK BASED ON THEIR CHARACTERISTIC DESCRIPTIONS
Read the full article ';
For citation: Zabashta A.S., Filchenkov A.A. Generating datasets for the binary classification task based on their characteristic descriptions. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2017, vol. 17, no. 3, pp. 498–505 (in Russian). doi: 10.17586/2226-1494-2017-17-3-498-505
Abstract
Subject of Study. We present a method for generating instances of the binary classification task by (according to, based on) their characteristic descriptions in the form of a meta-feature vector. We propose a naïve method for the same problem solution to be used as a referral one. We study the characteristic space of the binary classification task instances, as well as the methods for this space traversal. Method. The proposed method is based on genetic algorithm, where the distance in the characteristic space from the description vector of the generated instance for the binary classification task to the specified one is used as the minimized objective function. We developed the crossover and mutation operators for the genetic algorithm. These operators are based on such transformations as addition or removal of features and objects from datasets. Main Results. In order to validate the proposed method, we chose several non-trivial two-dimensional meta-feature spaces that were generated from statistical, information-theoretical and structural characteristics of classification task instance. We used the baseline method to evaluate the relative error of the proposed method. Both methods used the same number of classification tasks instances. The proposed method outperformed the naïve method and reduced average error by 30 times. Practical Relevance. The proposed method for generating instances for classification task based on their characteristic description allows obtaining unknown instances that are required to evaluate the performance of classifiers in certain areas of the meta-features space for design of automatic algorithm selection systems
Acknowledgements. This work was financially supported by the Government of the Russian Federation, Grant 074-U01, and the Russian Foundation for Basic Research, Grant 16-37-60115 mol_a_dk.
References
22–31. doi: 10.1007/3-540-63890-3_4