DOI: 10.17586/2226-1494-2017-17-3-498-505

## GENERATING DATASETS FOR THE BINARY CLASSIFICATION TASK BASED ON THEIR CHARACTERISTIC DESCRIPTIONS

**Article in** Russian

**For citation:** Zabashta A.S., Filchenkov A.A. Generating datasets for the binary classification task based on their characteristic descriptions. *Scientific and Technical Journal of Information Technologies, Mechanics and Optics*, 2017, vol. 17, no. 3, pp. 498–505 (in Russian). doi: 10.17586/2226-1494-2017-17-3-498-505

**Abstract**

**Subject of Study.** We present a method for generating instances of the binary classification task from their characteristic descriptions, given as a meta-feature vector. We also propose a naïve method for the same problem to serve as a baseline. We study the characteristic space of binary classification task instances, as well as methods for traversing this space. **Method.** The proposed method is based on a genetic algorithm whose minimized objective function is the distance in the characteristic space between the description vector of the generated binary classification instance and the specified target vector. We developed crossover and mutation operators for the genetic algorithm. These operators are based on transformations such as adding or removing features and objects in datasets. **Main Results.** To validate the proposed method, we chose several non-trivial two-dimensional meta-feature spaces built from statistical, information-theoretic, and structural characteristics of classification task instances. We used the baseline method to evaluate the relative error of the proposed method. Both methods used the same number of classification task instances. The proposed method outperformed the naïve method, reducing the average error by a factor of 30. **Practical Relevance.** The proposed method for generating classification task instances from their characteristic descriptions makes it possible to obtain previously unavailable instances, which are needed to evaluate classifier performance in specific regions of the meta-feature space when designing automatic algorithm selection systems.
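The general idea of the method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two meta-features (class entropy and mean feature standard deviation), the specific crossover and mutation operators, the population settings, and all function names are assumptions chosen for illustration. Only the overall scheme follows the abstract: a genetic algorithm minimizes the distance between a dataset's meta-feature vector and a target vector, using operators that add or remove features and objects.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_features(X, y):
    """Two illustrative meta-features (assumed, not taken from the paper):
    class entropy in bits and the mean standard deviation of the features."""
    p = y.mean()
    entropy = 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return np.array([entropy, X.std(axis=0).mean()])

def random_dataset(n=20, d=3):
    """Random binary classification dataset: Gaussian features, random labels."""
    scale = rng.uniform(0.1, 3.0)
    return rng.normal(0, scale, (n, d)), rng.integers(0, 2, n)

def mutate(X, y):
    """Add or remove one object (row) or one feature (column)."""
    X, y = X.copy(), y.copy()
    op = rng.integers(4)
    if op == 0:  # add an object
        X = np.vstack([X, rng.normal(0, X.std() + 1e-9, (1, X.shape[1]))])
        y = np.append(y, rng.integers(0, 2))
    elif op == 1 and len(y) > 2:  # remove an object
        i = rng.integers(len(y))
        X, y = np.delete(X, i, axis=0), np.delete(y, i)
    elif op == 2:  # add a feature
        X = np.hstack([X, rng.normal(0, rng.uniform(0.1, 3.0), (len(y), 1))])
    elif op == 3 and X.shape[1] > 1:  # remove a feature
        X = np.delete(X, rng.integers(X.shape[1]), axis=1)
    return X, y

def crossover(a, b):
    """Child takes half of the objects of each parent on their shared leading features."""
    (Xa, ya), (Xb, yb) = a, b
    d = min(Xa.shape[1], Xb.shape[1])
    ia = rng.choice(len(ya), max(1, len(ya) // 2), replace=False)
    ib = rng.choice(len(yb), max(1, len(yb) // 2), replace=False)
    return np.vstack([Xa[ia, :d], Xb[ib, :d]]), np.concatenate([ya[ia], yb[ib]])

def error(ds, target):
    """Objective: Euclidean distance to the target in meta-feature space."""
    return float(np.linalg.norm(meta_features(*ds) - target))

def generate(target, pop_size=30, generations=60):
    """Elitist genetic algorithm: keep the better half, refill with mutated offspring."""
    pop = [random_dataset() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ds: error(ds, target))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), 2, replace=False)
            children.append(mutate(*crossover(elite[i], elite[j])))
        pop = elite + children
    return min(pop, key=lambda ds: error(ds, target))

target = np.array([0.9, 1.5])  # hypothetical target meta-feature vector
best = generate(target)
print("meta-features:", meta_features(*best), "error:", error(best, target))
```

Because the better half of the population is carried over unchanged, the best error never increases between generations. A naïve baseline in the same spirit could simply draw the same total number of random datasets and keep the closest one.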

**Keywords:** machine learning, meta-learning, classification problem, evolutionary computation, genetic algorithm

**Acknowledgements.** This work was financially supported by the Government of the Russian Federation, Grant 074-U01, and the Russian Foundation for Basic Research, Grant 16-37-60115 mol_a_dk.
