FEATURE SELECTION PARALLELIZATION BASED ON PRIORITY QUEUE

Smetannikov Ivan B.

2017, Volume 17, Number 4 (July-August)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2017-17-4-664-669

Article in Russian

For citation: Smetannikov I.B. Feature selection parallelization based on priority queue. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2017, vol. 17, no. 4, pp. 664–669 (in Russian). doi: 10.17586/2226-1494-2017-17-4-664-669

Abstract

Subject of Research. The paper deals with feature selection algorithms in machine learning and, in particular, in classification. A method for fast feature selection is proposed. The method combines several other feature selection methods into one linear combination (an ensemble) and then optimizes their coefficients.

Method. The proposed method performs feature selection using a priority queue. It is an improvement of the Measure Linear Form (MeLiF) algorithm: the priority queue is used for parallelization, so the method is essentially a parallel version of MeLiF.

Main Results. The proposed and original algorithms were compared by classification quality and computation time on 36 open DNA microarray datasets. Both methods showed approximately the same classification quality, but the computation time of the new method was 4.2 to 22 times lower on a 24-core processor with 50 threads.

Practical Relevance. The proposed algorithm can be used as one of the main steps of data preprocessing for high-dimensional data in machine learning. It is therefore applicable to a wide range of classification problems on high-dimensional datasets.
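The abstract combines two ideas: scoring features with a weighted linear combination of several ranking filters, and searching the space of weights with a priority queue so that several candidate weight vectors can be evaluated in parallel. The following Python sketch illustrates both under stated assumptions; it is not the author's implementation. The neighbour rule (shift one coefficient by plus or minus `delta`), the starting points, the batch-by-batch scheduling and every name used (`aggregate`, `top_k`, `neighbours`, `pq_melif`, `quality`) are hypothetical. In MeLiF the quality of a weight vector would come from a classifier trained on the top-k selected features; here `quality` is any caller-supplied function.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def aggregate(measures, coeffs):
    """Linear combination of per-feature filter scores (the ensemble)."""
    n_features = len(measures[0])
    return [sum(w * m[j] for w, m in zip(coeffs, measures)) for j in range(n_features)]


def top_k(scores, k):
    """Indices of the k highest-scoring features (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def neighbours(coeffs, delta):
    """Hypothetical neighbour rule: shift one coefficient by +delta or -delta,
    keeping coefficients non-negative and not all zero."""
    result = []
    for i in range(len(coeffs)):
        for step in (delta, -delta):
            c = list(coeffs)
            c[i] = round(c[i] + step, 6)  # rounding avoids float drift in `seen`
            if c[i] >= 0 and any(c):
                result.append(tuple(c))
    return result


def pq_melif(measures, quality, k, delta=0.25, budget=60, workers=4):
    """Sketch of a priority-queue search over filter weights.

    Assumption: points are evaluated in synchronous batches of `workers`;
    the published method may schedule work differently.
    """
    dim = len(measures)
    # Hypothetical starting points: every unit vector plus the all-ones vector.
    starts = [tuple(1.0 if i == j else 0.0 for i in range(dim)) for j in range(dim)]
    starts.append(tuple(1.0 for _ in range(dim)))
    tie = count()  # unique tie-breaker so tuples of weights are never compared
    # Heap entries: (negated priority, tie-breaker, point); starts come first.
    heap = [(float("-inf"), next(tie), s) for s in starts]
    heapq.heapify(heap)
    seen = set(starts)
    best_score, best_coeffs = float("-inf"), None
    evaluated = 0

    def score(coeffs):
        return quality(top_k(aggregate(measures, coeffs), k))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while heap and evaluated < budget:
            # Pop the most promising points and evaluate them concurrently.
            n = min(workers, len(heap), budget - evaluated)
            batch = [heapq.heappop(heap)[2] for _ in range(n)]
            scores = list(pool.map(score, batch))
            evaluated += len(batch)
            for coeffs, s in zip(batch, scores):
                if s > best_score:
                    best_score, best_coeffs = s, coeffs
                # Unvisited neighbours inherit the parent's score as priority.
                for nb in neighbours(coeffs, delta):
                    if nb not in seen:
                        seen.add(nb)
                        heapq.heappush(heap, (-s, next(tie), nb))
    return best_coeffs, best_score


if __name__ == "__main__":
    # Toy data (hypothetical): three filters scoring six features;
    # features 0, 1 and 2 are treated as the relevant ones.
    measures = [
        [0.9, 0.1, 0.8, 0.2, 0.7, 0.0],
        [0.1, 0.9, 0.2, 0.8, 0.0, 0.3],
        [0.5, 0.5, 0.5, 0.1, 0.1, 0.1],
    ]
    relevant = {0, 1, 2}

    def quality(selected):
        # Fraction of selected features that are relevant (stand-in for classifier quality).
        return len(set(selected) & relevant) / len(selected)

    best, score = pq_melif(measures, quality, k=3)
    print(best, score)
```

Python threads only run `quality` concurrently when it releases the GIL (for example, a classifier implemented in native code), so this sketch shows the scheduling idea rather than a measured speedup. Giving each neighbour its parent's score as priority means the most promising regions of weight space are explored first, which is one plausible reading of why a shared queue lets many workers search at once without waiting on a strictly sequential walk.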

Keywords: machine learning, feature selection, ensemble feature selection, ranking filters, metric aggregation, MeLiF, parallel computing

Acknowledgements. This work was financially supported by the Government of the Russian Federation, Grant 074-U01, and the Russian Foundation for Basic Research, Grant 16-37-60115 mol_a_dk.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License