doi: 10.17586/2226-1494-2021-21-5-702-708
A meta-feature selection method based on the Auto-sklearn framework
Article in English
For citation:
Kulin N.I., Muravyov S.B. A meta-feature selection method based on the Auto-sklearn framework. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 5, pp. 702–708. doi: 10.17586/2226-1494-2021-21-5-702-708
Abstract
In recent years, the task of selecting and tuning machine learning algorithms has been increasingly solved using automated frameworks, motivated by the fact that on large amounts of data classical manual methods are inefficient in terms of both time and quality. This paper discusses the Auto-sklearn framework as one of the best solutions for the automated selection and tuning of machine learning algorithms. A shortcoming of Auto-sklearn 1.0, whose solution is based on Bayesian optimization and meta-learning, is investigated, and a way to address it is presented: a new method of operation based on optimizing the meta-database. The essence of the method is to use the BIRCH clustering algorithm to separate datasets into different groups, with the silhouette measure and the minimum number of initial Bayesian optimization configurations as the selection criteria. The next step trains a random forest model on the set of meta-features and the resulting cluster labels, and the important meta-features are selected from the entire set. As a result, an optimal set of important meta-features is obtained, which is used to find the initial Bayesian optimization configurations. The described method significantly speeds up the search for the best machine learning algorithm for classification tasks. Experiments were conducted on datasets from OpenML to compare Auto-sklearn 1.0, Auto-sklearn 2.0, and a new version that uses the proposed method. According to the experimental results and Wilcoxon signed-rank (T) tests, the new method outperforms the original versions in terms of time: it surpasses Auto-sklearn 1.0 and competes with Auto-sklearn 2.0. The proposed method helps reduce the time needed to find the best solution for machine learning tasks. Optimizing such frameworks is worthwhile in terms of saving time and other resources, especially when working with large amounts of data.
Keywords: AutoML, automated machine learning, machine learning, meta-learning, classification
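The two-step procedure described in the abstract (BIRCH clustering of the meta-database, then random-forest importance ranking of meta-features) can be sketched in scikit-learn. This is an illustrative sketch on synthetic data, not the authors' implementation: the meta-database, the number of meta-features, and the cut-off of five important features are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for a meta-database: 120 datasets,
# each described by 12 meta-features (three separated groups).
meta_features = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(40, 12)) for c in (0.0, 3.0, 6.0)
])

# Step 1: cluster the datasets with BIRCH, choosing the number of
# groups that maximizes the silhouette measure.
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 6):
    labels = Birch(n_clusters=k).fit_predict(meta_features)
    score = silhouette_score(meta_features, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

# Step 2: train a random forest on (meta-features, cluster labels)
# and keep the most important meta-features; the top-5 cut-off here
# is an arbitrary choice for the sketch.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(meta_features, best_labels)
important = np.argsort(forest.feature_importances_)[::-1][:5]

print(best_k, best_score, important)
```

In the method from the paper, the reduced meta-feature set would then be used to find the nearest datasets in the meta-database and warm-start Bayesian optimization from their best-known configurations.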
References
1. Nagarajah T., Guhanathan P. A Review on Automated Machine Learning (AutoML) Systems. Proc. IEEE 5th International Conference for Convergence in Technology (I2CT), 2019, art. no. 9033810. https://doi.org/10.1109/I2CT45611.2019.9033810
2. Ge P. Analysis on approaches and structures of automated machine learning frameworks. Proc. 2020 International Conference on Communications, Information System and Computer Engineering (CISCE), 2020, pp. 474–477. https://doi.org/10.1109/CISCE50729.2020.00106
3. Chauhan K., Jani S., Thakkar D., Dave R., Bhatia J., Tanwar S., Obaidat M.S. Automated machine learning: The new wave of machine learning. Proc. 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2020, pp. 205–212. https://doi.org/10.1109/ICIMIA48430.2020.9074859
4. Ebadi A., Gauthier Y., Tremblay S., Paul P. How can automated machine learning help business data science teams? Proc. 18th IEEE International Conference on Machine Learning and Applications (ICMLA), 2019, pp. 1186–1191. https://doi.org/10.1109/ICMLA.2019.00196
5. Snoek J., Larochelle H., Adams R.P. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 2012, vol. 4, pp. 2951–2959.
6. Jiang M., Chen Y. Research on Bayesian optimization algorithm selection strategy. Proc. IEEE International Conference on Information and Automation (ICIA), 2010, pp. 2424–2427. https://doi.org/10.1109/ICINFA.2010.5512281
7. Feurer M., Hutter F. Hyperparameter optimization. Automated Machine Learning. Springer, 2019, pp. 3–33. https://doi.org/10.1007/978-3-030-05318-5_1
8. Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer Science & Business Media, 2009, XI, 176 p. https://doi.org/10.1007/978-3-540-73263-1
9. Hospedales T.M., Antoniou A., Micaelli P., Storkey A.J. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, in press. https://doi.org/10.1109/TPAMI.2021.3079209
10. Abdulrhaman S.M., Brazdil P. Measures for combining accuracy and time for meta-learning. CEUR Workshop Proceedings, 2014, vol. 1201, pp. 49–50.
11. Feurer M., Springenberg J., Hutter F. Initializing Bayesian hyperparameter optimization via meta-learning. Proc. 29th AAAI Conference on Artificial Intelligence, 2015, pp. 1128–1135.
12. Feurer M., Klein A., Eggensperger K., Springenberg J.T., Blum M., Hutter F. Auto-sklearn: efficient and robust automated machine learning. Automated Machine Learning. Springer, 2019, pp. 113–134. https://doi.org/10.1007/978-3-030-05318-5_6
13. Feurer M., Eggensperger K., Falkner S., Lindauer M., Hutter F. Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning. arXiv.org. arXiv:2007.04074. 2020.
14. Alcobaça E., Siqueira F., Rivolli A., Garcia L.P.F., Oliva J.T., de Carvalho A.C.P.L.F. MFE: Towards reproducible meta-feature extraction. Journal of Machine Learning Research, 2020, vol. 21, pp. 1–5.
15. Zhang T., Ramakrishnan R., Livny M. BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record, 1996, vol. 25, no. 2, pp. 103–114. https://doi.org/10.1145/235968.233324