A Comparative Analysis on Improving Covid-19 Prediction by Using Ensemble Learning Methods

KARTAL E.

21st International Symposium on Production Research (ISPR) - Digitizing Production System, ELECTR NETWORK, 7-9 October 2021, pp.3-14 (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1007/978-3-030-90421-0_1
Country of Publication: ELECTR NETWORK
Pages: pp.3-14
Istanbul University Affiliated: Yes

Abstract

This study aims to improve Covid-19 prediction, specifically the distinction between Covid-19 and Flu, by using several well-known ensemble learning methods, namely majority voting, bagging, boosting, and stacking. To this end, the performance of base machine learning models was compared with that of ensemble models (majority voting, C5.0, stochastic gradient boosting, bagged CART, random forest, and stacking) on a public Covid-19 dataset in which observations are labelled as Covid-19 or Flu. Since the task is a classification problem, supervised machine learning algorithms (logistic regression via a generalized linear model, classification and regression trees, artificial neural networks, and support vector machines) were used as base learners. The Cross-Industry Standard Process Model for Data Mining (CRISP-DM), which consists of six stages (business understanding, data understanding, data preparation, modeling, evaluation, and deployment), was used as the study method. In the model performance evaluation stage, an additional metric was proposed that considers the accuracy and its range (max-min). Model performance is discussed in terms of accuracy and the proposed metric. A Shiny application was developed using the best-performing model. The application lets users predict Covid-19 status interactively through a web interface. Analyses were performed with R and RStudio.
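As a rough illustration of two ideas named in the abstract, majority voting and the accuracy range (max-min), a minimal sketch follows. It is written in Python for brevity, although the study itself used R. The function names, the toy labels, and the example accuracy values are all assumptions made for illustration and are not taken from the paper. The abstract does not give the formula of the proposed metric, so the sketch only computes the max-min spread on which that metric is said to be based.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several models' label lists into one label per observation."""
    combined = []
    for labels in zip(*predictions):
        # Pick the label predicted by the largest number of models.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Share of observations whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def accuracy_range(accuracies):
    """Spread (max - min) of a set of accuracy values, e.g. across resamples."""
    return max(accuracies) - min(accuracies)


# Toy data, invented for illustration only.
actual = ["Covid-19", "Flu", "Flu", "Covid-19", "Flu"]
base_predictions = [
    ["Covid-19", "Flu", "Covid-19", "Covid-19", "Flu"],  # hypothetical model A
    ["Covid-19", "Covid-19", "Flu", "Covid-19", "Flu"],  # hypothetical model B
    ["Flu", "Flu", "Flu", "Covid-19", "Flu"],            # hypothetical model C
]

ensemble = majority_vote(base_predictions)
print(ensemble, accuracy(ensemble, actual))
```

In this toy case each hypothetical base model makes one mistake on a different observation, so the vote corrects all three. A smaller accuracy range across resamples would indicate a more stable model, which is presumably why the study considers it alongside accuracy.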