A Comparative Study of Preprocessing Techniques for Stroke Prediction using XGBoost Classifier

Nizam Özoğur H., Orman Z.

3rd International Conference On Advanced Engineering, Technology And Applications, Catania, Italy, 24 - 25 May 2024

  • Publication Type: Conference Paper / Full Text
  • City: Catania
  • Country: Italy
  • Istanbul University Affiliated: Yes


Stroke is a condition characterized by the interruption of blood flow to a region of the brain or by bleeding within the brain. Early diagnosis and treatment not only reduce the risk of permanent damage and mortality but also improve the likelihood of recovery. Timely diagnostic interventions are therefore essential for formulating effective treatment strategies and preventing complications. Machine learning models are frequently used in the literature as powerful tools for stroke diagnosis. In this study, preprocessing methodologies that have been used successfully in the literature to overcome the challenges posed by missing values and imbalanced datasets in stroke prediction were compared in combination with the Extreme Gradient Boosting (XGBoost) machine learning method. In the experiments, the Cerebral Stroke Prediction (CSP) dataset was used to evaluate the performance of these methodologies with model evaluation metrics. The findings highlight the effectiveness of SMOTEENN in addressing class imbalance when combined with various imputation methods for handling missing data. This underlines the importance of choosing suitable sampling and imputation strategies to improve the performance of stroke prediction models.
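To make the preprocessing pipeline concrete, the following is a minimal pure-Python sketch (no third-party packages) of imputation followed by SMOTEENN on a toy binary dataset. SMOTEENN combines SMOTE oversampling, which interpolates new minority samples between a minority point and one of its minority-class nearest neighbours, with Edited Nearest Neighbours (ENN) cleaning, which drops samples whose label disagrees with most of their nearest neighbours. This is not the authors' code. Mean imputation, k = 5 for SMOTE, the simple majority-vote ENN rule with k = 3, and the function names `impute_mean`, `smote`, `enn` and `smoteenn` are all illustrative assumptions. In practice one would typically use `SMOTEENN` from `imblearn.combine` and then train `xgboost.XGBClassifier` on the resampled data.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Assumed strategy: replace each None with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(i, points, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: sq_dist(points[i], points[j]))[:k]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE for binary labels: add synthetic minority points until both classes are equal.

    Assumes at least two minority samples exist.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(i, minority_pts, k))   # a minority-class neighbour
        gap = rng.random()                              # position along the segment
        base, nb = minority_pts[i], minority_pts[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Wilson-style ENN: keep a sample only if a strict majority of its k neighbours share its label."""
    keep_X, keep_y = [], []
    for i in range(len(X)):
        votes = [y[j] for j in k_nearest(i, X, k)]
        if 2 * sum(v == y[i] for v in votes) > len(votes):
            keep_X.append(X[i])
            keep_y.append(y[i])
    return keep_X, keep_y


def smoteenn(X, y, minority, seed=0):
    """Oversample with SMOTE, then clean the combined set with ENN."""
    X_os, y_os = smote(X, y, minority, seed=seed)
    return enn(X_os, y_os)


# Toy data: six class-0 points near (1, 2), one noisy class-0 point at (5.1, 4.0)
# that sits inside the minority region, and two class-1 (minority) points.
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9], [1.0, 2.2], [1.3, 2.0],
        [5.1, 4.0], [5.0, 5.0], [5.2, None]]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1]

X = impute_mean(rows)
X_res, y_res = smoteenn(X, labels, minority=1)
print(len(X_res), y_res.count(0), y_res.count(1))  # → 13 6 7
```

In this example SMOTE first balances the classes at 7 to 7, and ENN then removes the noisy class-0 point, whose three nearest neighbours are all minority samples. When models are evaluated, imputation statistics and resampling should be fitted on the training split only, so that synthetic or cleaned samples never leak into the test data.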