A Multi-Algorithmic Approach to Stroke Risk Prediction Using Machine Learning
Ebele Grace Onyedinma *
Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria.
Doris Chinedu Asogwa
Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria.
Tochukwu Sunday Belonwu
Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria.
Chinedu Emmanuel Mbonu
Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria and Department of Computer Science, Nazarbayev University, Astana, Kazakhstan.
*Author to whom correspondence should be addressed.
Abstract
Stroke remains a major public health concern and a leading cause of death and long-term disability worldwide. Early prediction of stroke risk can greatly improve preventive care and patient outcomes. This study conducted a retrospective analysis of the Kaggle Stroke Prediction Dataset to develop effective machine learning models for stroke prediction. The dataset underwent thorough preprocessing, including data cleaning, conversion of categorical variables to numerical format, and exploratory data analysis. Feature selection was performed using the ANOVA F-test and the Chi-squared test to identify the most significant predictors. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. Four classification algorithms (Logistic Regression, Decision Tree, Random Forest, and XGBoost) were trained and evaluated using performance metrics such as accuracy, precision, recall, and F1-score. Hyperparameter tuning was conducted using Grid Search to optimize each model. Random Forest and XGBoost achieved the highest accuracy, 91%, outperforming Logistic Regression and Decision Tree. The findings underscore the effectiveness of ensemble learning methods in predicting stroke risk and demonstrate the potential of integrating machine learning into healthcare systems for early detection and improved clinical decision-making.
Keywords: Stroke prediction, machine learning, random forest, XGBoost, SMOTE, feature selection, logistic regression, medical diagnosis, class imbalance
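Illustrative sketch of SMOTE: the abstract names SMOTE as the remedy for class imbalance. The following minimal, standard-library Python sketch shows the core idea only and is not the authors' implementation: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name smote, its parameters, and the toy feature values (age, average glucose level) are assumptions made up for illustration and are not taken from the dataset.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority-class feature vectors.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between `base` and `other`.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy minority class: [age, average glucose level] (illustrative values only).
stroke_cases = [[67.0, 230.0], [80.0, 105.0], [74.0, 70.0], [61.0, 200.0]]
n_no_stroke = 20  # assumed size of the majority class

# Oversample the minority class until both classes have the same size.
new_rows = smote(stroke_cases, n_new=n_no_stroke - len(stroke_cases), k=3)
balanced_minority = stroke_cases + new_rows
print(len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the data used for evaluation.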