SIMPLIFYING CARDIOVASCULAR RISK PREDICTION: COST-AWARE MACHINE LEARNING AND INTERPRETABILITY FOR RESOURCE-CONSTRAINED HEALTHCARE
Abstract
Heart disease is a leading cause of mortality worldwide, which makes early and accurate detection essential. This study evaluates five machine learning algorithms on the UCI Heart Disease dataset: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN). Preprocessing included normalization, missing-value imputation, and cost-aware feature selection via Recursive Feature Elimination (RFE). Models were assessed using accuracy, precision, recall, F1-score, and ROC-AUC. Logistic Regression achieved the highest accuracy (90%), followed closely by SVM and ANN. A novel lightweight hybrid model, combining Logistic Regression with pruned Random Forest feature importance, was developed for resource-constrained settings, with an emphasis on computational efficiency and interpretability. These results highlight the potential of simplified machine learning models as non-invasive tools for clinical decision support in low-resource environments.
Keywords: Feature Selection, Clinical Decision Support, Predictive Modeling, Resource-Constrained Deployment
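For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and ROC-AUC can be computed for a binary classifier. It is illustrative only and is not the study's evaluation code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and the numbers are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the UCI Heart Disease dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy alone can be misleading when classes are imbalanced, which is one reason to report recall (missed cases are costly in screening) and the threshold-independent ROC-AUC alongside it. The pairwise ROC-AUC computation here is quadratic in the number of samples; it is adequate for a few hundred records but is written for clarity rather than speed.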