Optimization of Random Forest Algorithm with Ensemble and Hyper Parameter Tuning Techniques for Multiple Heart Diseases

Authors

  • Mahesh V. Sonth, Sateesh Ambesange, D. Sreekanth, Sateesh Tulluri

Abstract

Heart disease has become one of the most common causes of death over the last 15 years because of our lifestyle. Accurate diagnosis, together with the ability to predict the likelihood of heart disease in advance, allows patients with mild symptoms to consult a doctor in time and may prevent deaths caused by heart failure. In this paper we present a step-by-step approach to improving the existing random forest machine learning (ML) algorithm on the heart failure clinical records data set published by UCI. Univariate and multivariate analyses of the data set are carried out using statistical methods and the correlation among the various heart-disease-related features. The Tukey fence method is used to remove the unwanted outliers observed in the univariate analysis, after which the skewness present in the data is normalized using power transformation techniques. The few unimportant features are left out, and the grid search method is used to fine-tune various parameters of the random forest algorithm. Performance of the ML model is evaluated using standard metrics such as the confusion matrix, accuracy score, precision-recall curve (PRC), and receiver operating characteristic (ROC) curve. Ensemble and dimension reduction techniques, the latter through kernel principal component analysis (PCA), are also used to improve model performance, achieving nearly 100% accuracy on the standard UCI data set tested.
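The outlier-removal step named in the abstract can be illustrated with a minimal sketch of Tukey fences. The abstract does not state the multiplier or the quantile method the authors used, so the conventional k = 1.5 and the standard library's inclusive quartiles are assumptions here; the function names and the sample values are hypothetical and are not taken from the UCI data set.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional choice, assumed here; the paper's
    actual setting is not given in the abstract.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that lie within the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical measurements for one clinical feature (illustrative only).
sample = [50, 52, 55, 58, 60, 61, 63, 65, 70, 250]
cleaned = remove_outliers(sample)
print(cleaned)
```

In a pipeline such as the one described, a filter of this kind would be applied per feature before the power transformation, so that extreme values do not distort the estimate of the transformation parameters.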

Published

2020-11-01

Issue

Section

Articles