Improved K-means Using an Integration of Density based and Genetic Algorithm based Model

Authors

  • Pradeep kumar D, Anita Kanavalli, Shashank Nagesh, Pauline Joseph, Puneet Udhayan, Shrey Naik

Abstract

Many clustering algorithms exist in today's data-oriented world, and K-means is among the most widely studied and used. It is not without limitations, however. This work identifies a few of them and proposes ways to make the algorithm more efficient. The first limitation is that the optimal value of K must be known beforehand; a badly chosen value results in poor clustering. This is addressed with a method that uses density plots to determine the best K value. The second limitation concerns the initial centroid selection process, which is optimized with an evolutionary algorithm, the genetic algorithm, to improve the efficiency of the algorithm as a whole. The enhanced version is tested on benchmark clustering datasets and artificially generated datasets. It is then applied to a dataset of geographical coordinates of taxi pickup locations to identify hotspots.
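The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the density plots are read or how the genetic algorithm is configured, so everything below is an assumption made for illustration. K is guessed by counting the peaks of a one-dimensional Gaussian kernel density estimate over a single feature, and a small genetic algorithm (tournament selection, union-based crossover, single-gene mutation, two-elite survival) searches for initial centroids that minimize the sum of squared errors. The function names `estimate_k_by_density`, `sse`, and `ga_initial_centroids`, the bandwidth, and all other parameters are hypothetical.

```python
import math
import random


def estimate_k_by_density(values, bandwidth, grid_size=200):
    """Guess K by counting peaks of a 1-D Gaussian kernel density estimate.

    Assumption: one density peak corresponds to one cluster.
    """
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # Unnormalized density: normalization does not change where peaks are.
    density = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    # A peak is higher than its left neighbor and not lower than its right.
    return sum(
        1
        for i in range(1, grid_size - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    )


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def ga_initial_centroids(points, k, pop_size=20, generations=30,
                         mutation_rate=0.2, seed=0):
    """Pick k data points as initial centroids with a small genetic algorithm.

    Each individual is a list of k distinct indices into `points`;
    lower SSE means fitter.
    """
    rng = random.Random(seed)
    n = len(points)

    def fitness(individual):
        return sse(points, [points[i] for i in individual])

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: keep the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = min(rng.sample(population, 3), key=fitness)
            parent_b = min(rng.sample(population, 3), key=fitness)
            # Crossover: draw k distinct genes from the parents' union.
            pool = list(set(parent_a) | set(parent_b))
            child = rng.sample(pool, k)
            # Mutation: swap one gene for a point not already in the child.
            if rng.random() < mutation_rate:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return [points[i] for i in best]


# Toy data (made up for this sketch): two well-separated groups.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.4, -0.1), (-0.2, 0.2), (-0.4, -0.2),
    (10.0, 10.0), (10.2, 9.9), (9.8, 10.1), (10.4, 10.2), (9.6, 9.8),
]
k = estimate_k_by_density([p[0] for p in points], bandwidth=1.0)
centroids = ga_initial_centroids(points, k)
```

The returned centroids would then seed an ordinary K-means run in place of random initialization. Reading density along a single coordinate is a simplification; for geographic data such as pickup locations, a two-dimensional density estimate would be the more natural choice.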

Published

2020-11-01

Section

Articles