Document Clustering Approach using Fuzzy WordNet and Adaptive PSO-K-Means Algorithm

Authors

  • Venkata Nagaraju Thatha, A. Sudhir Babu, D. Haritha

Abstract

Document clustering has recently gained prominence as an effective technique for organizing
documents without supervision, automatically extracting topics, and retrieving or filtering
information quickly. One of the simplest clustering algorithms is k-means. However, the standard
k-means algorithm requires the number of clusters to be specified at the outset, and the centroids
it converges to vary with the initial partition. In most applications the appropriate number of
clusters is hard to determine, often for lack of prior knowledge. The algorithm must therefore be
run repeatedly with different values of k, and the clustering results compared, to ascertain the
optimal number of clusters. To overcome this problem, adaptive particle swarm optimization (PSO),
which automatically determines the appropriate number of clusters and their centers, is applied
here. The main focus is on formulating k-means clustering as a regularization problem by applying
a suitable group lasso penalty term to the cluster centers. Applied to document databases, the
regularized k-means clustering produces promising results compared with standard k-means.
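
The abstract does not state the regularized objective, so the following is only a minimal sketch
of one possible reading, not the authors' method. It assumes each cluster center mu minimizes the
sum of squared distances to its assigned points plus lam * ||mu||_2, which gives a closed-form
"shrunken mean" update. The names shrunken_center, regularized_kmeans, and lam are hypothetical,
and the PSO search is not shown.

```python
import math


def shrunken_center(points, lam):
    """Minimise sum ||x - mu||^2 + lam * ||mu||_2 over mu for one cluster.

    Assumed objective (not taken from the paper). The minimiser is the
    cluster mean scaled by max(0, 1 - lam / (2 * n * ||mean||)).
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    norm = math.sqrt(sum(m * m for m in mean))
    if norm == 0.0:
        return mean  # already at the origin, nothing to shrink
    scale = max(0.0, 1.0 - lam / (2.0 * n * norm))
    return [scale * m for m in mean]


def regularized_kmeans(data, centers, lam, iters=20):
    """Lloyd-style iterations using the shrunken-mean center update."""
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for x in data:
            dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            groups[dists.index(min(dists))].append(x)
        # Update step: empty clusters keep their previous center.
        centers = [
            shrunken_center(g, lam) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers
```

With lam = 0 this reduces to ordinary k-means. As lam grows, centers are pulled toward the origin,
and a center that reaches zero could be treated as inactive; that interpretation is likewise an
assumption.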

Published

2020-10-17

Section

Articles