A Study on Application of Map Reduce in Document Clustering

Authors

  • Yathesh L, Ansh Jain, Nandipati Jaswanth Sai, Sahil Sujit, Gopichand G*

Abstract

Big Data is one of the leading research areas in Computer Science. Conventional database management tools
have largely failed to process and manage such enormous and complex data, and the problem becomes acute in
the operation of large search engines. The Web is a collection of documents and resources interlinked through
hyperlinks and accessed over the Internet. Document clustering is an important aspect of Web analysis: it
groups related documents together so as to reduce the volume of data that has to be examined. [1]
To retrieve data efficiently from a large number of documents, we use a parallel programming paradigm known as
MapReduce. Applying MapReduce to document clustering illustrates the parallel computation of <key, value>
pairs by the Mapper function and the corresponding retrieval of data by the Reducer function. [2]
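
The Mapper/Reducer flow described above can be illustrated with a minimal single-process sketch. It is not the
authors' implementation: the abstract does not name a clustering algorithm, so the sketch assumes one
k-means-style assignment step over term-frequency vectors with cosine similarity. All function names
(tf_vector, mapper, shuffle, reducer, cluster_once) are hypothetical, and the shuffle step stands in for the
grouping that a MapReduce framework would perform across machines.

```python
from collections import Counter, defaultdict
import math


def tf_vector(text):
    """Bag-of-words term frequencies for one document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mapper(doc_id, text, centroids):
    """Emit one <key, value> pair: key = index of the most similar centroid."""
    vec = tf_vector(text)
    best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
    yield best, (doc_id, vec)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(cluster_id, values):
    """Collect the cluster's members and average their vectors into a new centroid."""
    total = Counter()
    for _, vec in values:
        total.update(vec)
    centroid = {term: count / len(values) for term, count in total.items()}
    members = sorted(doc_id for doc_id, _ in values)
    return cluster_id, members, centroid


def cluster_once(docs, centroids):
    """Run one map -> shuffle -> reduce pass; returns {cluster_id: (members, centroid)}."""
    pairs = [p for doc_id, text in docs.items() for p in mapper(doc_id, text, centroids)]
    return {cid: reducer(cid, vals)[1:] for cid, vals in shuffle(pairs).items()}


# Toy corpus and seed centroids, invented purely for illustration.
docs = {
    "d1": "map reduce parallel",
    "d2": "parallel map job",
    "d3": "cat dog pet",
    "d4": "dog pet food",
}
seeds = [tf_vector("map parallel"), tf_vector("dog pet")]
result = cluster_once(docs, seeds)
```

In a real deployment each mapper call would run on a separate split of the document collection, and the pass
would be repeated with the new centroids until the assignments stop changing.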

Published

2020-02-28

Section

Articles