EXTRACTION OF INTERESTING VARIABLES FROM DISPARATE DATA IN BIG DATA ANALYTICS

Authors

  • Satuluri Naganjaneyulu, Sambasivarao Chindam, K. Raviteja

Abstract

Disparate data refers to a range of dissimilar data in varied formats. Such data is characterised by
diverse dimensionality and is often regarded as being of inferior quality. In recent times, the data
science realm has witnessed the global generation of enormous amounts of disparate data, which
demands efficient management through various collaborative methods. Big Data is typically defined
as a loose ensemble of disparate, vibrant, unreliable and interconnected data. Big Data analysis is
applied to examine associations between factors and to detect patterns, and in disparate data it can
reveal hitherto unknown trends. Disparate data carries the tag of being "low quality" because it
suffers from missing values and high redundancy, and is often inconsistent, vague and noisy. This
review article discusses some of the challenges and problems found in modern big data models that
deal with disparate data, and proposes a novel approach to variable selection in high-dimensional
data. The proposed algorithm is intended to obtain a good subset of features by efficiently handling
irrelevant and redundant features.
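The abstract does not specify the proposed algorithm. As a hedged illustration only, the general idea of discarding irrelevant and redundant features can be sketched with a common correlation-based filter, a generic technique and not the authors' method. The function names `pearson` and `select_features`, the thresholds `relevance_min=0.3` and `redundancy_max=0.9`, and the toy data are all hypothetical choices made for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length columns.

    Returns 0.0 when either column is constant, treating it as carrying
    no linear information (an assumption of this sketch).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(columns, target, relevance_min=0.3, redundancy_max=0.9):
    """Hypothetical relevance/redundancy filter.

    columns: dict mapping feature name -> list of numeric values.
    Returns the names of the selected features, most relevant first.
    """
    # Relevance step: score each feature by |r| with the target and
    # discard features scoring below relevance_min as irrelevant.
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    candidates = sorted(
        (name for name, r in relevance.items() if r >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Redundancy step: greedily keep a candidate only if it is not highly
    # correlated with any feature that has already been selected.
    selected = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[s])) < redundancy_max for s in selected):
            selected.append(name)
    return selected


# Hypothetical toy data for illustration.
target = [1, 2, 3, 4, 5, 6]
columns = {
    "x1": [1, 2, 3, 4, 5, 6],           # perfectly relevant
    "x1_copy": [2, 4, 6, 8, 10, 12],    # relevant but redundant with x1
    "x2": [2, 1, 4, 3, 6, 5],           # relevant, not redundant with x1
    "noise": [1, 0, 0, 0, 0, 1],        # uncorrelated with the target
    "constant": [7, 7, 7, 7, 7, 7],     # no variance at all
}

print(select_features(columns, target))  # prints ['x1', 'x2']
```

Relevance is scored first so that, among mutually redundant features, the one most correlated with the target is the one retained. On the toy data, `x1_copy` is dropped as a duplicate of `x1`, while `noise` and `constant` are dropped as irrelevant.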

Published

2020-10-17

Section

Articles