In particular, remote sensors continuously produce large amounts of heterogeneous data, both structured and unstructured.
Such data are known as Big Data.
Big Data is characterized by three aspects:
(a) the data are numerous,
(b) the data cannot be organized into conventional relational databases, and
(c) the data are generated, captured, and processed very quickly.
Big Data is promising for business applications and is a rapidly growing segment of the IT industry.
It has generated significant interest in various fields, including healthcare equipment manufacturing, banking
transactions, social media, and satellite imaging. Traditionally, data are stored in a highly structured format to
maximize their informational content. However, current data volumes are driven by both unstructured and semi-structured
data. As a result, end-to-end processing can be impeded by the translation between structured data in relational
database management systems and unstructured data used for analytics.
One of the most pressing challenges of Big Data is storing these huge data sets properly. The amount of data stored in companies' data centers and databases is increasing rapidly, and as these data sets grow exponentially over time, they become extremely difficult to handle.
Variety
The term variety refers to the different types of data that are collected for analysis.
These data may be structured, semi-structured, or unstructured.
They are collected from sources such as social networking sites, customer feedback surveys,
and web logs, and may take the form of text, images, audio, or video.
If the data format is incompatible or the data are incomplete, analysis becomes significantly
more challenging.
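As an illustration of how varied inputs might be brought into a single shape before analysis, the following minimal Python sketch converts a structured CSV payload, a semi-structured JSON payload, and an unstructured text snippet into one list of dictionaries. The function name `normalize`, the format labels, and the `word_count` feature are hypothetical choices for this example, not part of any particular tool.

```python
import csv
import io
import json


def normalize(source_type, payload):
    """Convert one raw input into a list of flat dict records.

    `source_type` is a hypothetical label: "csv" (structured),
    "json" (semi-structured) or "text" (unstructured).
    """
    if source_type == "csv":
        # Structured: fixed columns, one record per row (values stay strings).
        return [dict(row) for row in csv.DictReader(io.StringIO(payload))]
    if source_type == "json":
        # Semi-structured: fields may differ from record to record.
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if source_type == "text":
        # Unstructured: keep the raw text plus one simple derived feature.
        return [{"text": payload, "word_count": len(payload.split())}]
    # Incompatible format: fail loudly instead of analysing bad input.
    raise ValueError(f"unsupported format: {source_type}")


survey = "customer,rating\nalice,4\nbob,5\n"
posts = '[{"user": "carol", "likes": 12}, {"user": "dave"}]'
review = "Delivery was late but support was helpful"

records = normalize("csv", survey) + normalize("json", posts) + normalize("text", review)
print(len(records))  # 5
```

Raising an error on an unknown format is a deliberate choice in this sketch: an incompatible input is surfaced immediately rather than silently producing misleading results later in the analysis.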
Velocity
In a network context, velocity refers to the rate at which data flows through the system,
that is, the rate at which data enters as input and is extracted as output. Handling this flow becomes
especially tedious for banking transactions and for social networking sites such as
Twitter and Facebook. Traditional data analytics tools cannot perform this analysis
efficiently, so modern analytics tools have been introduced to gather and
analyze such data.
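One way to make the rate of data flow concrete is a sliding-window counter that reports how many events arrived during the most recent time window. The sketch below assumes timestamps in seconds that arrive in non-decreasing order; the class `RateMeter` and its `window` parameter are hypothetical, and real stream-processing systems typically use more elaborate mechanisms.

```python
from collections import deque


class RateMeter:
    """Count events that arrived within the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=1.0):
        self.window = window
        self.timestamps = deque()  # arrival times currently inside the window

    def record(self, ts):
        # Assumes timestamps arrive in non-decreasing order.
        self.timestamps.append(ts)
        self._evict(ts)

    def rate(self, now):
        # Events per second over the window ending at `now`.
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now):
        # Drop events that fell out of the window so memory stays bounded.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


meter = RateMeter(window=1.0)
for ts in [0.1, 0.2, 0.5, 0.9, 1.4]:
    meter.record(ts)
print(meter.rate(now=1.4))  # 3.0
```

Because old timestamps are evicted as new ones arrive, the memory used depends on the number of events per window rather than on the total length of the stream.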
Volume
It is the most important feature of big data and refers to the quantity of data gathered by a
particular organization. Data can be acquired from many kinds of sources, such as sensors, reviews on
social networking sites, the Internet of Things, and web pages.
Veracity
Veracity concerns the integrity of the data. Because data are collected from various sources, we cannot be sure
that all of the collected data are accurate. Some data may be inaccurate, of low quality, or
unreliable, and they may not always be consistent. Modern analytics tools help to
address these problems.
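Problems of this kind (incomplete, low-quality, inaccurate, or inconsistent records) can be screened for with simple rule-based checks before analysis. The sketch below assumes hypothetical sensor records with `sensor_id`, `ts`, and `value` fields and an assumed plausible range of -50 to 60 (for example, outdoor temperatures in degrees Celsius); the function `check_readings` is illustrative, and real validation rules depend on the data source.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def check_readings(records, required=("sensor_id", "value"), lo=-50.0, hi=60.0):
    """Split records into valid and rejected ones, with a reason for each rejection.

    Field names and the plausible range [lo, hi] are hypothetical assumptions.
    """
    report = QualityReport()
    seen = set()  # (sensor_id, ts) pairs already accepted
    for rec in records:
        missing = [f for f in required if rec.get(f) is None]
        if missing:  # incomplete record
            report.rejected.append((rec, f"missing {missing}"))
            continue
        try:
            value = float(rec["value"])
        except (TypeError, ValueError):  # low-quality, non-numeric value
            report.rejected.append((rec, "non-numeric value"))
            continue
        if not lo <= value <= hi:  # implausible reading, likely inaccurate
            report.rejected.append((rec, "out of range"))
            continue
        key = (rec["sensor_id"], rec.get("ts"))
        if key in seen:  # second reading for the same sensor and time: inconsistent
            report.rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        report.valid.append(rec)
    return report


readings = [
    {"sensor_id": "s1", "ts": 1, "value": "21.5"},
    {"sensor_id": "s2", "ts": 1, "value": None},
    {"sensor_id": "s3", "ts": 1, "value": "abc"},
    {"sensor_id": "s4", "ts": 1, "value": 999},
    {"sensor_id": "s1", "ts": 1, "value": "21.7"},
]
report = check_readings(readings)
print(len(report.valid), len(report.rejected))  # 1 4
```

Keeping a reason alongside each rejected record, rather than silently dropping it, makes it possible to audit which sources produce unreliable data.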
Value
Value refers to the usefulness of data in making decisions. An organization's growth can be predicted from
the goods it delivers and their quality, and analyzing the organization's data supports decisions
that in turn increase its profit.
Data Storage Tools
1. HDInsight
2. Hadoop
3. NoSQL
4. Hive
5. Sqoop
6. Presto
Data Visualization Tools
1. Datawrapper
2. Solver
3. Tableau