What is big data?

Remote sensors, among other sources, continuously produce large amounts of heterogeneous data that may be structured or unstructured. Data of this kind is known as Big Data.

Big Data is characterized by three aspects:
(a) the data are voluminous,
(b) the data do not fit readily into regular relational databases, and
(c) the data are generated, captured, and processed very quickly.

Big Data is promising for business applications and is a rapidly growing segment of the IT industry. It has generated significant interest in various fields, including the manufacture of healthcare machines, banking transactions, social media, and satellite imaging. Traditionally, data is stored in a highly structured format to maximize its informational content. However, current data volumes are driven by both unstructured and semi-structured data. End-to-end processing can therefore be impeded by the translation between structured data held in relational database management systems and unstructured data used for analytics.

Data Growth Issue - A big problem with big data

One of the most pressing challenges of Big Data is storing these huge data sets properly. The amount of data stored in companies' data centers and databases is increasing rapidly. As these data sets grow exponentially over time, they become extremely difficult to handle.
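
To make the effect of exponential growth concrete, here is a minimal Python sketch of compound growth, V(t) = V0 * (1 + r)^t. The starting volume, growth rate, and capacity used below are hypothetical figures chosen only for illustration, and the function names are not taken from any library.

```python
def projected_volume(initial_tb, annual_growth_rate, years):
    """Volume in TB after `years` of compound growth: V0 * (1 + r) ** t."""
    return initial_tb * (1 + annual_growth_rate) ** years


def years_until_full(initial_tb, annual_growth_rate, capacity_tb):
    """Whole years of growth until the volume first exceeds the capacity."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    years = 0
    volume = initial_tb
    while volume <= capacity_tb:
        volume *= 1 + annual_growth_rate
        years += 1
    return years


# Hypothetical figures: 100 TB today, growing 50% per year, 1000 TB capacity.
print(projected_volume(100, 0.5, 2), "TB after two years")
print(years_until_full(100, 0.5, 1000), "years until capacity is exceeded")
```

Under these assumed figures, capacity planning has to start well before the limit is reached, because each year adds more data than the previous one.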

FIVE V'S OF BIG DATA

Variety

The term variety refers to the different types of data collected for analysis. The data can be structured, semi-structured, or unstructured. It is collected from sources such as social networking sites, customer feedback surveys, and web logs, and it may take the form of text, images, audio, or video. If the data formats are incompatible, or if the data is incomplete, analysis becomes significantly more challenging.
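
As a rough illustration of the variety problem, the following Python sketch brings one structured (CSV), one semi-structured (JSON), and one unstructured (free text) record into a common shape and flags the incomplete ones. The field names (`customer`, `rating`, `comment`), the sample records, and the helper functions are all hypothetical.

```python
import csv
import io
import json

REQUIRED_FIELDS = ("customer", "rating", "comment")  # hypothetical schema


def from_csv(text):
    """Structured input: every row has the same named columns."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Semi-structured input: some keys may be missing."""
    return [json.loads(text)]


def from_free_text(text):
    """Unstructured input: keep the raw text; other fields are unknown."""
    return [{"comment": text.strip()}]


def normalize(record):
    """Map a record onto the common schema; missing fields become None."""
    return {field: record.get(field) for field in REQUIRED_FIELDS}


def is_complete(record):
    """A record is complete when no required field is None or empty."""
    return all(record[field] not in (None, "") for field in REQUIRED_FIELDS)


# Hypothetical sample inputs, one per data type.
csv_text = "customer,rating,comment\nalice,5,Great service\n"
json_text = '{"customer": "bob", "comment": "Slow delivery"}'
free_text = "  The app keeps crashing on login.  "

raw = from_csv(csv_text) + from_json(json_text) + from_free_text(free_text)
records = [normalize(r) for r in raw]
incomplete = [r for r in records if not is_complete(r)]

print(len(records), "records,", len(incomplete), "incomplete")
```

Only the CSV row fills every field here. The JSON record has no rating and the free-text record has no customer, which is the kind of gap that complicates analysis.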

Velocity

Velocity refers to the rate at which data flows through the system, that is, how quickly data arrives as input and is extracted as output. Handling this flow becomes especially demanding with banking transactions and with social networking sites such as Twitter and Facebook. Traditional data analytics tools cannot process such streams efficiently, so modern analytics tools are used to gather and analyze the data.
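
One simple way to quantify velocity is to count how many events arrive within a sliding time window. The sketch below is a minimal illustration of that idea. The `RateMeter` class and the arrival times are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RateMeter:
    """Tracks events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Register one event at `timestamp` (seconds, non-decreasing)."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now):
        """Return events per second over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds

    def _evict(self, now):
        # Drop events that fall outside the window (now - window, now].
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


meter = RateMeter(window_seconds=10)
# Hypothetical arrival times (in seconds) of incoming transactions.
for t in [0.0, 1.0, 2.5, 4.0, 9.0, 11.0, 12.0]:
    meter.record(t)

current_rate = meter.rate(now=12.0)
print(f"{current_rate:.1f} events/second")
```

The deque keeps only the events inside the window, so memory use is bounded by the peak arrival rate and not by the total number of events seen.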

Volume

Volume is the most important feature of Big Data. It refers to the quantity of data gathered by an organization. The data can be acquired from sources such as sensors, reviews on social networking sites, the Internet of Things, and web pages.

Veracity

Veracity deals with the degree of integrity of the data. Because data is collected from various sources, there is no guarantee that all of it is accurate. Some of the data may be inaccurate, of low quality, unreliable, or inconsistent. Modern analytics tools help to address these problems.
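
The kinds of veracity problems mentioned above can be checked with a few basic rules. The following Python sketch counts records with missing fields, implausible values, and exact duplicates. The `quality_report` function, the sensor readings, and the accepted temperature range are hypothetical and serve only as an illustration.

```python
def quality_report(rows, required, valid_range):
    """Count simple data-quality problems in a list of dict records.

    `required` lists fields that must be present and non-empty.
    `valid_range` maps a field name to an inclusive (low, high) range.
    """
    seen = set()
    report = {"missing": 0, "out_of_range": 0, "duplicates": 0}
    for row in rows:
        # Completeness: every required field must have a value.
        if any(row.get(field) in (None, "") for field in required):
            report["missing"] += 1
        # Plausibility: numeric fields must lie inside their range.
        for field, (low, high) in valid_range.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                report["out_of_range"] += 1
        # Consistency: an identical record seen earlier is a duplicate.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report


# Hypothetical sensor readings with deliberate quality problems.
rows = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": 480.0},  # implausible temperature
    {"sensor": "", "temp_c": 19.0},     # missing sensor id
    {"sensor": "s1", "temp_c": 21.5},   # exact duplicate of the first row
]

report = quality_report(
    rows,
    required=["sensor", "temp_c"],
    valid_range={"temp_c": (-50.0, 60.0)},
)
print(report)
```

Checks like these do not make the data accurate by themselves, but they show how much of a data set can be trusted before it is analyzed.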

Value

Value refers to the usefulness of data for making decisions. An organization's growth can be predicted from the quality of the goods it delivers, and analyzing the organization's data supports decisions that in turn increase its profit.

Tools for Big Data

Data Storage Tools

1. HDInsight
2. Hadoop
3. NoSQL
4. Hive
5. Sqoop
6. Presto

Data Visualization Tools

1. Datawrapper
2. Solver
3. Tableau