Big Data

This document discusses key aspects of big data analytics. It defines big data as extremely large data sets that cannot be processed by traditional data management tools. Examples of big data sources include stock exchange data, social media data, and data from jet engines. The document outlines the types of data as structured, unstructured, and semi-structured. It then describes the five V's of big data: volume, variety, velocity, veracity, and value.

Uploaded by

VISHNUNATH MS
Copyright
© All Rights Reserved

Module 4

BIG DATA ANALYTICS


BIG DATA OVERVIEW
● Big Data is a collection of data that is huge in
volume and keeps growing exponentially with time.
Its size and complexity are so great that no
traditional data management tool can store or
process it efficiently.
● In short, Big Data is still data, only at a far larger scale.
Examples of Big Data
● The New York Stock Exchange generates about one terabyte
of new trade data per day.
● Social media: statistics show that 500+ terabytes of new
data are ingested into the databases of the social media
site Facebook every day. This data mainly comes from photo
and video uploads, message exchanges, comments, etc.
● A single jet engine can generate 10+ terabytes of data in
30 minutes of flight time. With many thousands of flights per
day, the data generated reaches many petabytes.
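The jet engine figure can be checked with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the 10 terabytes per 30 minutes comes from the slide, while the number of flights per day, the average flight duration, the single engine per flight and the decimal unit conversion (1 petabyte = 1000 terabytes) are assumptions chosen for the example, not figures from the source.

```python
# Rough estimate of daily jet engine data volume.
# From the slide: one engine produces about 10 TB per 30 minutes of flight.
# Assumed for illustration only: flights per day, hours per flight,
# one engine counted per flight, and 1 PB = 1000 TB.

TB_PER_PB = 1000


def daily_engine_data_pb(flights_per_day, flight_hours, tb_per_half_hour=10):
    """Return the estimated data volume per day, in petabytes."""
    half_hours = flight_hours / 0.5
    tb_per_flight = tb_per_half_hour * half_hours
    return flights_per_day * tb_per_flight / TB_PER_PB


# Hypothetical inputs: 5000 flights per day, 2 hours each.
estimate_pb = daily_engine_data_pb(flights_per_day=5000, flight_hours=2.0)
print(f"Estimated volume: {estimate_pb} PB per day")
```

Under these assumed inputs the estimate is in the hundreds of petabytes per day, which is consistent with the slide's claim of "many petabytes".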
Types Of Big Data

● Big Data comes in the following types:

● Structured: any data that can be stored, accessed and
processed in a fixed format is termed ‘structured’ data.
● Unstructured: any data whose form or structure is unknown
is classified as unstructured data.
● Semi-structured: semi-structured data can contain both
forms of data. An example is data represented in an XML file.
Examples Of Structured Data

An ‘Employee’ table in a database is an example of Structured Data


Examples Of Unstructured Data

The output returned by ‘Google Search’


Examples Of Semi-structured Data

Personal data stored in an XML file.
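The three types of data described above can be contrasted in a short Python sketch. All names and values here (the employee rows, the XML records, the text snippet) are made up for illustration and are not taken from the original slides. The sketch uses only the standard library modules csv, io and xml.etree.ElementTree.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns, like an 'Employee' table.
employee_csv = "id,name,department\n1,Asha,Finance\n2,Ravi,Admin\n"
employees = list(csv.DictReader(io.StringIO(employee_csv)))

# Semi-structured: tags describe each value, but records need not share
# the same fields (the second person has a city and no age).
personal_xml = """<people>
  <person><name>Asha</name><age>35</age></person>
  <person><name>Ravi</name><city>Kochi</city></person>
</people>"""
root = ET.fromstring(personal_xml)
people = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]

# Unstructured: free text with no predefined fields; only generic
# operations such as counting words apply without further analysis.
search_snippet = "Big data describes data sets too large for traditional tools"
word_count = len(search_snippet.split())

print(employees[0]["department"])
print(people)
print(word_count)
```

The structured rows can be queried by column name directly, the XML records have to be inspected field by field, and the free text exposes no fields at all.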


Data Growth over the years
Characteristics Of Big Data
(5 V’s of Big Data)
● Big data can be described by the following
characteristics:

● Volume
● Variety
● Velocity
● Veracity
● Value
Volume
● The name Big Data itself refers to enormous size. The
size of data plays a crucial role in determining the
value that can be drawn from it. Whether a particular
data set can be considered Big Data at all also
depends on its volume.
● Hence, ‘Volume’ is one characteristic that must be
considered when dealing with Big Data solutions.
Variety
● Variety refers to the heterogeneous sources and nature
of data, both structured and unstructured.
● In earlier days, spreadsheets and databases were the
only sources of data considered by most applications.
● Nowadays, data in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc. is also considered
in analysis applications.
● This variety of unstructured data poses certain issues
for storing, mining and analyzing data.
Velocity
● The term ‘velocity’ refers to the speed at which data is
generated. How fast data is generated and processed to
meet demand determines the real potential of the data.
● Big Data velocity deals with the speed at which data
flows in from sources such as business processes,
application logs, networks, social media sites, sensors,
mobile devices, etc.
● The flow of data is massive and continuous.
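To get a feel for velocity, a daily volume can be converted into an average sustained rate. The sketch below is a simple illustration that uses the daily figures quoted earlier in these slides (about 1 terabyte per day for the New York Stock Exchange, 500+ terabytes per day for Facebook). It assumes decimal units (1 TB = 1,000,000 MB) and a perfectly even flow over 24 hours, which real traffic would not follow.

```python
# Convert a daily data volume into an average rate in megabytes per second.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and an even flow
# across the whole day; real workloads have peaks and quiet periods.

SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000


def average_rate_mb_per_s(terabytes_per_day):
    """Return the average ingest rate in MB/s for a daily volume in TB."""
    return terabytes_per_day * MB_PER_TB / SECONDS_PER_DAY


for source, tb_per_day in [("Stock exchange", 1), ("Social media site", 500)]:
    rate = average_rate_mb_per_s(tb_per_day)
    print(f"{source}: about {rate:.2f} MB/s on average")
```

Even the smaller figure corresponds to a continuous stream of several megabytes every second, around the clock.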
Veracity
− Veracity refers to inconsistencies and uncertainty in data:
available data can sometimes be messy, and its quality
and accuracy are difficult to control.
− Big Data is also variable because of the multitude of
data dimensions resulting from multiple disparate
data types and sources.
− Example: data in bulk can create confusion, whereas
too little data can convey only partial or incomplete
information.
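One common way to make veracity concrete is to count obvious quality problems in a batch of records. The sketch below is a minimal illustration, not a method from the slides: the function name quality_report, the sensor records and the choice of checks (missing required values and exact duplicates) are all hypothetical.

```python
def quality_report(records, required_fields):
    """Count records with missing required values and exact duplicates."""
    missing = 0
    duplicates = 0
    seen = set()
    for record in records:
        # A required field counts as missing if it is absent, None or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            missing += 1
        # Two records are duplicates if all their fields and values match.
        key = tuple(sorted(record.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"total": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical sensor readings with typical quality problems.
readings = [
    {"sensor": "a", "temp": 21.5},
    {"sensor": "a", "temp": 21.5},   # exact duplicate
    {"sensor": "b", "temp": None},   # missing measurement
    {"sensor": "", "temp": 19.0},    # missing sensor id
]
report = quality_report(readings, required_fields=["sensor", "temp"])
print(report)
```

Checks like these only catch the simplest problems. Inconsistent units or conflicting sources need rules specific to the data set.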
Value
● After taking the other four V’s into account, there is one
more V, which stands for Value. A bulk of data with no
value is of no good to the company unless it is turned
into something useful.
● Data in itself is of no use or importance; it needs to be
converted into something valuable from which information
can be extracted. Hence, Value can be considered the most
important of the 5 V’s.
