Big Data
Delivered by:
Daw Zin Mar Soe
Institute of International Professionalism
What is Big Data?
• Big Data is data, but at a huge scale. The term describes a
collection of data that is huge in volume and still growing
exponentially over time.
• In short, such data is so large and complex that traditional
data management tools cannot store or process it
efficiently.
Examples Of Big Data
• The New York Stock Exchange generates about one
terabyte of new trade data per day.
• More than 500 terabytes of new data are ingested into the
databases of the social media site Facebook every day.
• A single jet engine can generate more than 10 terabytes of
data in 30 minutes of flight time. With many thousands of
flights per day, the data generated reaches many
petabytes.
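The jet engine figure can be turned into a rough daily estimate. The sketch below uses the 10 terabytes per 30 minutes from the example above; the number of flights, the flight length and the engines per aircraft are hypothetical values chosen only to illustrate the arithmetic, not measured data.

```python
# Figure from the example above: 10 TB per engine per 30 minutes of flight.
TB_PER_ENGINE_PER_HALF_HOUR = 10

# Hypothetical inputs, chosen only for illustration (not from the source).
FLIGHTS_PER_DAY = 5_000       # assumption
AVG_FLIGHT_HOURS = 2.0        # assumption
ENGINES_PER_AIRCRAFT = 2      # assumption


def daily_volume_tb(flights, hours_per_flight, engines):
    """Return the estimated data generated per day, in terabytes."""
    half_hours = hours_per_flight * 2
    return flights * engines * half_hours * TB_PER_ENGINE_PER_HALF_HOUR


total_tb = daily_volume_tb(FLIGHTS_PER_DAY, AVG_FLIGHT_HOURS, ENGINES_PER_AIRCRAFT)
total_pb = total_tb / 1000  # 1 PB = 1000 TB in decimal units

print(f"{total_tb:,.0f} TB per day = {total_pb:,.0f} PB per day")
```

Under these assumed inputs the estimate comes to hundreds of petabytes per day, which is consistent with the "many petabytes" claim; different assumptions will give different totals.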
Types Of Big Data
Big Data can be found in three forms:
Structured
Unstructured
Semi-structured
Structured
• Any data that can be stored, accessed and processed in a fixed
format is termed 'structured' data.
• Example of structured data: an 'Employee' table in a database.
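As a minimal sketch of what "fixed format" means, the snippet below builds an 'Employee' table in an in-memory SQLite database. The column names and the rows are hypothetical, made up for illustration; the point is only that every row follows the same predefined schema and can be queried by column.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema fixes the format: every row has exactly these typed columns.
# Column names are hypothetical, not taken from the source.
conn.execute(
    "CREATE TABLE Employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department TEXT NOT NULL,"
    " salary INTEGER NOT NULL)"
)

# Made-up sample rows.
rows = [
    (1, "A. Example", "Finance", 52000),
    (2, "B. Example", "Admin", 48000),
    (3, "C. Example", "Finance", 61000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# Because the structure is known in advance, queries can address columns directly.
finance = conn.execute(
    "SELECT name FROM Employee WHERE department = ? ORDER BY employee_id",
    ("Finance",),
).fetchall()
print(finance)
```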
Unstructured
• Any data whose form or structure is unknown is classified as
unstructured data. In addition to its huge size, unstructured data
poses multiple challenges when it is processed to derive value
from it.
• A typical example of unstructured data is a heterogeneous data
source containing a combination of simple text files, images, videos,
etc.
• Example of unstructured data: the output returned by a 'Google
Search'.
Semi-structured
• Semi-structured data can contain both forms of data. It looks
structured, but its structure is not defined by, for example, a table
definition in a relational DBMS.
• An example of semi-structured data is data represented in an XML file.
• Example of semi-structured data: personal data stored in an XML file:
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
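The records above carry their own tags, so a program can read them without a table definition. The sketch below parses them with Python's standard xml.etree.ElementTree module. The enclosing <records> element is an assumption added here, because an XML parser needs a single root element and the records as listed have none.

```python
import xml.etree.ElementTree as ET

# The five records exactly as listed above.
raw = """
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
"""

# Assumption: wrap the records in a hypothetical <records> root element.
root = ET.fromstring(f"<records>{raw}</records>")

# The tags inside each record tell us which field each value belongs to.
people = [
    {
        "name": rec.findtext("name"),
        "sex": rec.findtext("sex"),
        "age": int(rec.findtext("age")),
    }
    for rec in root.findall("rec")
]

average_age = sum(p["age"] for p in people) / len(people)
print(len(people), "records, average age", round(average_age, 1))
```

Nothing enforces that every record has the same fields, which is the practical difference from the 'Employee' table: the structure is described by the data itself rather than by a schema.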
Characteristics Of Big Data
• Volume: how big must a data set be before traditional data handling
methods cannot cope? [Volume of data]
• Velocity: how quickly will data arrive, and how quickly must it be
evaluated and acted upon? This includes the problems of real-time data.
[Speed of collection]
• Variety: the problems of dealing with unstructured data, including
additional processing and the use of metadata. [Range of data types
collected]
• Veracity: how reliable is the data? Costs of finding errors versus costs
of accepting errors. Legal consequences could also be looked at. [Accuracy
or quality of data]
• Value: it is often said that all data has value, but there are costs in
collecting, storing, processing and analysing it. Students should look at
the cost-benefit equation. [Actual or potential usefulness of analysing the
data]
Benefits of Big Data Processing
There are some interesting examples of real-life uses of Big Data here:
https://www.datapine.com/blog/big-data-examples-in-real-life/
One characteristic of Big Data is the
volume of data collected. Give two other
characteristics of Big Data. (2021 Oct)
• Velocity / Speed of collection
• Variety / Range of data types collected / Mix of
structured and unstructured data
• Veracity / Accuracy or quality
• Value / Actual or potential usefulness of analysing
the data
High capacity storage devices are required to
handle Big Data.
State one other infrastructure requirement
for Big Data. (2021 Oct)