Big Data Chapter 1

The document provides an introduction to Big Data, defining it as large and complex data sets that are challenging to process with traditional tools. It outlines the differences between small data and big data, the factors contributing to the rise of big data, its types (structured, semi-structured, unstructured), and characteristics known as the 5 V's: Volume, Variety, Velocity, Value, and Veracity. The lecture emphasizes the importance of understanding these concepts for effective data management and analysis.

Uploaded by

rahman2312091037

Introduction to Big Data

Chapter 1: Lecture 1
Mr. ISRAFIL
Lecturer, Dept. of CIS
Daffodil International University

Outline

1. Data and Information
2. Data Units
3. What is Small Data
4. What is Big Data
5. Small Data vs Big Data
6. How Big Data Comes
7. Types of Big Data
8. Big Data Characteristics

Data and information
Data is a set of raw, unorganized facts that must be processed to become meaningful.

Information is data that has been processed in a meaningful way according to a given requirement.
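
The distinction can be illustrated with a minimal sketch. The temperature readings and the `to_information` helper below are hypothetical and exist only for illustration: the raw strings play the role of data, and the computed summary plays the role of information.

```python
from statistics import mean

# Raw data: unorganized readings as they might arrive (hypothetical values, in °C).
raw_readings = ["21.5", "22.0", "bad", "23.5", "21.0"]


def to_information(readings):
    """Process raw readings into a summary that answers a concrete question."""
    values = []
    for item in readings:
        try:
            values.append(float(item))
        except ValueError:
            # Skip entries that cannot be interpreted as numbers.
            continue
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "average": mean(values),
    }


summary = to_information(raw_readings)
print(summary)
```

The same raw readings could be processed differently for a different requirement, which is why information is always tied to the question being asked.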

Data Units

What is Small Data?

Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that
makes it accessible, informative and actionable.

What is Big Data?

Big data is the term for a collection of data sets so large and complex that it becomes difficult to
process using on-hand database system tools or traditional data processing applications.

Small data vs Big data

The term "big data" is about machines and "small data" is about people.
How Big Data Comes

The main reasons behind the rise of big data are:

1. Evolution of technology
2. IoT (Internet of Things)
3. Social Media
4. Other factors

How Big Data Comes
1) Evolution of technology:

❏ Earlier we had landline phones, but nowadays we have Android and iOS smartphones that make our lives
smarter. Every operation we perform on a smartphone generates data that resides somewhere.
❏ Desktops used to be the main way to handle operations, that is, to store and process data using storage
devices such as floppy disks, discs, and tapes. These days, hard disks and cloud storage play a vital role.
❏ Earlier we relied on analog storage, but these days almost all storage is digital. The same evolution
can be seen in cars, which have progressed to self-driving cars.

How Big Data Comes

2) IoT (Internet of Things):

IoT connects physical devices to the Internet and makes them smarter.

Example:
Smart TVs, smart ACs, smart cars, etc.

How Big Data Comes

3) Social Media:

Data is generated on social media sites, for example:

● Facebook likes, videos, photos, tags, comments, etc.
● Twitter tweets
● YouTube video uploads
● Instagram pictures
● Emails

How Big Data Comes

4) Other Factors:

● Retail
● Banking & Finance
● Media & Entertainment
● Health care
● Education
● Government
● Transportation, Insurance, etc.

Types of Big Data

Big Data could be of three types:

● Structured
● Semi-Structured
● Unstructured

Types of Big Data
1. Structured data
❏ As the name suggests, this kind of data is structured and is well-defined. It has a consistent order that
can be easily understood by a computer or a human. This data can be stored, analyzed, and processed
using a fixed format. Usually, this kind of data has its own data model.
❏ You will find this kind of data in databases, where it is neatly stored in columns and rows. Two sources of
structured data are:
❏ Machine-generated data – This data is produced by machines such as sensors, network servers,
weblogs, GPS, etc.
❏ Human-generated data – This type of data is entered by the user in their system, such as personal
details, passwords, documents, etc. A search made by the user, items browsed online, and games
played are all human-generated information.
❏ For example, a database consisting of all the details of employees of a company is a type of structured
data set.
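
As a minimal sketch of the employee-database example, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and rows are hypothetical; the point is only that every record follows the same fixed format, so it can be queried directly.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is the "fixed format": every row has the same typed columns.
conn.execute(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary REAL NOT NULL
    )
    """
)

# Hypothetical employee records.
rows = [
    (1, "Asha", "Engineering", 52000.0),
    (2, "Badal", "Finance", 48000.0),
    (3, "Chitra", "Engineering", 61000.0),
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# Because the structure is known in advance, filtering and sorting are one query.
engineering = conn.execute(
    "SELECT name, salary FROM employees WHERE department = ? ORDER BY salary DESC",
    ("Engineering",),
).fetchall()
print(engineering)

conn.close()
```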

Types of Big Data
2. Unstructured data
❏ Any set of data that is not structured or well-defined is called unstructured data. This kind of data is
unorganized and difficult to handle, understand and analyze. It does not follow a consistent format and
may vary at different points of time. Most of the data you encounter comes under this category.
❏ For example, your comments, tweets, shares, posts, and likes on social media are unstructured data.
The videos you watch on YouTube and the text messages you send via WhatsApp all pile up as a huge
heap of unstructured data.

3. Semi-structured data
❏ This kind of data is somewhat structured but not completely. It may seem unstructured at first
and does not obey the formal structure of data models such as those of an RDBMS. For example, NoSQL
documents have keywords that are used to process the document.
❏ CSV files are also considered semi-structured data.
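
A small sketch may make the idea concrete. It assumes JSON as the document format, which is one common choice for NoSQL-style documents but is not named in the lecture, and the records themselves are hypothetical. Each document carries its own keys, so there is some structure, but no two documents are required to share the same fields.

```python
import json

# Hypothetical documents: each one labels its own fields, and the fields differ.
raw_documents = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Badal", "phones": ["555-0100", "555-0101"]},
  {"name": "Chitra", "email": "chitra@example.com", "address": {"city": "Dhaka"}}
]
"""

documents = json.loads(raw_documents)

# There is no fixed schema, so the set of fields must be discovered from the data.
all_keys = sorted({key for doc in documents for key in doc})
print(all_keys)

# Code that reads a field has to tolerate its absence.
emails = [doc.get("email") for doc in documents]
print(emails)
```

The keys are what allow a program to process each document, even though a single fixed table layout would not fit all three records.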

Big Data Characteristics

5 V's of Big Data

● Volume
● Veracity
● Variety
● Value
● Velocity

Big Data Characteristics

● Volume - Volume refers to the 'amount of data', which is growing day by day at a very fast pace.
The size of the data generated by humans, machines, and their interactions on social media alone is massive.
Facebook, for example, can generate approximately a billion messages, 4.5 billion "Like" button clicks,
and more than 350 million new posts each day. Big data technologies can handle such large amounts of data.

The amount of data that has been generated and is still being generated.
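
To get a feel for these figures, the daily counts quoted above can be converted into per-second rates. The sketch below uses only the three numbers from the text; the dictionary keys and the `per_second` helper are names chosen here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Approximate daily counts quoted in the text above.
daily_counts = {
    "messages": 1_000_000_000,
    "likes": 4_500_000_000,
    "posts": 350_000_000,
}


def per_second(daily_total):
    """Average number of events per second for a given daily total."""
    return daily_total / SECONDS_PER_DAY


# Round to whole events per second for readability.
rates = {name: round(per_second(total)) for name, total in daily_counts.items()}
print(rates)
```

These are averages over a whole day, so real peak rates would be higher than the values computed here.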

Big Data Characteristics

● Variety - Many different kinds of data are generated from various sources. Because many sources
contribute to big data, the types of data they generate differ: the data can be structured,
semi-structured, or unstructured. Hence, a variety of data is generated every day. Earlier, we used to
get data from Excel and databases; now the data comes in the form of images, audio, video, sensor data,
etc., as shown in the image. This variety of unstructured data creates problems in capturing, storing,
mining, and analyzing the data.

Big Data Characteristics

● Velocity - The term 'velocity' refers to the speed at which data is generated. How fast the data is
generated and processed to meet demand determines the real potential of the data.
Big data velocity deals with the speed at which data flows in from sources such as business processes,
application logs, networks, social media sites, sensors, and mobile devices. The flow of data is massive
and continuous.

Data is being generated at an alarming rate.
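
One simple way to put a number on velocity is to count how many events arrive within a recent time window. The sketch below is an illustration only, not a method from the lecture: the `RateMeter` class, the 5-second window, and the timestamps are all hypothetical.

```python
from collections import deque


class RateMeter:
    """Tracks events that arrived within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event that arrived at `timestamp` (in seconds)."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()

    def rate(self, now):
        """Average events per second over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window_seconds


meter = RateMeter(window_seconds=5)

# Hypothetical arrival times: a slow trickle, then a burst.
for t in [0.0, 0.5, 1.0, 6.0, 6.2, 6.4, 6.6, 6.8]:
    meter.record(t)

current_rate = meter.rate(now=7.0)
print(current_rate)
```

Because old timestamps are discarded as the window moves, memory use depends on the recent arrival rate rather than on the total number of events ever seen, which matters when the flow is continuous.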

Big Data Characteristics

● Value - Among the characteristics of big data, value is perhaps the most important. No matter how fast
the data is produced or how much of it there is, it has to be reliable and useful. Otherwise, the data is
not good enough for processing or analysis. Research says that poor-quality data can lead to almost a 20%
loss in a company's revenue.
Data scientists first convert raw data into information. This data set is then cleaned to retrieve the
most useful data, and analysis and pattern identification are performed on it. If the process succeeds,
the data can be considered valuable.

A mechanism to bring the correct meaning out of the data.

Big Data Characteristics

● Veracity - Veracity refers to data in doubt, that is, the uncertainty of the available data due to
inconsistency and incompleteness. In the image below, you can see that a few values are missing in
the table. Also, a few values are hard to accept, for example the minimum value of 15000 in the 3rd row,
which is not possible. This inconsistency and incompleteness is what veracity describes.

Uncertainty and inconsistencies in the data, i.e., the quality of captured data can vary greatly, affecting accurate analysis.
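
Checks for both kinds of problem can be sketched in a few lines. The table from the slide is not reproduced here, so the records below are hypothetical stand-ins with the same flavor: some values are missing, and one minimum is implausibly larger than its maximum. The `check_record` helper is likewise an assumption made for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "min": 10, "max": 40},
    {"id": 2, "min": None, "max": 35},
    {"id": 3, "min": 15000, "max": 45},
    {"id": 4, "min": 12, "max": None},
]


def check_record(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Incompleteness: a required field has no value.
    for field in ("min", "max"):
        if record[field] is None:
            problems.append(f"missing {field}")
    # Inconsistency: a minimum cannot be larger than the maximum.
    if (
        record["min"] is not None
        and record["max"] is not None
        and record["min"] > record["max"]
    ):
        problems.append("min exceeds max")
    return problems


# Keep only the records that have at least one problem.
report = {}
for record in records:
    problems = check_record(record)
    if problems:
        report[record["id"]] = problems
print(report)
```

A report like this only flags suspicious records. Deciding whether to correct, drop, or keep them still depends on the analysis being done.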

Thank You
