Big Data Lecture 1
Outline
Data and Information
Data Units
How Big Data Comes
Information is a set of data that has been processed in a meaningful way according to a given requirement.
Data Units
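As a minimal sketch of how data units scale, the snippet below converts a raw byte count into a readable unit. It assumes decimal (SI) units, where each unit is 1000 times the previous one; binary units (1024-based) are also in common use. The name human_readable is hypothetical.

```python
# Decimal (SI) data units, smallest to largest.
# Assumption: each unit is 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value drops below 1000
    # or the largest known unit is reached.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:.2f} {UNITS[index]}"


print(human_readable(1500))        # a small file
print(human_readable(10 ** 12))    # one terabyte
```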
What is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to
process using on-hand database management tools or traditional data processing applications.
Small data vs Big data
The term "big data" is about machines, while "small data" is about people.
How Big Data Comes
Big data came into the picture for the following reasons:
1. Evolution of technology
2. IoT (Internet of Things)
3. Social media
4. Other factors
1) Evolution of technology:
2) IoT (Internet of Things):
Example:
Smart TVs, smart ACs, smart cars, etc.
3) Social media:
4) Other factors:
● Retail
● Banking and finance
● Media and entertainment
● Health care
● Education
● Government
● Transportation, insurance, etc.
Types of Big Data
1. Structured data
❏ As the name suggests, this kind of data is structured and is well-defined. It has a consistent order that
can be easily understood by a computer or a human. This data can be stored, analyzed, and processed
using a fixed format. Usually, this kind of data has its own data model.
❏ You will find this kind of data in databases, where it is neatly stored in columns and rows. Two sources
of structured data are:
❏ Machine-generated data – This data is produced by machines such as sensors, network servers,
weblogs, GPS, etc.
❏ Human-generated data – This type of data is entered by the user in their system, such as personal
details, passwords, documents, etc. A search made by the user, items browsed online, and games
played are all human-generated information.
❏ For example, a database consisting of all the details of employees of a company is a type of structured
data set.
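The employee database example can be sketched with Python's built-in sqlite3 module. The table name, column names, and records below are hypothetical and chosen only for illustration; the point is that every row follows the same fixed schema, so the data can be queried directly.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A fixed schema: every employee row has exactly these columns.
# The column names are assumptions made for this sketch.
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "department TEXT NOT NULL, "
    "salary REAL NOT NULL)"
)

# Hypothetical records.
conn.executemany(
    "INSERT INTO employees (id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Engineering", 72000.0),
        (2, "Ben", "Finance", 58000.0),
        (3, "Chen", "Engineering", 81000.0),
    ],
)

# Because the structure is known in advance, analysis is a single query:
# head count and average salary per department.
rows = conn.execute(
    "SELECT department, COUNT(*), AVG(salary) "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()

for department, head_count, average_salary in rows:
    print(department, head_count, average_salary)
```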
Types of Big Data
● Structured
● Semi-Structured
● Unstructured
2. Unstructured data
❏ Any set of data that is not structured or well-defined is called unstructured data. This kind of data is
unorganized and difficult to handle, understand and analyze. It does not follow a consistent format and
may vary at different points of time. Most of the data you encounter comes under this category.
❏ Examples of unstructured data include your comments, tweets, shares, posts, and likes on social media.
The videos you watch on YouTube and the text messages you send via WhatsApp all pile up as a huge
heap of unstructured data.
3. Semi-structured data
❏ This kind of data is somewhat structured, but not completely. It may seem unstructured at first
and does not obey the formal structure of data models such as an RDBMS. For example, NoSQL
documents have keywords (keys) that are used to process the document.
❏ CSV files are also considered semi-structured data.
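A minimal sketch of semi-structured data, using JSON documents of the kind a NoSQL document store might hold. The documents and their keys are hypothetical. Each document is self-describing through its keys, but no two documents are required to share the same set of keys, which is what separates this from a fixed table schema.

```python
import json

# Hypothetical documents: each one carries its own keys, and the keys differ.
raw_documents = [
    '{"user": "asha", "text": "Hello", "tags": ["intro"]}',
    '{"user": "ben", "text": "Hi", "location": {"city": "Pune"}}',
    '{"user": "chen", "likes": 12}',
]

documents = [json.loads(raw) for raw in raw_documents]

# Collect every key that appears in at least one document.
all_keys = sorted({key for doc in documents for key in doc})

# Count how many documents actually contain each key.
coverage = {key: sum(key in doc for doc in documents) for key in all_keys}

for key in all_keys:
    print(f"{key}: present in {coverage[key]} of {len(documents)} documents")
```

Only the "user" key appears in every document here, so code that processes such data has to handle missing keys rather than assume a fixed set of columns.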
Big Data Characteristics
● Volume
● Veracity
● Variety
● Value
● Velocity
● Veracity - Veracity refers to doubt or uncertainty about the available data, caused by data
inconsistency and incompleteness. In the image below, you can see that a few values are missing in
the table. A few other values are hard to accept: for example, the minimum value of 15000 in the 3rd
row is not possible. This inconsistency and incompleteness is what veracity describes.
The quality of captured data can vary greatly, and this uncertainty affects how accurate any analysis can be.
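The two problems described above, missing values and values that cannot be right, can be checked with a short sketch like the one below. The table, its column names, and the plausible range are hypothetical; only the implausible 15000 echoes the example in the text.

```python
# Hypothetical table of temperature readings (degrees Celsius).
# None marks a missing value.
records = [
    {"city": "A", "min_temp_c": 12.0, "max_temp_c": 25.0},
    {"city": "B", "min_temp_c": None, "max_temp_c": 31.0},
    {"city": "C", "min_temp_c": 15000.0, "max_temp_c": 28.0},
]

# Assumed plausible range for an air temperature on Earth.
PLAUSIBLE_RANGE = (-90.0, 60.0)


def veracity_issues(rows, fields, valid_range):
    """Return (row_index, field, problem) for every missing or implausible value."""
    low, high = valid_range
    issues = []
    for index, row in enumerate(rows):
        for field in fields:
            value = row.get(field)
            if value is None:
                issues.append((index, field, "missing"))
            elif not low <= value <= high:
                issues.append((index, field, "implausible"))
    return issues


issues = veracity_issues(records, ["min_temp_c", "max_temp_c"], PLAUSIBLE_RANGE)
for row_index, field, problem in issues:
    print(f"row {row_index}: {field} is {problem}")
```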
● Volume - Volume refers to the amount of data, which is growing day by day at a very fast pace.
The size of the data generated by humans, machines, and their interactions on social media alone is
massive. Facebook, for example, generates approximately a billion messages, records about 4.5 billion
clicks of the "Like" button, and receives more than 350 million new posts each day. Big data technologies
are designed to handle such large amounts of data.
● Variety - Variety refers to the different kinds of data (structured, semi-structured, and unstructured) that are generated from many different sources.
● Value - Among the characteristics of big data, value is perhaps the most important. No matter how fast the data is
produced or how much of it there is, it has to be reliable and useful. Otherwise, the data is not good enough for processing or analysis.
Research says that poor-quality data can lead to almost a 20% loss in a company's revenue.
Data scientists first convert raw data into information. This data set is then cleaned to retrieve the most useful data, and analysis
and pattern identification are carried out on it. If the process succeeds, the data can be considered valuable.
● Velocity - The term "velocity" refers to the speed at which data is generated. How fast the data is generated and processed to
meet demand determines its real potential.
Big data velocity deals with the speed at which data flows in from sources such as business processes, application logs,
networks, social media sites, sensors, and mobile devices. The flow of data is massive and continuous.
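One way to put a number on velocity is to count how many events arrive within a recent time window. The sketch below is a minimal illustration, not a production stream processor. The class name RateMeter and the timestamps are hypothetical, and it assumes events arrive in non-decreasing time order.

```python
from collections import deque


class RateMeter:
    """Track the arrival rate of events over a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event; timestamps are assumed to be non-decreasing."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window ending at this event.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()

    def rate(self):
        """Events per second, averaged over the window."""
        return len(self.timestamps) / self.window_seconds


# Hypothetical event arrival times, in seconds.
meter = RateMeter(window_seconds=10.0)
for arrival_time in [0.0, 1.0, 2.0, 11.5]:
    meter.record(arrival_time)

print(meter.rate())
```

A deque is used so that expired events can be dropped from the front cheaply as new ones are appended at the back.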
Examples of Big Data
Every day we upload millions of bytes of data. 90% of the world's data has been created in the last two years.