BDA Assignment L9
18131A05K4
Voonna Manideep
1. Explain Big Data. What are the characteristics of big data?
A) Big Data: Big Data is a collection of data that is huge in volume and keeps growing
exponentially with time. Its size and complexity are so large that no traditional data
management tool can store or process it efficiently.
Examples: New York Stock Exchange, Social Media
Types of Big Data:
1. Structured: Any data that can be stored, accessed and processed in a fixed format is
termed 'structured' data.
Example: An employee table in a data base.
2. Unstructured: Any data with unknown form or the structure is classified as
unstructured data.
Example: Output generated by Google search.
3. Semi-structured: Semi-structured data can contain both the forms of data.
Example: Personal data stored in an XML file
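As a small illustration of semi-structured data, the hypothetical XML snippet below holds personal data whose tags give partial structure even though the fields vary from record to record (a sketch using Python's standard library; the names and fields are made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured personal data: tags give some structure,
# but each <person> record may carry different fields.
xml_data = """
<people>
  <person><name>Asha</name><age>29</age></person>
  <person><name>Ravi</name><city>Pune</city></person>
</people>
"""

root = ET.fromstring(xml_data)
# Turn each record into a dict of whatever fields it happens to have.
records = [{child.tag: child.text for child in person} for person in root]
print(records)
```

Because the schema is only partial, the code reads whichever tags are present instead of assuming a fixed column layout, which is exactly what distinguishes semi-structured from structured data.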
Characteristics of Big Data:
• Volume
• Variety
• Velocity
• Variability
(i) Volume – The name Big Data itself relates to an enormous size. The size of data plays
a crucial role in determining its value, and whether particular data can actually be
considered Big Data depends on its volume. Hence, 'Volume' is one characteristic which
needs to be considered while dealing with Big Data solutions.
(ii) Variety – Variety refers to heterogeneous sources and the nature of data, both structured and
unstructured. During earlier days, spreadsheets and databases were the only sources of
data considered by most of the applications. Nowadays, data in the form of emails,
photos, videos, monitoring devices, PDFs, audio, etc. are also being considered in the
analysis applications. This variety of unstructured data poses certain issues for storage,
mining and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How
fast the data is generated and processed to meet demand determines the real potential of
the data.
Big Data Velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, and social media sites, sensors, Mobile devices,
etc. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the data at
times, thus hampering the process of being able to handle and manage the data
effectively.
2. What are the challenges of big data?
A) There are 6 major challenges of big data.
They are:
1. Lack of proper understanding of big data:
Companies fail in their Big Data initiatives due to insufficient understanding.
Employees may not know what data is, its storage, processing, importance, and
sources. Data professionals may know what is going on, but others may not have a
clear picture.
For example, if employees do not understand the importance of data storage, they
might not keep the backup of sensitive data. They might not use databases properly
for storage. As a result, when this important data is required, it cannot be retrieved
easily.
2. Data growth issues:
One of the most pressing challenges of Big Data is storing all these huge sets of data
properly. The amount of data being stored in data centers and databases of companies
is increasing rapidly. As these data sets grow exponentially with time, it gets
extremely difficult to handle.
Most of the data is unstructured and comes from documents, videos, audio files, text files
and other sources. This means it cannot be stored in traditional relational databases.
3. Confusion while big data tool selection:
Companies often get confused while selecting the best tool for Big Data analysis and
storage. Is HBase or Cassandra the best technology for data storage? Is Hadoop
MapReduce good enough or will Spark be a better option for data analytics and
storage?
These questions bother companies, and sometimes they are unable to find the answers.
They end up making poor decisions and selecting an inappropriate technology. As a
result, money, time, effort and work hours are wasted.
4. Lack of data professionals:
To run these modern technologies and Big Data tools, companies need skilled data
professionals. These professionals will include data scientists, data analysts and data
engineers who are experienced in working with the tools and making sense out of
huge data sets.
Companies face a problem of lack of Big Data professionals. This is because data
handling tools have evolved rapidly, but in most cases, the professionals have not.
Actionable steps need to be taken in order to bridge this gap.
5. Securing data:
Securing these huge sets of data is one of the daunting challenges of Big Data. Often
companies are so busy in understanding, storing and analyzing their data sets that
they push data security for later stages. But, this is not a smart move as unprotected
data repositories can become breeding grounds for malicious hackers.
6. Integrating data from a variety of sources:
Data in an organization comes from a variety of sources, such as social media pages,
ERP applications, customer logs, financial reports, e-mails, presentations and reports
created by employees. Combining all this data to prepare reports is a challenging task.
This is an area often neglected by firms. But, data integration is crucial for analysis,
reporting and business intelligence, so it has to be perfect.
i. It’s managed
ii. It’s on-demand
iii. It’s public or private
Types of Cloud Computing: There are 3 types of cloud computing
i. Infrastructure as a Service (IaaS): means you're buying access to raw computing
hardware over the Internet, such as servers or storage. Since you buy only what you
need and pay as you go, this is often referred to as utility computing. Ordinary web
hosting is a simple example of IaaS: you pay a monthly subscription or a
per-megabyte/gigabyte fee to have a hosting company serve up files for your website
from their servers.
ii. Software as a Service (SaaS): means you use a complete application running on
someone else's system. Web-based email and Google Documents are perhaps the
best-known examples. Zoho is another well-known SaaS provider offering a
variety of office applications online.
iii. Platform as a Service (PaaS): means you develop applications using Web-based
tools so they run on systems software and hardware provided by another
company. So, for example, you might develop your own ecommerce website but
have the whole thing, including the shopping cart, checkout, and payment
mechanism running on a merchant's server. App Cloud and the Google App
Engine are examples of PaaS.
Advantages of Cloud Computing:
1) Cost Savings
2) Security
3) Flexibility
4) Mobility
5) Insight
6) Increased Collaboration
7) Quality Control
8) Disaster Recovery
9) Loss Prevention
10) Automatic Software Updates
11) Competitive Edge
12) Sustainability
Disadvantages of Cloud Computing:
1) Network Connection Dependency
2) Limited Features
3) Loss of Control
4) Security
5) Technical issues
IoT data: Data generated by sensors, satellites, computer log files, cameras, genetic
sequencing machines, space telescopes, probes, and other devices.
Corporations: Data generated by transaction and administrative systems, credit cards,
financial systems, accounting, e-commerce, sales, medical records, research, and
more.
They are:
a. Data Cleaning
b. Data Integration
c. Data Transformation
d. Data Reduction
Data Cleaning: Data cleaning methods aim to fill in missing values, smooth out noise
while identifying outliers, and fix data discrepancies. Unclean data can confuse the
mining process and the model. Therefore, running the data through various Data
Cleaning/Cleansing methods is an important Data Preprocessing step.
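The cleaning steps above can be sketched in Python with made-up sensor readings; the mean-based fill and the 1.5-standard-deviation outlier rule are illustrative choices, not fixed rules:

```python
import statistics

# Hypothetical sensor readings; None marks a missing value,
# and 250.0 is an obvious noise spike.
readings = [10.0, 11.0, None, 9.5, 250.0, 10.5]

# Fill missing values with the mean of the observed values.
observed = [r for r in readings if r is not None]
mean = statistics.mean(observed)
filled = [r if r is not None else mean for r in readings]

# Flag outliers as points more than 1.5 standard deviations
# from the mean (an illustrative threshold for this tiny sample).
stdev = statistics.stdev(observed)
outliers = [r for r in filled if abs(r - mean) > 1.5 * stdev]
print(filled, outliers)
```

In practice the fill value (mean, median, or a model-based estimate) and the outlier rule are chosen per dataset; this sketch only shows the shape of the step.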
Data Integration: This step combines data from multiple sources into a coherent data
store for the analysis task. The sources may include multiple databases and data
warehouses, whose metadata (that is, data about data) helps in avoiding integration
errors.
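A minimal sketch of integration, assuming two hypothetical sources (a CRM and a billing system) that describe the same customers by id:

```python
# Two hypothetical sources describing the same customers, keyed by id.
crm = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
billing = {1: {"balance": 120.0}, 2: {"balance": 0.0}}

# Integrate into one coherent store by combining the fields
# recorded for each customer id across both sources.
integrated = {
    cid: {**crm.get(cid, {}), **billing.get(cid, {})}
    for cid in set(crm) | set(billing)
}
print(integrated)
```

Real integration also has to reconcile conflicting field names and values across sources, which is where the metadata mentioned above comes in.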
Data Transformation: This stage is used to convert the data into a format that can be
used in the mining process, for example by:
• Smoothing: Smoothing works to remove the noise from the data. Such
techniques include binning, clustering, and regression.
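Smoothing by binning can be sketched as follows: sort the values, split them into equal-depth bins, and replace each value by its bin's mean (the numbers are illustrative):

```python
# Smoothing by bin means: sort, split into equal-depth bins of 4,
# then replace every value by the mean of its bin.
values = sorted([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
bin_size = 4

smoothed = []
for i in range(0, len(values), bin_size):
    bin_values = values[i:i + bin_size]
    bin_mean = sum(bin_values) / len(bin_values)
    smoothed.extend([bin_mean] * len(bin_values))
print(smoothed)
```

Replacing values by bin means flattens small fluctuations inside each bin while keeping the overall trend, which is the point of smoothing.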
Data Reduction: Data mining deals with large amounts of data, and analysis becomes
more difficult as the volume grows. Data reduction techniques address this: their goal is
to improve storage efficiency while lowering the cost of storing and analyzing data.
There are 2 types of data reduction. They are Dimensionality Reduction and Numerosity
Reduction.
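Both kinds of reduction can be sketched crudely in Python: dropping near-constant columns stands in for dimensionality reduction, and randomly sampling rows is a simple numerosity reduction (the dataset and the 0.1 variance threshold are made up for illustration):

```python
import random
import statistics

# Hypothetical dataset: rows of (feature1, feature2, feature3).
# feature2 and feature3 are nearly constant across rows.
rows = [(1.0, 5.0, 0.01), (2.0, 4.9, 0.01), (3.0, 5.1, 0.02), (4.0, 5.0, 0.01)]

# Dimensionality reduction (crude sketch): drop near-constant columns,
# since they carry little information for the analysis.
cols = list(zip(*rows))
keep = [i for i, col in enumerate(cols) if statistics.pvariance(col) > 0.1]
reduced = [tuple(row[i] for i in keep) for row in rows]

# Numerosity reduction: keep only a random sample of the rows.
random.seed(0)
sample = random.sample(reduced, 2)
print(keep, reduced, len(sample))
```

Real dimensionality reduction typically uses techniques such as PCA rather than a variance cutoff, but the effect is the same: fewer columns and fewer rows to store and analyze.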