Unit 5 Concepts of Big Data and Data Lake
Structured
• Any data that can be stored, accessed, and processed in a fixed format is termed
‘structured’ data. Over time, computer science has achieved great success in developing
techniques for working with such data (where the format is well known in advance) and
deriving value from it. However, issues now arise when the size of such data grows to a
huge extent; typical sizes are in the range of multiple zettabytes.
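Since structured data has a format known in advance, it maps directly onto a relational table. A minimal sketch (table and column names are purely illustrative):

```python
# Structured data: a fixed, known-in-advance schema fits a relational table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ravi", "IT", 61000.0)],
)
# Because the format is fixed, querying it is straightforward.
rows = conn.execute("SELECT name, salary FROM employee WHERE dept = 'IT'").fetchall()
print(rows)  # [('Ravi', 61000.0)]
```

The well-defined schema is exactly what makes such data easy to store, access, and process with traditional tools.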
Unstructured
• Any data with an unknown form or structure is classified as unstructured data. In
addition to its huge size, unstructured data poses multiple challenges when it comes to
processing it to derive value. A typical example of unstructured data is a heterogeneous
data source containing a combination of simple text files, images, videos, etc.
Nowadays, organizations have a wealth of data available to them but, unfortunately,
they do not know how to derive value from it, since the data is in its raw,
unstructured form.
Semi-structured
• Semi-structured data can contain both forms of data. Semi-structured data appears
structured in form, but it is not actually defined by, for example, a table definition in a
relational DBMS. An example of semi-structured data is data represented in an XML file.
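A short sketch of what this means in practice: XML carries its structure in tags rather than a fixed relational schema, so fields can vary or be missing per record. The XML snippet and field names below are invented for illustration:

```python
# Semi-structured data: structure travels with the data (tags), not a schema.
import xml.etree.ElementTree as ET

xml_data = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(xml_data)
for c in root.findall("customer"):
    name = c.findtext("name")
    # No table definition forces this field to exist, so we must handle absence.
    city = c.findtext("city", default="unknown")
    print(c.get("id"), name, city)
```

Note how the second record simply omits a field — something a relational table definition would not allow without an explicit NULL.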
Characteristics (4 V’s)
• (i) Volume – The name Big Data itself refers to an enormous size. The size of data
plays a crucial role in determining the value that can be derived from it. Whether
particular data can actually be considered Big Data depends on its volume. Hence,
‘Volume’ is one characteristic that must be considered when dealing with Big Data
solutions.
• (ii) Variety – Variety refers to heterogeneous sources and the nature of data, both
structured and unstructured. In earlier days, spreadsheets and databases were the only
data sources considered by most applications. Nowadays, data in the form of emails,
photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis
applications. This variety of unstructured data poses issues for storing, mining, and
analyzing data.
• (iii) Velocity – The term ‘velocity’ refers to the speed at which data is generated.
How fast data is generated and processed to meet demands determines the real
potential in the data. Big Data velocity deals with the speed at which data flows in
from sources such as business processes, application logs, networks, social media
sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
• (iv) Variability – This refers to the inconsistency that data can show at times,
hampering the process of handling and managing the data effectively.
Sources of Big Data
o Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of
data on a day-to-day basis, as they have billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of
logs from which users’ buying trends can be traced.
o Weather stations: Weather stations and satellites give huge volumes of data, which
are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and
publish their plans accordingly, and for this they store the data of their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data
through their daily transactions.
Traditional data
o Traditional data refers to structured data that is collected and stored in formats like
databases, spreadsheets, etc. Such data includes customer information, inventory
records, financial statements, etc.
o This data is stored in relational (SQL) databases and handled with other traditional
data-analysis tools. It can be easily processed and analyzed using traditional methods to
gain insight into business operations. It can also be used to create reports and
visualizations, so it plays a vital role in making profitable decisions after understanding
the trends and patterns in the data.
The main differences between traditional data and big data are as
follows:
o Traditional data is usually a small amount that can be collected and analyzed easily
using traditional methods; big data is usually so large that it cannot be processed and
analyzed easily using traditional methods.
o Traditional data is usually structured and can be stored in spreadsheets, databases,
etc.; big data includes semi-structured, unstructured, and structured data.
o Traditional data is often collected manually; big data is collected automatically by
automated systems.
o Traditional data usually comes from internal systems; big data comes from various
sources such as mobile devices, social media, etc.
o Traditional data consists of data such as customer information, financial transactions,
etc.; big data consists of data such as images, videos, etc.
o Traditional data can be analyzed with basic statistical methods; big data needs
advanced analytics methods such as machine learning, data mining, etc.
o Traditional methods of analysis are slow and gradual; big data analysis methods are
fast and near-instant.
o Traditional data is limited in its value and insights; big data provides valuable insights
and patterns for good decision-making.
o Traditional data is used for simple, small business processes; big data is used for
complex, large business processes.
o Traditional data is easier to secure and protect because of its small size and simplicity;
big data is harder to secure and protect because of its size and complexity.
o Traditional data requires less time and money to store; big data requires more.
o Traditional data can be stored on a single computer or server; big data requires
distributed storage across numerous systems.
o Traditional data handling is less efficient than big data techniques.
A Data Warehouse is a collection of data specific to the entire organization, not only to a
particular group of users.
It is not used for daily operations and transaction processing, but for decision-making.
A Data Warehouse can be viewed as a data system with the following attributes:
o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of information.
Characteristics of OLTP
Following are important characteristics of OLTP:
Consider a point-of-sale system of a supermarket; the following are sample queries that
such a system can process:
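The source does not list the queries themselves, so here is a hedged sketch of the kind of short, frequent reads and writes a supermarket point-of-sale system might issue. Table and column names are assumptions:

```python
# Hypothetical point-of-sale OLTP queries: a price lookup and a sale record.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER);
INSERT INTO product (name, price, stock) VALUES ('Milk', 1.20, 50), ('Bread', 0.90, 30);
CREATE TABLE sale (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER);
""")

# 1. Retrieve the price of a scanned item (a fast, indexed read).
price = conn.execute("SELECT price FROM product WHERE name = 'Milk'").fetchone()[0]

# 2. Record a sale and decrement stock (a short, frequent write).
conn.execute("INSERT INTO sale (product_id, qty) VALUES (1, 2)")
conn.execute("UPDATE product SET stock = stock - 2 WHERE id = 1")
conn.commit()

stock = conn.execute("SELECT stock FROM product WHERE id = 1").fetchone()[0]
print(price, stock)  # 1.2 48
```

Each operation touches only a row or two and completes in milliseconds — the defining workload of OLTP.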
Architecture of OLTP
Here is the architecture of OLTP:
OLTP Architecture
1. Business / Enterprise Strategy: Enterprise strategy deals with issues that affect the
organization as a whole. In OLTP, it is typically developed at a high level within the firm,
by the board of directors or top management.
2. Business Process: OLTP business process is a set of activities and tasks that, once
completed, will accomplish an organizational goal.
3. Customers, Orders, and Products: OLTP databases store information about products,
orders (transactions), customers (buyers), suppliers (sellers), and employees.
4. ETL Processes: ETL extracts data from the various RDBMS source systems, then
transforms the data (applying concatenations, calculations, etc.) and loads the
processed data into the Data Warehouse system.
5. Data Mart and Data warehouse: A Data Mart is a structure/access pattern specific to
data warehouse environments. It is used by OLAP to store processed data.
6. Data Mining, Analytics, and Decision Making: Data stored in the data mart and data
warehouse can be used for data mining, analytics, and decision making. This data helps
you to discover data patterns, analyze raw data, and make analytical decisions for your
organization’s growth.
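Step 4 above can be sketched as a tiny extract-transform-load pipeline. Everything here is illustrative (source/warehouse table names, the 18% tax calculation), not any specific product's API:

```python
# Minimal ETL sketch: extract from a source DB, transform, load into a warehouse.
import sqlite3

# Source OLTP system (illustrative schema).
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE orders (id INTEGER, first TEXT, last TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Asha', 'Rao', 100.0), (2, 'Ravi', 'Nair', 250.0);
""")

# Target warehouse table (illustrative schema).
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (id INTEGER, customer TEXT, amount_with_tax REAL)")

# Extract
rows = src.execute("SELECT id, first, last, amount FROM orders").fetchall()
# Transform: concatenate the name fields; apply a calculation (assumed 18% tax).
transformed = [(i, f"{f} {l}", round(a * 1.18, 2)) for i, f, l, a in rows]
# Load
dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", transformed)
dw.commit()
print(dw.execute("SELECT * FROM fact_orders").fetchall())
```

In a real deployment the same three phases run over many heterogeneous sources on a schedule, but the extract/transform/load shape is the same.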
Examples of OLTP transactions:
Online banking
Online airline ticket booking
Sending a text message
Order entry
Add a book to shopping cart
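What these examples share is the short atomic transaction. A sketch of the "add a book to shopping cart" case (schema and helper are hypothetical): the stock update and the cart insert either both succeed or both roll back:

```python
# Sketch of an atomic OLTP transaction: reserve stock + add to cart together.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, stock INTEGER)")
conn.execute("INSERT INTO book VALUES (1, 'Big Data Basics', 1)")
conn.execute("CREATE TABLE cart (book_id INTEGER)")

def add_to_cart(book_id):
    try:
        with conn:  # transaction: commits on success, rolls back on error
            (stock,) = conn.execute(
                "SELECT stock FROM book WHERE id=?", (book_id,)).fetchone()
            if stock < 1:
                raise ValueError("out of stock")
            conn.execute("UPDATE book SET stock = stock - 1 WHERE id=?", (book_id,))
            conn.execute("INSERT INTO cart VALUES (?)", (book_id,))
        return True
    except ValueError:
        return False

print(add_to_cart(1))  # True  - stock reserved and cart updated together
print(add_to_cart(1))  # False - rolls back, nothing is left half-done
```

Atomicity is what lets thousands of concurrent users place orders without the database ever showing a half-finished purchase.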
Advantages of OLTP
Following are the pros/benefits of OLTP system:
Disadvantages of OLTP
Here are cons/drawbacks of OLTP system:
If the OLTP system faces hardware failures, online transactions are severely affected.
OLTP systems allow multiple users to access and change the same data at the same
time, which can sometimes create conflicting situations.
If the server hangs for even a few seconds, a large number of transactions can be
affected.
OLTP requires a lot of staff working in groups in order to maintain inventory.
Online transaction processing systems do not by themselves have proper methods of
transferring products to buyers.
OLTP makes the database much more susceptible to hackers and intruders.
In B2B transactions, there is a chance that both buyers and suppliers miss out on the
efficiency advantages the system offers.
A server failure may wipe out large amounts of data from the database.
Only a limited number of queries and updates can be performed.
Differences between OLTP and OLAP:
o Source of data: OLTP and its transactions are the original source of data; for OLAP,
different OLTP databases become the source of data.
o Data integrity: an OLTP database must maintain data integrity constraints; an OLAP
database is not frequently modified, so data integrity is not an issue.
o Data organization: the data in an OLTP database is always detailed and organized;
the data in the OLAP process might not be organized.
o Backup: OLTP needs a complete backup of the data combined with incremental
backups; OLAP needs a backup only from time to time, and backup is less critical
than in OLTP.
o Users: OLTP is used by data-critical users such as clerks, DBAs, and database
professionals; OLAP is used by data-knowledge users such as workers, managers,
and CEOs.
o Purpose: OLTP helps to increase users’ self-service and productivity; OLAP helps to
increase the productivity of business analysts.
o Cost and skills: data warehouses have historically been development projects that
may prove costly to build; an OLAP cube is not an open SQL-server data warehouse,
so technical knowledge and experience are essential to manage the OLAP server.
A Data Lake is like a large container, very similar to a real lake fed by rivers. Just as a lake
has multiple tributaries coming in, a data lake has structured data, unstructured data,
machine-to-machine data, and logs flowing in, often in real time.
• Ingestion Tier: The tiers on the left side depict the data sources. The data can be
loaded into the data lake in batches or in real time.
• Insights Tier: The tiers on the right represent the research side, where insights from the
system are used. SQL queries, NoSQL queries, or even Excel can be used for data analysis.
• Distillation Tier: Takes data from the storage tier and converts it to structured data for
easier analysis.
• Processing Tier: Runs analytical algorithms and user queries in varying modes (real-time,
interactive, batch) to generate structured data for easier analysis.
• Unified Operations Tier: Governs system management and monitoring; it includes
auditing and proxy management.
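The ingestion-to-insights flow above can be sketched in miniature. The record formats, field names, and the "distillation" logic below are all invented to illustrate the idea of raw, mixed-format data being converted into one structured shape:

```python
# Toy data-lake flow: raw heterogeneous records land as-is (ingestion),
# then a distillation step converts them into one structured schema.
import json
import csv
import io

# Ingestion tier: events arrive in whatever shape each source emits.
raw_lake = [
    '{"sensor": "t1", "temp": 21.5}',          # machine-to-machine JSON
    "2024-01-01 INFO user=42 action=login",    # application log line
]

# Distillation tier: normalize every raw record into (kind, key, value) rows.
structured = []
for rec in raw_lake:
    if rec.lstrip().startswith("{"):
        d = json.loads(rec)
        structured.append(("sensor", d["sensor"], str(d["temp"])))
    else:
        fields = dict(p.split("=") for p in rec.split() if "=" in p)
        structured.append(("log", fields["user"], fields["action"]))

# Insights tier: structured rows can now feed SQL or spreadsheet tools.
out = io.StringIO()
csv.writer(out).writerows(structured)
print(out.getvalue())
```

The key design point the tiers capture: the lake stores data in its raw form first, and structure is imposed later, only when analysis needs it (schema-on-read).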