Big Data applications span various industries, enhancing efficiency, insights, and product development through monitoring, analysis, and innovative solutions. Key features of Big Data include the five Vs: Volume, Velocity, Variety, Veracity, and Value, which highlight its complexity and importance in decision-making. Technologies like SQL and NoSQL databases, along with distributed and parallel computing, support the management and processing of large-scale data, while foundational systems like Google MapReduce and GFS have shaped modern big data frameworks.

Applications of Big Data

Big Data is used across industries to improve efficiency, gain insights, and develop new
products.

1. Monitoring and Tracking Applications

●​ Public Health Monitoring: Tracks disease outbreaks and improves healthcare data
sharing.
●​ Consumer Sentiment Monitoring: Analyzes social media data to understand customer
opinions.
●​ Asset Tracking: Uses RFID tags to prevent counterfeiting and theft in industries like
defense and retail.
●​ Supply Chain Monitoring: Tracks inventory movement using RFID, ensuring timely
product delivery.
●​ Preventive Machine Maintenance: Uses sensors to predict equipment failures and
reduce downtime.

2. Analysis and Insight Applications

●​ Predictive Policing: Identifies crime hotspots to help law enforcement prevent future
crimes.
●​ Winning Political Elections: Uses voter data analysis to target potential supporters and
optimize campaigns.
●​ Personal Health: AI-based medical analysis improves diagnosis and treatment
recommendations.

3. New Product Development

●​ Flexible Auto Insurance: Adjusts insurance premiums based on real-time driving behavior.
●​ Location-Based Retail Promotion: Sends personalized offers to customers based on
their location.
●​ Recommendation Services: Uses data analytics to suggest products, movies, and
music based on user preferences.

Importance of Big Data (Key Points in Simple Language)

1. Cost Savings
○ Helps reduce costs and improve efficiency.
○ Optimizes quality assurance and testing.
○ Useful in complex industries like biopharmaceuticals and nanotechnology.
2. Time Reduction
○ Uses real-time data analytics for quick decision-making.
○ Tools like Hadoop process large data sets quickly.
3. Understanding Market Conditions
○ Analyzes customer purchase behavior.
○ Identifies popular products to improve business strategies.
○ Helps businesses stay ahead of competitors.
4. Social Media Listening
○ Uses sentiment analysis to monitor brand reputation.
○ Helps businesses understand customer opinions.
○ Improves online presence and customer engagement.
5. Customer Acquisition and Retention
○ Helps understand customer needs and preferences.
○ Identifies buying patterns for better customer experience.
○ Prevents customer loss and improves business growth.
6. Better Advertising and Marketing
○ Helps businesses target the right audience.
○ Improves marketing campaigns using data insights.
○ Modifies product range based on customer demand.
7. Driving Innovation and Product Development
○ Helps create new and improved products.
○ Identifies gaps in the market for innovation.
○ Enhances product features using customer data.

Five Vs of Big Data (Detailed and Simple Explanation)

1.​ Volume (Large Amount of Data)​

○​ Big Data means handling huge amounts of data collected from different sources.
○​ This data is growing every second from social media, online transactions,
sensors, etc.
○ Example: In 2016, global mobile data traffic was about 6.2 exabytes per month, and the total volume of data worldwide was projected to reach roughly 40,000 exabytes (40 zettabytes) by 2020.
2.​ Velocity (Speed of Data Generation)​

○​ Data is being generated at an extremely high speed from various sources like
social media, machines, and IoT devices.
○​ The faster data is collected, the quicker it needs to be processed for real-time
decision-making.
○​ Example: Google handles 3.5 billion searches per day, and Facebook generates
massive amounts of posts, likes, and messages every second.
3.​ Variety (Different Types of Data)​

○​ Big Data comes in multiple formats, making it difficult to store and process in
traditional databases.
○​ Types of data include:
■​ Structured Data: Organized in tables (e.g., databases, spreadsheets).
■​ Semi-structured Data: Partially organized (e.g., JSON files, XML,
emails).
■​ Unstructured Data: No fixed format (e.g., images, videos, social media
posts).
○ Example: Text messages, audio recordings, GPS location data, and video surveillance footage all come in different formats (see the sketch after this list).
4.​ Veracity (Accuracy and Reliability of Data)​

○​ Since Big Data comes from multiple sources, some of it may be inaccurate,
inconsistent, or misleading.
○​ Ensuring data quality and trustworthiness is important for making correct
business decisions.
○​ Example: Social media posts may have fake reviews, duplicate data, or errors
that affect analysis.
5.​ Value (Extracting Useful Insights)​

○​ The main goal of Big Data is to analyze and gain meaningful insights that help in
decision-making.
○​ Organizations use Big Data to improve customer experience, detect fraud,
optimize operations, and predict future trends.
○​ Example: E-commerce websites use Big Data to recommend products based on
customer behavior, increasing sales and satisfaction.
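
To make Variety concrete, here is a minimal Python sketch (standard library only, with invented example records) that reads one structured, one semi-structured, and one unstructured item:

import csv
import io
import json

# Structured: a CSV row with a fixed schema (columns are known in advance).
csv_data = "order_id,amount\n1001,59.99\n"
structured = list(csv.DictReader(io.StringIO(csv_data)))

# Semi-structured: a JSON document; fields may vary from record to record.
semi_structured = json.loads('{"user": "alice", "tags": ["sale", "mobile"]}')

# Unstructured: free text with no fixed format; needs NLP or other processing.
unstructured = "Great phone, but the battery drains too fast!"

print(structured[0]["amount"])      # access by column name
print(semi_structured["tags"])      # access nested/optional fields
print(len(unstructured.split()))    # e.g., a crude word count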

SQL:

●​ Structured Query Language: Uses a standardized language for interacting with the database.
●​ Relational: Data is organized into tables with rows and columns, linked by
relationships.
●​ Fixed Schema: Requires a predefined structure (schema) before data can be
stored.
●​ ACID Properties: Guarantees Atomicity, Consistency, Isolation, and Durability of transactions, with a focus on data integrity (see the transaction sketch after this list).
●​ Vertical Scalability: Scaled by adding more resources (CPU, RAM) to a single
server.
●​ Best For: Applications with structured data, complex queries, and the need for
strong data consistency (e.g., financial systems, e-commerce).
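
As a concrete illustration of the ACID point above, here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the transfer amounts are invented for illustration, not taken from the text:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    with conn:  # the 'with' block commits on success, rolls back on error (atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    print("transfer failed, changes rolled back")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())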

NoSQL:

●​ Not Only SQL: Encompasses a variety of database types that don't adhere to
the relational model.
●​ Flexible Data Models: Supports various data formats like documents, key-value
pairs, graphs, etc.
●​ Flexible Schema: The schema can be dynamic or even non-existent, allowing for more flexible data structures (see the document-store sketch after this list).
●​ CAP Theorem: A distributed system can guarantee at most two of Consistency, Availability, and Partition Tolerance; NoSQL systems often prioritize availability over strict consistency.
●​ Horizontal Scalability: Scaled by adding more servers to the database cluster.
●​ Best For: Applications with large volumes of unstructured or semi-structured
data, high traffic, and evolving data requirements (e.g., social media, big data
analytics).
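
The flexible-schema idea can be sketched with nothing more than Python dictionaries acting as a toy in-memory document store; the collection and documents below are invented for illustration:

# Each record is a dictionary, and records need not share the same fields.
users = {}

def insert(doc_id, doc):
    users[doc_id] = doc

insert("u1", {"name": "Alice", "email": "alice@example.com"})
insert("u2", {"name": "Bob", "interests": ["hiking", "ml"], "premium": True})

# Queries must tolerate missing fields, unlike a fixed relational schema.
premium_users = [d["name"] for d in users.values() if d.get("premium")]
print(premium_users)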

Key Differences :

●​ Structure: SQL is structured, NoSQL is flexible.


●​ Scaling: SQL scales up, NoSQL scales out.
●​ Consistency: SQL prioritizes strong consistency, NoSQL often prioritizes
availability.
●​ Queries: SQL uses a standardized language, NoSQL uses various approaches.

When to Use SQL:

●​ Related data
●​ Data integrity is crucial
●​ Complex queries
●​ Transactions are important
●​ Structured data

When to Use NoSQL:

●​ Large volumes of data


●​ Unstructured or semi-structured data
●​ Flexible schema requirements
●​ High availability and scalability are critical
●​ Rapid development cycles

Relational Database Management System (RDBMS)

A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured format using tables (rows and columns) and manages the relationships between them. It follows the principles of the relational model proposed by E.F. Codd.

Key Features of RDBMS

1. Data Stored in Tables
○ Data is stored in tables consisting of rows (records/tuples) and columns (attributes/fields).
2. Primary Key and Foreign Key
○ A primary key uniquely identifies each record in a table.
○ A foreign key establishes relationships between tables by referencing a primary key from another table.
3. ACID Properties
○ Atomicity: Transactions are fully completed or not done at all.
○ Consistency: The database remains in a valid state before and after transactions.
○ Isolation: Transactions do not interfere with each other.
○ Durability: Once a transaction is completed, it is permanently saved.
4. Structured Query Language (SQL)
○ SQL is used for managing and querying data (e.g., SELECT, INSERT, UPDATE, DELETE).
5. Normalization
○ Reduces data redundancy and improves data integrity by organizing data efficiently.
6. Data Integrity and Security
○ Enforces constraints like NOT NULL, UNIQUE, and CHECK to maintain data accuracy (see the sketch after this list).
○ Provides user authentication and access control for security.
7. Scalability
○ Supports large amounts of data and can scale vertically by increasing hardware resources.
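
A minimal sketch of the keys and constraints mentioned above, again using Python's sqlite3 module; the tables and columns are invented for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign-key constraints in SQLite

conn.execute("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,            -- primary key: uniquely identifies a row
    email TEXT NOT NULL UNIQUE            -- NOT NULL and UNIQUE constraints
)""")
conn.execute("""
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    amount      REAL CHECK (amount > 0)   -- CHECK constraint for data accuracy
)""")

conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (10, 1, 25.0)")

try:
    # violates the foreign-key constraint: customer 99 does not exist
    conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (11, 99, 5.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)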

Examples of RDBMS

●​ MySQL
●​ PostgreSQL
●​ Oracle Database
●​ Microsoft SQL Server
●​ IBM Db2

RDBMS is widely used in banking, e-commerce, enterprise applications, and other structured data management systems due to its reliability and efficiency.

Distributed and Parallel Computing

Both distributed computing and parallel computing deal with executing multiple
tasks simultaneously, but they differ in how they operate and where they are used.

1. Distributed Computing
Definition:​
Distributed computing involves multiple computers (nodes) working together to solve a
problem by sharing resources and communicating over a network.

Key Features:

●​ Multiple independent systems communicate and collaborate.


●​ Tasks are distributed among different machines.
●​ Nodes operate independently and may fail without stopping the entire system.
●​ Uses message passing for communication (illustrated in the sketch below).
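
A minimal local sketch of the message-passing style: two Python processes exchange a task and a result over a pipe. In a real distributed system the pipe would be a network connection (sockets, RPC, etc.), and the workload here is invented for illustration:

from multiprocessing import Process, Pipe

def worker(conn):
    task = conn.recv()                 # receive a message describing the task
    conn.send(sum(task["numbers"]))    # send the partial result back
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send({"numbers": [1, 2, 3, 4]})   # coordinator sends work
    print("partial result:", parent_conn.recv())  # coordinator collects the result
    p.join()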

Examples of Distributed Computing:

●​ Cloud computing (AWS, Google Cloud, Microsoft Azure)


●​ Blockchain technology (Bitcoin, Ethereum)
●​ Distributed file systems (HDFS in Hadoop)
2. Parallel Computing
Definition:​
Parallel computing involves executing multiple tasks simultaneously using multiple
processors within a single computer or tightly connected systems.

Key Features:

●​ Single system with multiple processors or cores.


●​ Tasks are broken into subtasks and executed in parallel.
●​ Shared memory architecture (common memory for communication).
●​ Improves performance by utilizing multiple CPUs/GPUs (see the sketch after the examples below).

Types of Parallel Computing:

1.​ Shared Memory Model – All processors access the same memory (e.g.,
OpenMP).
2.​ Distributed Memory Model – Each processor has its own memory (e.g., MPI).

Examples of Parallel Computing:

●​ Supercomputers (e.g., IBM Summit, Fugaku)


●​ Parallel processing in AI/ML using GPUs
●​ Weather forecasting simulations
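
A minimal sketch of the parallel model on a single machine, splitting an invented workload (summing squares) into subtasks that run on all available cores via Python's multiprocessing module:

from multiprocessing import Pool, cpu_count

def subtask(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n = cpu_count()
    chunks = [data[i::n] for i in range(n)]   # split the task into independent subtasks
    with Pool(processes=n) as pool:
        partials = pool.map(subtask, chunks)  # run subtasks in parallel on multiple cores
    print("total:", sum(partials))            # combine the partial results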

Key Differences Between Distributed and Parallel Computing

Feature | Distributed Computing | Parallel Computing
System Type | Multiple independent computers | Single system with multiple processors
Memory | Separate memory for each node | Shared or distributed memory
Communication | Uses networks for communication | Uses memory for data sharing
Fault Tolerance | More fault-tolerant (node failure does not stop the system) | Less fault-tolerant (a failure can halt execution)
Example | Cloud computing, Blockchain | Supercomputers, AI processing

Both computing models are essential in handling large-scale data processing and
computation-heavy tasks, with distributed computing focusing on networked
systems and parallel computing leveraging multiple processors for speed.

Google MapReduce and Google File System (GFS) White Papers

Google introduced two foundational technologies for handling large-scale data processing: Google MapReduce and the Google File System (GFS). Both were described in research papers published by Google engineers and have influenced modern big data technologies like Hadoop and Apache Spark.

1. Google MapReduce White Paper (2004)


Title: MapReduce: Simplified Data Processing on Large Clusters​
Authors: Jeffrey Dean and Sanjay Ghemawat​
Published by: Google

Overview:

MapReduce is a programming model designed to process large-scale data sets in a distributed and parallel manner across many machines. It simplifies tasks like web indexing, log processing, and data analysis.

Working of MapReduce:

1.​ Map Phase: The input data is divided into key-value pairs and processed in
parallel by multiple nodes.
2.​ Shuffle Phase: The intermediate data is grouped and sorted based on keys.
3.​ Reduce Phase: The grouped data is aggregated and combined to produce the final output (see the word-count sketch below).
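
The three phases can be sketched on a single machine with plain Python; the two input documents are invented, and a real MapReduce/Hadoop job would run the same logic across many nodes:

from collections import defaultdict

documents = ["big data is big", "data drives decisions"]  # illustrative input

# Map phase: emit (key, value) pairs, here (word, 1)
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group intermediate values by key
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate the values for each key
result = {word: sum(counts) for word, counts in groups.items()}
print(result)   # {'big': 2, 'data': 2, 'is': 1, 'drives': 1, 'decisions': 1}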

Key Features:

●​ Automatic parallelization across thousands of nodes.


●​ Fault tolerance using replication and re-execution of failed tasks.
●​ Scalability to process petabytes of data.
●​ Optimized data locality by processing data near where it is stored.

Impact:

●​ Inspired Hadoop MapReduce, an open-source implementation.


●​ Used in search indexing, log analysis, and big data analytics.

2. Google File System (GFS) White Paper (2003)


Title: The Google File System​
Authors: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung​
Published by: Google

Overview:

GFS is a distributed file system designed to store and manage large amounts of
data across multiple servers efficiently.

Architecture:

●​ Master Node: Manages metadata, file system namespace, and chunk locations.
●​ Chunk Servers: Store the actual data in 64 MB chunks and handle read/write requests (see the sketch after this list).
●​ Clients: Access data using file system API requests.
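
A toy sketch of the master's chunk metadata: files are split into 64 MB chunks and each chunk is assigned to three replica servers. The server names and file size are invented for illustration, and a real GFS master does far more (leases, heartbeats, re-replication):

CHUNK_SIZE = 64 * 1024 * 1024           # 64 MB chunks
REPLICATION = 3                         # default: 3 copies of each chunk
chunk_servers = ["cs1", "cs2", "cs3", "cs4"]

def plan_chunks(file_name, file_size):
    """Return master-style metadata mapping each chunk to its replica servers."""
    num_chunks = -(-file_size // CHUNK_SIZE)   # ceiling division
    metadata = {}
    for i in range(num_chunks):
        replicas = [chunk_servers[(i + r) % len(chunk_servers)]
                    for r in range(REPLICATION)]
        metadata[f"{file_name}#chunk{i}"] = replicas
    return metadata

print(plan_chunks("weblog.txt", 200 * 1024 * 1024))   # a 200 MB file -> 4 chunks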

Key Features:
●​ High fault tolerance with data replication (default: 3 copies).
●​ Optimized for large files (GB to TB in size).
●​ Write-once, read-many model to handle high read throughput.
●​ Automatic load balancing and recovery mechanisms.

Impact:

●​ Inspired Hadoop Distributed File System (HDFS), used in big data frameworks.
●​ Used by Google for indexing and storage of web search data.

Comparison of MapReduce and GFS

Feature | Google MapReduce | Google File System (GFS)
Purpose | Parallel data processing | Distributed file storage
Key Concept | Map and Reduce functions | Large file storage with chunking
Fault Tolerance | Automatic task re-execution | Data replication across nodes
Impact | Led to Hadoop MapReduce | Led to Hadoop HDFS

These two technologies together revolutionized big data processing, forming the
foundation for modern distributed computing systems like Hadoop, Spark, and cloud
storage solutions.
Evolution of Big Data

Big Data has evolved over the years due to advancements in storage, processing, and
analytics. The evolution can be divided into different phases:

1. Pre-Big Data Era (Before 2000s)

●​ Data Volume: Limited data from structured sources (databases, spreadsheets).


●​ Storage & Processing: Traditional Relational Database Management
Systems (RDBMS) were used.
●​ Key Technologies: SQL databases, mainframes.

Limitations:

●​ Could not handle unstructured data (text, images, videos).


●​ Data processing was slow and required expensive hardware.

2. Emergence of Big Data (2000-2010)

●​ Data Growth: Explosion of digital data due to the internet, social media, and
e-commerce.
●​ Challenges: RDBMS could not handle the volume, variety, and velocity of data.
●​ Key Innovations:
○​ Google File System (GFS) & MapReduce (2003-2004) – Enabled
distributed data storage and processing.
○​ Hadoop (2006) – Open-source framework inspired by GFS and
MapReduce.
○​ NoSQL Databases (2009-2010) – MongoDB, Cassandra for handling
unstructured data.

Impact:

●​ Enabled large-scale data analytics, real-time processing, and cloud-based storage.

3. Modern Big Data Era (2010-Present)


●​ Data Explosion: Social media, IoT, cloud computing, AI generate massive data.
●​ Advanced Processing: Faster and more efficient frameworks like Apache
Spark (2014) replaced Hadoop MapReduce.
●​ Key Technologies:
○​ Cloud Computing (AWS, Google Cloud, Azure) for scalable storage.
○​ Machine Learning & AI for predictive analytics.
○​ Edge Computing & IoT for real-time data processing.

Current Trends:

●​ Data Lakes & Warehouses (Snowflake, Delta Lake) for centralized storage.
●​ Streaming Analytics (Kafka, Flink) for real-time data processing.
●​ Privacy & Security (GDPR, data encryption) to protect user data.

Future of Big Data

●​ Quantum Computing for even faster processing.


●​ Federated Learning for decentralized AI-driven insights.
●​ Ethical AI & Data Governance to ensure fair data usage.

Big Data continues to evolve with advancements in AI, cloud computing, and
cybersecurity, making data-driven decision-making more powerful.

Comparison of Google’s White Paper Technologies and Current Big Data Technologies

Google’s MapReduce and Google File System (GFS) white papers introduced
foundational technologies for Big Data processing. These have evolved over time,
leading to modern, faster, and more efficient solutions.

1. Storage Systems

Google White Paper Technology | Current Technology
Google File System (GFS) (2003) - Distributed storage system that breaks files into chunks and stores them across multiple machines. | Hadoop Distributed File System (HDFS) - Open-source version inspired by GFS, widely used in Big Data.
Bigtable (2006) - NoSQL database built on GFS, optimized for scalability. | Apache HBase, Cassandra - Modern NoSQL databases inspired by Bigtable, handling massive real-time workloads.
Colossus (next-gen GFS, 2010s) - Google's internal distributed storage with improved speed and reliability. | Cloud object storage (AWS S3, Google Cloud Storage, Azure Blob Storage) - Scalable storage with high availability.

2. Processing Frameworks

Google White Paper Technology | Current Technology
MapReduce (2004) - Batch processing framework dividing tasks into "Map" and "Reduce" phases. | Apache Spark (2014) - Faster, in-memory processing, replacing MapReduce in most use cases.
Pregel (2010) - Google’s graph processing framework for large-scale graphs. | Apache Giraph, GraphX - Open-source alternatives for large-scale graph analytics.
Dremel (2010) - Columnar storage and processing for fast queries. | Apache Drill, Presto, BigQuery - Cloud-based and open-source tools inspired by Dremel.

3. Query & Analytics

Google White Paper Technology | Current Technology
Sawzall (2003) - Google's early query language for log processing. | SQL-based tools (Presto, Trino, BigQuery) - More flexible and scalable alternatives.
Google BigQuery (2010s) - Cloud-based real-time analytics platform. | Modern data warehouses (Snowflake, Amazon Redshift, Databricks SQL) - More advanced, multi-cloud capabilities.

4. Stream Processing

Google White Paper Technology | Current Technology
MillWheel (2013) - Google’s real-time stream processing engine. | Apache Flink, Kafka Streams, Spark Streaming - Open-source, widely adopted streaming platforms.

Key Advancements in Current Technologies

✅ In-Memory Processing: Apache Spark can be up to 100x faster than MapReduce for in-memory workloads (see the PySpark sketch below).
✅ Real-Time Processing: Kafka, Flink, and Spark Streaming replaced batch-oriented systems.
✅ Cloud & Serverless Solutions: BigQuery, Snowflake, and AWS Lambda provide scalable, low-maintenance alternatives.
✅ AI & ML Integration: TensorFlow and PyTorch enable predictive analytics on Big Data.
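
For a sense of the modern API style, here is a minimal PySpark word-count sketch. It assumes a local PySpark installation and a hypothetical input file logs.txt, and is only an illustration of the in-memory, MapReduce-style workflow:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

counts = (spark.sparkContext.textFile("logs.txt")     # hypothetical input file
          .flatMap(lambda line: line.split())          # "map" side: emit words
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))            # "reduce" side: sum counts

print(counts.take(10))
spark.stop()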

Conclusion

Google’s research laid the foundation for Big Data. However, modern technologies have
improved efficiency, speed, and scalability, making Big Data processing more
accessible and cost-effective.
