MACHINE LEARNING WITH
BIG DATA
CHALLENGES & APPROACHES
BY K.DAVID
DMCA-40
Introduction
The Big Data revolution promises to transform how we live, work, and think by
enabling process optimization, empowering insight discovery and improving decision
making.
Today, the amount of data is exploding at an unprecedented rate as a result of developments
in Web technologies, social media, and mobile and sensing devices. For example, Twitter
processes over 70M tweets per day, thereby generating over 8TB daily.
The ability to extract value from Big Data depends on data analytics, which is considered
to be the core of the Big Data revolution.
Data analytics involves various approaches, technologies, and tools such as those from text
analytics, business intelligence, data visualization, and statistical analysis. Machine
learning (ML) is a fundamental component of data analytics and will be one of the
main drivers of the Big Data revolution. The reason for this is its ability to learn from data
and provide data-driven insights, decisions, and predictions.
ML Challenges Originating From Big Data
Definition
Big Data is often described by its dimensions, which are referred to as
its V’s. Earlier definitions of Big Data focused on three V’s (volume,
velocity, and variety); however, a more commonly accepted definition
now relies upon the following four V’s: volume, velocity, variety, and
veracity.
It is important to note that other V’s can also be found in the
literature. For example, value is often added as a fifth V. However, value
is defined as the desired outcome of Big Data processing and not as
a defining characteristic of Big Data itself.
This section identifies machine learning challenges and associates each
challenge with a specific dimension of Big Data.
Characteristics associated with challenges
VOLUME:
The first and most talked-about characteristic of Big Data
is volume: the amount, size, and scale of the data.
In the machine learning context, size can be defined either
vertically, by the number of records or samples in a dataset, or
horizontally, by the number of features or attributes it contains.
1) Processing Performance:
One of the main challenges encountered in computations with
Big Data comes from the simple principle that scale, or
volume, adds computational complexity.
Consequently, as the scale becomes large, even trivial
operations can become costly.
For example, the standard support vector machine (SVM)
algorithm has a training time complexity of $O(m^3)$ and a
space complexity of $O(m^2)$, where $m$ is the number of
training samples. Therefore, an increase in $m$ will
drastically affect the time and memory needed to train the
SVM, and training may even become computationally
infeasible on very large datasets.
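To make the quadratic space cost concrete, the sketch below estimates the memory needed to hold a dense m x m kernel matrix, one common source of a quadratic space term, and the growth factor of a cubic training cost. The function names and the 8-byte value size are illustrative assumptions, not taken from the slides.

```python
def kernel_matrix_gib(m, bytes_per_value=8):
    """Approximate memory (GiB) for a dense m x m kernel matrix.

    Assumes one 8-byte float per entry (an illustrative choice).
    """
    return m * m * bytes_per_value / 2**30


def relative_training_cost(m, baseline_m):
    """Growth factor of an O(m^3) training cost relative to a baseline size."""
    return (m / baseline_m) ** 3


if __name__ == "__main__":
    for m in (10_000, 100_000, 1_000_000):
        print(f"m={m:>9,}: ~{kernel_matrix_gib(m):,.1f} GiB, "
              f"cost x{relative_training_cost(m, 10_000):,.0f} vs m=10,000")
```

Multiplying the sample count by 10 multiplies the matrix size by 100 and the cubic cost by 1,000, which is why such algorithms stop scaling long before the data does.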
Many other ML algorithms also exhibit high time complexity:
for example, the time complexity of principal component
analysis is $O(mn^2 + n^3)$, that of logistic regression $O(mn^2 + n^3)$,
that of locally weighted linear regression $O(mn^2 + n^3)$, and
that of Gaussian discriminative analysis $O(mn^2 + n^3)$, where
$m$ is the number of samples and $n$ the number of features.
Class Imbalance:
As datasets grow larger, the assumption that the data are
uniformly distributed across all classes is often broken.
This leads to a challenge referred to as class imbalance.
The performance of a machine learning algorithm can be
negatively affected when datasets contain data from classes
with differing probabilities of occurrence.
This problem is especially prominent when some classes are
represented by a large number of samples and some by very
few.
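One common mitigation, which the slides do not prescribe, is to weight each class inversely to its frequency so that rare classes count more during training. The sketch below is a minimal illustration of that idea; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Made-up, heavily imbalanced label set for illustration.
    labels = ["normal"] * 990 + ["fraud"] * 10
    print(balanced_class_weights(labels))
```

With these weights, every class contributes the same total weight regardless of how many samples it has.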
Curse of Dimensionality:
The curse of dimensionality refers to difficulties encountered when
working in high-dimensional space. Specifically, dimensionality describes
the number of features or attributes present in the dataset.
In addition, dimensionality affects processing performance:
the time and space complexity of ML algorithms is closely
related to data dimensionality. The time complexity of
many ML algorithms is polynomial in the number of
dimensions.
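One well-known symptom of high dimensionality is that distances between points become nearly indistinguishable. The sketch below illustrates this with uniformly random points; the measure used here (the relative gap between the farthest and nearest point from the origin) and all names are assumptions chosen for illustration.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over distances from the origin to random points.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    distances = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:>4}: contrast={distance_contrast(dim):.3f}")
```

As the dimension grows, the contrast shrinks, so "nearest" and "farthest" neighbours carry less and less information.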
Feature Engineering:
Feature engineering is the process of creating features, typically using
domain knowledge, to make machine learning perform better.
Indeed, the selection of the most appropriate features is one
of the most time-consuming pre-processing tasks in machine
learning.
As the dataset grows, both vertically and horizontally, it
becomes more difficult to create new, highly relevant features.
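As a small illustration of creating features from domain knowledge, the sketch below derives calendar features from a raw timestamp. The feature names and the 9-to-17 business-hours rule are hypothetical choices, not part of the slides.

```python
from datetime import datetime


def engineer_time_features(timestamp):
    """Derive simple calendar features from a raw ISO-8601 timestamp string."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": dt.weekday() >= 5,
        # Assumed rule: business hours are weekdays from 09:00 to 16:59.
        "is_business_hours": dt.weekday() < 5 and 9 <= dt.hour < 17,
    }


if __name__ == "__main__":
    print(engineer_time_features("2024-01-06T14:30:00"))
```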
Bonferroni’s Principle:
Bonferroni’s principle embodies the idea that if one is looking for a specific type
of event within a certain amount of data, the likelihood of
finding this event is high.
However, more often than not these occurrences are bogus,
meaning that they have no cause and are therefore
meaningless instances within a dataset.
In statistics, the Bonferroni correction provides a means of
avoiding those bogus positive findings within a dataset. It
suggests that if testing m hypotheses with a desired significance of α,
each individual hypothesis should be tested at a significance level of α/m.
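The Bonferroni correction tests each of m hypotheses at a significance level of α/m. A minimal sketch of applying it to a list of p-values follows; the function names and the p-values are made up for illustration.

```python
def bonferroni_threshold(alpha, m):
    """Per-hypothesis significance level under the Bonferroni correction."""
    if m <= 0:
        raise ValueError("m must be a positive number of hypotheses")
    return alpha / m


def significant_after_correction(p_values, alpha=0.05):
    """Return indices of p-values that remain significant at alpha / m."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


if __name__ == "__main__":
    p_values = [0.001, 0.02, 0.04, 0.3]  # made-up values for illustration
    print(bonferroni_threshold(0.05, len(p_values)))
    print(significant_after_correction(p_values))
```

Without the correction, three of these four results would pass at α = 0.05; with it, only the first one does.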
Variance and Bias
Machine learning relies upon the idea of generalization: through
observations and manipulations of data, representations can be
generalized to enable analysis and prediction. Generalization error
can be broken down into two components: variance and bias.
Variance describes a learner’s tendency to learn random things
irrespective of the real signal, whereas bias describes a learner’s
tendency to consistently learn the same wrong thing.
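For squared-error loss, this breakdown is usually written as below. The notation is an assumption introduced here rather than taken from the slides: $f$ is the true function, $\hat{f}$ the learned model, $y = f(x) + \varepsilon$ the observed target, and $\sigma^2$ the variance of the noise $\varepsilon$, which no learner can remove.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```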
[Figure: Variance and Bias]
APPROACHES
This paper reviews and organizes various proposed
machine learning approaches and discusses how they
address the identified challenges.
The big picture of approach-challenge correlations is
presented in the table.
It includes a list of approaches along with the challenges
that each best addresses. The symbol ‘√’ indicates a high degree
of remedy, while ‘*’ represents partial resolution.
Approaches ...
The following sub-sections introduce techniques and
methodologies being developed and used to handle the
challenges associated with machine learning with Big Data.
First, manipulation techniques used in conjunction with
existing algorithms are presented.
Second, various machine learning paradigms that are
especially well suited to handling Big Data challenges are discussed.
A. Manipulations for Big Data
Data analytics using machine learning relies on an
established sequence of steps, also known as the data analytics
pipeline. The approaches presented in this section discuss
possible manipulations in various steps of the existing
pipeline.
Dimensionality reduction aims to map a high-dimensional
space onto a lower-dimensional one without significant loss
of information.
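The slides do not name a specific dimensionality reduction technique. As one simple illustration, the sketch below uses a Gaussian random projection, which approximately preserves distances between points; the technique, the function names, and the sizes (1000 down to 200 dimensions) are all assumptions chosen for the example.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Gaussian projection matrix scaled so squared norms are preserved on average."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
        for _ in range(n_components)
    ]


def project(vector, matrix):
    """Map a high-dimensional vector onto the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.random() for _ in range(1000)]
    b = [rng.random() for _ in range(1000)]
    matrix = random_projection_matrix(n_features=1000, n_components=200)
    ratio = euclidean(project(a, matrix), project(b, matrix)) / euclidean(a, b)
    print(f"1000 -> 200 dimensions, distance ratio = {ratio:.3f}")
```

A ratio close to 1 means the distance between the two points survived the reduction largely intact.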
[Figure: Data analytics pipeline]
These three categories of manipulations, along with their corresponding subcategories
and sample solutions, are shown in the following figure.
[Figure: Manipulations]
B. Machine Learning Paradigms for Big Data
A variety of learning paradigms exist in the field of machine
learning; however, not all types are relevant to all areas of
research.
1) Deep Learning
Deep learning is an approach from the representation
learning family of machine learning. Representation learning
is also often referred to as feature learning.
It transforms data into abstract representations that enable
the features to be learnt. In a deep learning architecture, these
representations are subsequently used to accomplish the
machine learning tasks.
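The idea of successive representations can be sketched as a forward pass through stacked layers, where each layer transforms the previous layer's output. The weights below are hand-picked for illustration only; in a real network they would be learned from data.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weights is a list of rows, one per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass inputs through stacked (weights, biases) layers with ReLU activations."""
    representation = inputs
    for weights, biases in layers:
        representation = relu(dense(representation, weights, biases))
    return representation


if __name__ == "__main__":
    # Hand-picked weights for illustration; a real network would learn them.
    layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer: 2 -> 2
        ([[2.0, 1.0]], [0.5]),                    # output layer: 2 -> 1
    ]
    print(forward([1.0, 2.0], layers))
```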
2) Online Learning
Because it responds well to large-scale processing by nature,
online learning is another machine learning paradigm that has
been explored to bridge efficiency gaps created by Big Data.
Online learning can be seen as an alternative to batch
learning, the paradigm typically used in conventional machine
learning.
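The contrast with batch learning can be sketched as a model that updates after every incoming sample and never stores the stream. The example below uses stochastic gradient descent on a linear model; the target function, learning rate, and names are made-up assumptions for illustration.

```python
def online_sgd(stream, learning_rate=0.1):
    """Fit y ~ w * x + b one sample at a time, never storing the stream."""
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


def synthetic_stream(n_samples):
    """Yield noise-free samples of y = 2x + 1 (a made-up target for illustration)."""
    for i in range(n_samples):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


if __name__ == "__main__":
    w, b = online_sgd(synthetic_stream(5000))
    print(f"w={w:.3f}, b={b:.3f}")
```

Because the stream is a generator, memory use stays constant no matter how many samples arrive.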
3) Local Learning
Local learning is a strategy that offers an alternative to typical
global learning. Conventionally, ML algorithms make use of
global learning through strategies such as generative learning.
The idea behind local learning is to separate the input space into clusters
and then build a separate model for each cluster. This
reduces overall cost and complexity.
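The cluster-then-model idea can be sketched in a few lines. The example below clusters one-dimensional inputs with plain k-means and fits the simplest possible local model, a per-cluster mean; both choices, along with the data and names, are assumptions made to keep the illustration short.

```python
def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_1d(xs, centroids, n_iter=10):
    """Plain Lloyd iterations on 1-D inputs, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
    return centroids


def fit_local_models(xs, ys, initial_centroids):
    """Cluster the inputs, then fit one mean predictor per cluster."""
    centroids = kmeans_1d(xs, initial_centroids)
    targets = [[] for _ in centroids]
    for x, y in zip(xs, ys):
        targets[nearest(x, centroids)].append(y)
    means = [sum(t) / len(t) if t else 0.0 for t in targets]
    return centroids, means


def predict(x, centroids, means):
    """Route x to its cluster and use that cluster's local model."""
    return means[nearest(x, centroids)]


if __name__ == "__main__":
    xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    ys = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]
    centroids, means = fit_local_models(xs, ys, initial_centroids=[0.0, 5.0])
    print(predict(0.15, centroids, means), predict(4.9, centroids, means))
```

Each local model only ever sees the samples in its own cluster, which is where the reduction in cost comes from.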
4) Transfer Learning
Transfer learning is an approach for improving learning in a
particular domain, referred to as the target domain, by training
the model with other datasets from multiple domains, denoted
as source domains, with similar attributes or features, such as
the problem and constraints.
This type of learning is used when the data size within the
target domain is insufficient or the learning task is different.
The figure shows an abstract view of transfer learning.
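One simple form of transfer learning is to train on a data-rich source domain and then fine-tune on scarce target data, starting from the source weights. The sketch below shows this with a one-parameter linear model; the two domains, the learning rate, and the function name are made-up assumptions, and real transfer learning methods are considerably more varied.

```python
def fine_tune(w_init, samples, learning_rate=0.1):
    """Run one SGD pass for the model y ~ w * x, starting from w_init."""
    w = w_init
    for x, y in samples:
        w -= learning_rate * (w * x - y) * x
    return w


if __name__ == "__main__":
    # Made-up domains: the source follows y = 2.0 * x, the target y = 2.2 * x.
    source = [(i / 10.0, 2.0 * (i / 10.0)) for i in range(1, 11)] * 50
    target = [(x, 2.2 * x) for x in (0.2, 0.5, 0.8)]  # scarce target data

    w_source = fine_tune(0.0, source)         # learn on the source domain
    w_transfer = fine_tune(w_source, target)  # reuse source knowledge
    w_scratch = fine_tune(0.0, target)        # target data only

    print(f"transfer: w={w_transfer:.3f}, scratch: w={w_scratch:.3f}")
```

With only three target samples, the model started from the source weights ends up much closer to the target relationship than the one trained from scratch.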
5) Lifelong Learning
Lifelong learning mimics human learning: learning is
continuous, and knowledge is retained and used to solve different
problems. It is directed to maximize overall learning, to be
able to solve a new task by training either on one single
domain or on heterogeneous domains collectively.
The learning outcomes from the training process are
collected and combined in a space called the topic
model or knowledge model.
6) Ensemble Learning
Conclusion
This paper has provided a systematic review of the
challenges associated with machine learning in the context
of Big Data and categorized them according to the V
dimensions of Big Data. Moreover, it has presented an
overview of ML approaches and discussed how these
techniques overcome the various challenges identified.
Thank you
Ad

More Related Content

What's hot (20)

Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
Rui Pedro Paiva
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
eShikshak
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
DataminingTools Inc
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
smj
 
Machine learning
Machine learningMachine learning
Machine learning
Sanjay krishne
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Gajanand Sharma
 
4.3 multimedia datamining
4.3 multimedia datamining4.3 multimedia datamining
4.3 multimedia datamining
Krish_ver2
 
data mining
data miningdata mining
data mining
uoitc
 
OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
ahsan irfan
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
varshakumar21
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overview
Colleen Farrelly
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
Shuvra Ghosh
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
YouKnowwho28
 
Data Mining
Data MiningData Mining
Data Mining
SHIKHA GAUTAM
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
malathieswaran29
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
CosmoAIMS Bassett
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : Concepts
Pragya Pandey
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
Rui Pedro Paiva
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
eShikshak
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
smj
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
4.3 multimedia datamining
4.3 multimedia datamining4.3 multimedia datamining
4.3 multimedia datamining
Krish_ver2
 
data mining
data miningdata mining
data mining
uoitc
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overview
Colleen Farrelly
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
Shuvra Ghosh
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
YouKnowwho28
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
malathieswaran29
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : Concepts
Pragya Pandey
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 

Similar to Machine learning with Big Data power point presentation (20)

Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
PhD Assistance
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
NeeleEilers
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
Pranya Prabhakar
 
chapter Three artificial intelligence 1.pptx
chapter Three artificial intelligence   1.pptxchapter Three artificial intelligence   1.pptx
chapter Three artificial intelligence 1.pptx
gadisaadamu101
 
ML crash course
ML crash courseML crash course
ML crash course
mikaelhuss
 
Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
Trideeb Kumar Das
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
Technovision
TechnovisionTechnovision
Technovision
SayantanGhosh58
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
cscpconf
 
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its ApplicationsMachine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Arpana Awasthi
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
AnanthReddy38
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...
butest
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
butest
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business Revolution
Cognizant
 
INTRODUCTIONTOML2024 for graphic era.pptx
INTRODUCTIONTOML2024 for graphic era.pptxINTRODUCTIONTOML2024 for graphic era.pptx
INTRODUCTIONTOML2024 for graphic era.pptx
chirag19saxena2001
 
Anirban part1
Anirban part1Anirban part1
Anirban part1
kamatchi priya
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
andreecapon
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
iaeronlineexm
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
PhD Assistance
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
NeeleEilers
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
Pranya Prabhakar
 
chapter Three artificial intelligence 1.pptx
chapter Three artificial intelligence   1.pptxchapter Three artificial intelligence   1.pptx
chapter Three artificial intelligence 1.pptx
gadisaadamu101
 
ML crash course
ML crash courseML crash course
ML crash course
mikaelhuss
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
cscpconf
 
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its ApplicationsMachine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Arpana Awasthi
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
AnanthReddy38
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...
butest
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
butest
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business Revolution
Cognizant
 
INTRODUCTIONTOML2024 for graphic era.pptx
INTRODUCTIONTOML2024 for graphic era.pptxINTRODUCTIONTOML2024 for graphic era.pptx
INTRODUCTIONTOML2024 for graphic era.pptx
chirag19saxena2001
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
andreecapon
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
iaeronlineexm
 
Ad

Recently uploaded (20)

computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Decision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdfDecision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdf
Saikat Basu
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Decision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdfDecision Trees in Artificial-Intelligence.pdf
Decision Trees in Artificial-Intelligence.pdf
Saikat Basu
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Ad

Machine learning with Big Data power point presentation

  • 1. MACHINE LEARNING WITH BIG DATA CHALLENGES & APPROACHES BY K.DAVID DMCA-40
  • 2. Introduction The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery, and improving decision making. Today, the amount of data is exploding at an unprecedented rate as a result of developments in Web technologies, social media, and mobile and sensing devices. For example, Twitter processes over 70M tweets per day, thereby generating over 8TB daily. The ability to extract value from Big Data depends on data analytics, which is considered the core of the Big Data revolution. Data analytics involves various approaches, technologies, and tools such as those from text analytics, business intelligence, data visualization, and statistical analysis. Machine learning (ML) is a fundamental component of data analytics and will be one of the main drivers of the Big Data revolution, because of its ability to learn from data and provide data-driven insights, decisions, and predictions.
  • 3. ML Challenges Originating From Big Data Definition Big Data is often described by its dimensions, which are referred to as its V’s. Earlier definitions of Big Data focused on three V’s (volume, velocity, and variety); however, a more commonly accepted definition now relies upon four V’s: volume, velocity, variety, and veracity. Other V’s can also be found in the literature. For example, value is often added as a fifth V. However, value is defined as the desired outcome of Big Data processing and not as a defining characteristic of Big Data itself. This section identifies machine learning challenges and associates each challenge with a specific dimension of Big Data.
  • 5. VOLUME: The first and most talked-about characteristic of Big Data is volume: the amount, size, and scale of the data. In the machine learning context, size can be defined either vertically, by the number of records or samples in a dataset, or horizontally, by the number of features or attributes it contains. 1) Processing Performance: One of the main challenges encountered in computations with Big Data comes from the simple principle that scale, or volume, adds computational complexity. Consequently, as the scale becomes large, even trivial operations can become costly.
  • 6. Volume For example, the standard support vector machine (SVM) algorithm has a training time complexity of O(m^3) and a space complexity of O(m^2), where m is the number of training samples. Therefore, an increase in m drastically affects the time and memory needed to train an SVM, and training may even become computationally infeasible on very large datasets. Many other ML algorithms also exhibit high time complexity: for example, the time complexity of principal component analysis, logistic regression, locally weighted linear regression, and Gaussian discriminant analysis is O(mn^2 + n^3) in each case, where m is the number of samples and n the number of features.
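
To make these growth rates concrete, the following is a rough back-of-the-envelope sketch rather than a model of any real SVM solver. It assumes the O(m^2) space term is a dense m x m kernel matrix of 8-byte floats; the function names are illustrative only.

```python
def kernel_matrix_bytes(m: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense m x m kernel matrix (the O(m^2) space term).

    Assumes 8-byte floats and no caching or sparsity tricks.
    """
    return m * m * bytes_per_value


def relative_training_cost(m: int, baseline_m: int) -> float:
    """Growth of the O(m^3) time term when going from baseline_m to m samples."""
    return (m / baseline_m) ** 3


if __name__ == "__main__":
    for m in (1_000, 10_000, 100_000, 1_000_000):
        gib = kernel_matrix_bytes(m) / 2**30
        growth = relative_training_cost(m, 1_000)
        print(f"m={m:>9,}  kernel matrix ~ {gib:10.2f} GiB  time x{growth:,.0f}")
```

Under these assumptions, a tenfold increase in samples multiplies memory by 100 and training time by 1,000, which is why training on very large datasets can become infeasible.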
  • 7. Volume Class imbalance: As datasets grow larger, the assumption that the data are uniformly distributed across all classes is often broken. This leads to a challenge referred to as class imbalance. The performance of a machine learning algorithm can be negatively affected when datasets contain data from classes with various probabilities of occurrence. This problem is especially prominent when some classes are represented by a large number of samples and some by very few.
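
A minimal sketch of why imbalance is a problem, using a hypothetical dataset of 990 "normal" and 10 "fraud" labels. It shows that a classifier which always predicts the majority class already looks accurate, and it computes inverse-frequency class weights, a common heuristic that is not taken from the slides.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced dataset, for illustration only.
    labels = ["normal"] * 990 + ["fraud"] * 10
    print("majority baseline accuracy:", majority_baseline_accuracy(labels))
    print("class weights:", balanced_class_weights(labels))
```

The baseline scores highly while never detecting the rare class, so accuracy alone is misleading here. The weights make each rare sample count more during training.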
  • 8. Volume Curse of Dimensionality: The curse of dimensionality refers to the difficulties encountered when working in high-dimensional space. Here, dimensionality describes the number of features or attributes present in the dataset. Dimensionality also affects processing performance: the time and space complexity of ML algorithms is closely related to data dimensionality, and the time complexity of many ML algorithms is polynomial in the number of dimensions.
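
One geometric way to see the difficulty, offered as an illustration rather than as the slides' own argument: the fraction of a hypercube occupied by its inscribed ball shrinks rapidly as the number of dimensions grows, so points spread uniformly in the cube almost all end up far from the centre.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional ball of radius 1: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball."""
    return unit_ball_volume(d) / 2.0 ** d


if __name__ == "__main__":
    for d in (2, 3, 5, 10, 20):
        print(f"d={d:>2}  fraction inside ball = {inscribed_ball_fraction(d):.3e}")
```

In two dimensions the ball covers about 79% of the square, while by twenty dimensions the fraction is vanishingly small. This sparsity is one reason distance-based methods degrade as features are added.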
  • 9. Volume Feature Engineering: This is the process of creating features, typically using domain knowledge, to make machine learning perform better. Indeed, the selection of the most appropriate features is one of the most time-consuming pre-processing tasks in machine learning. As the dataset grows, both vertically and horizontally, it becomes more difficult to create new, highly relevant features.
  • 10. Volume Bonferroni’s principle: It embodies the idea that if one is looking for a specific type of event within a certain amount of data, the likelihood of finding this event is high. However, more often than not these occurrences are bogus, meaning that they have no cause and are therefore meaningless instances within a dataset. In statistics, the Bonferroni correction provides a means of avoiding those bogus positive results within a dataset. It suggests that if testing m hypotheses with a desired significance of α, each individual hypothesis should be tested at a significance level of α/m.
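
The effect can be sketched numerically. The Bonferroni correction tests each of m hypotheses at level α/m. The example assumes independent tests when computing the chance of at least one false positive, and the p-values are hypothetical.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


def bonferroni_reject(p_values, alpha=0.05):
    """Flag which hypotheses remain significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    alpha, m = 0.05, 100
    print("uncorrected FWER:", family_wise_error_rate(alpha, m))
    corrected = bonferroni_threshold(alpha, m)
    print("corrected FWER:  ", family_wise_error_rate(corrected, m))
    # Hypothetical p-values, for illustration only.
    print(bonferroni_reject([0.0001, 0.01, 0.04]))
```

With 100 uncorrected tests at α = 0.05, a spurious "discovery" is almost certain, whereas the corrected threshold keeps the overall false-positive rate at or below α.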
  • 11. Volume Variance and Bias: Machine learning relies upon the idea of generalization; through observations and manipulations of data, representations can be generalized to enable analysis and prediction. Generalization error can be broken down into two components: variance and bias. Variance describes a learner’s tendency to learn random things irrespective of the real signal, whereas bias describes a learner’s tendency to consistently learn the same wrong thing.
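
A small simulation can illustrate the two components. It is a sketch under stated assumptions: the target value, the noise level, and both toy "learners" are hypothetical. One learner ignores the data (biased but consistent) and the other echoes a single noisy observation (unbiased but erratic). For a fixed target, mean squared error equals squared bias plus variance.

```python
import random
from statistics import fmean, pvariance


def decompose(predictions, target):
    """Return (squared bias, variance, mean squared error) of the predictions."""
    mean_prediction = fmean(predictions)
    bias_sq = (mean_prediction - target) ** 2
    variance = pvariance(predictions)
    mse = fmean([(p - target) ** 2 for p in predictions])
    return bias_sq, variance, mse


def simulate(trials=2000, seed=0, target=4.0, noise_sd=1.0):
    """Compare a constant learner with a learner that repeats one noisy sample."""
    rng = random.Random(seed)
    constant_preds = [3.0] * trials  # always predicts 3.0, whatever the data
    noisy_preds = [target + rng.gauss(0.0, noise_sd) for _ in range(trials)]
    return decompose(constant_preds, target), decompose(noisy_preds, target)


if __name__ == "__main__":
    for name, (bias_sq, variance, mse) in zip(("constant", "noisy"), simulate()):
        print(f"{name:>8}: bias^2={bias_sq:.3f} variance={variance:.3f} mse={mse:.3f}")
```

The constant learner's error is all bias and the noisy learner's error is almost all variance, yet both can have similar total error.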
  • 13. APPROACHES This paper reviews and organizes various proposed machine learning approaches and discusses how they address the identified challenges. The big picture of approach-challenge correlations is presented in the table, which lists the approaches along with the challenges that each best addresses. The symbol ‘√’ indicates a high degree of remedy, while ‘*’ represents partial resolution.
  • 15. Approaches ... The following sub-sections introduce techniques and methodologies being developed and used to handle the challenges associated with machine learning with Big Data. First, manipulation techniques used in conjunction with existing algorithms are presented. Second, various machine learning paradigms that are especially well suited to handling Big Data challenges are discussed.
  • 16. A. Manipulations for Big Data Data analytics using machine learning relies on an established sequence of steps, also known as the data analytics pipeline. The approaches presented in this section discuss possible manipulations in various steps of the existing pipeline. Dimensionality reduction aims to map a high-dimensional space onto a lower-dimensional one without significant loss of information.
  • 17. Data Analytics pipeline: These three categories are presented along with their corresponding subcategories and sample solutions.
  • 19. B. Machine Learning Paradigms for Big Data A variety of learning paradigms exists in the field of machine learning; however, not all types are relevant to all areas of research. 1) Deep Learning Deep learning is an approach from the representation learning family of machine learning. Representation learning, also often referred to as feature learning, transforms data into abstract representations that enable the features to be learnt. In a deep learning architecture, these representations are subsequently used to accomplish the machine learning tasks.
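
As a minimal sketch of dimensionality reduction, the example below projects 2-D points onto their first principal component and reports how much variance the 1-D representation retains. PCA is used here as one well-known technique, not necessarily the one the slides have in mind. The closed-form angle applies only to the 2-D case, and the sample points are hypothetical.

```python
import math
from statistics import fmean


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns the 1-D scores and the fraction of total variance retained.
    Assumes the points are not all identical.
    """
    mean_x = fmean([x for x, _ in points])
    mean_y = fmean([y for _, y in points])
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    n = len(points)
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    scores = [x * ux + y * uy for x, y in centered]
    retained = (sum(s * s for s in scores) / n) / (sxx + syy)
    return scores, retained


if __name__ == "__main__":
    # Hypothetical, strongly correlated features: y is roughly 2x.
    points = [(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8), (4, 8.1)]
    scores, retained = pca_2d_to_1d(points)
    print(f"variance retained by one dimension: {retained:.4f}")
```

When the two features are strongly correlated, a single derived dimension keeps nearly all of the variance, which is the "without significant loss of information" condition in practice.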
  • 21. 2) Online Learning Because it responds well to large-scale processing by nature, online learning is another machine learning paradigm that has been explored to bridge efficiency gaps created by Big Data. Online learning can be seen as an alternative to batch learning, the paradigm typically used in conventional machine learning.
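
The contrast with batch learning can be sketched with a linear model updated by one gradient step per incoming sample, so the full dataset never has to be held in memory. This is a minimal illustration; the stream, the learning rate, and the function names are assumptions made for the example.

```python
def online_linear_regression(stream, learning_rate=0.1):
    """Fit y = w*x + b by updating the model once per sample, storing nothing."""
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y
        # Gradient step on the squared error of this single sample.
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


def sample_stream(n):
    """Hypothetical noiseless stream following y = 3x + 1 with x in [0, 1]."""
    for i in range(n):
        x = (i % 11) / 10.0
        yield x, 3.0 * x + 1.0


if __name__ == "__main__":
    w, b = online_linear_regression(sample_stream(5000))
    print(f"w={w:.4f}  b={b:.4f}")
```

Because each sample is processed once and then discarded, memory use stays constant regardless of how long the stream runs.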
  • 22. Local Learning Local learning is a strategy that offers an alternative to typical global learning. Conventionally, ML algorithms make use of global learning through strategies such as generative learning. The idea behind local learning is to separate the input space into clusters and then build a separate model for each cluster. This reduces overall cost and complexity.
  • 24. Transfer Learning Transfer learning is an approach for improving learning in a particular domain, referred to as the target domain, by training the model with other datasets from multiple domains, denoted as source domains, with similar attributes or features, such as the problem and constraints. This type of learning is used when the data size within the target domain is insufficient or the learning task is different. The figure shows an abstract view of transfer learning.
  • 26. Lifelong learning Lifelong learning mimics human learning: learning is continuous, and knowledge is retained and used to solve different problems. It aims to maximize overall learning so that a new task can be solved by training either on a single domain or on heterogeneous domains collectively. The learning outcomes from the training process are collected and combined in a space called the topic model or knowledge model.
  • 28. Conclusion This paper has provided a systematic review of the challenges associated with machine learning in the context of Big Data and categorized them according to the V dimensions of Big Data. Moreover, it has presented an overview of ML approaches and discussed how these techniques overcome the various challenges identified.
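
The cluster-then-model idea can be sketched as follows. The centroids are supplied by hand for brevity (in practice they would come from a clustering step such as k-means), each cluster gets its own least-squares line, and the data (y = |x|) is a hypothetical case where a single global line fits poorly.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def nearest(x, centroids):
    """Index of the centroid closest to x (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def fit_local_models(points, centroids):
    """Assign each point to its nearest centroid and fit one line per cluster."""
    clusters = {i: [] for i in range(len(centroids))}
    for x, y in points:
        clusters[nearest(x, centroids)].append((x, y))
    return {i: fit_line(pts) for i, pts in clusters.items() if len(pts) >= 2}


def predict_local(x, centroids, models):
    """Predict with the model belonging to the cluster that x falls into."""
    a, b = models[nearest(x, centroids)]
    return a * x + b


if __name__ == "__main__":
    points = [(float(x), float(abs(x))) for x in range(-5, 6)]
    centroids = [-3.0, 3.0]  # hypothetical cluster centres
    models = fit_local_models(points, centroids)
    a, b = fit_line(points)
    print("global prediction at x=4:", a * 4.0 + b)
    print("local prediction at x=4: ", predict_local(4.0, centroids, models))
```

Each local model only sees its own cluster's points, so it is cheaper to fit and can capture structure that one global model averages away.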