Module-02 Machine Learning (BCS602) Notes

The document discusses bivariate and multivariate data analysis in machine learning, highlighting the importance of understanding relationships between variables. It covers statistical measures like covariance and correlation, as well as techniques for dimensionality reduction and feature engineering. Additionally, it introduces various graphical methods for visualizing data relationships, such as scatter plots and heatmaps.


Module-2

Chapter 01: Understanding Data – 2

Bivariate Data and Multivariate Data

Bivariate Data

Bivariate data involves two variables, and the goal of bivariate analysis is to explore the relationship between them.

This relationship can help in making comparisons, identifying causes, and guiding further exploration of the data.

Bivariate analysis deals with the causes of relationships; the aim is to find relationships among the data.

Consider the following Table 2.3, which contains data on the temperature in a shop and the corresponding sales of sweaters.


Scatter Plot

A scatter plot is a useful graphical method for visualizing bivariate data.

It is particularly effective for illustrating the relationship between two variables.


The key features of a scatter plot are:

 Strength: Indicates how closely the data points fit a pattern or trend.
 Shape: Helps in identifying the type of relationship (linear, quadratic, etc.).
 Direction: Shows whether the relationship is positive, negative, or neutral.
 Outliers: Helps identify any points that deviate significantly from the trend.

Scatter plots are often used in the exploratory phase of data analysis before calculating correlation coefficients or fitting regression models.
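As an illustration, a minimal sketch of such a plot in Python with matplotlib; the temperature/sweater-sales figures are made-up placeholders, since Table 2.3 is not reproduced here:

    import matplotlib.pyplot as plt

    # Hypothetical shop data: temperature (deg C) vs. sweaters sold
    temperature = [5, 8, 12, 15, 18, 22, 25, 28]
    sweater_sales = [95, 88, 70, 62, 50, 35, 22, 15]

    plt.scatter(temperature, sweater_sales)
    plt.xlabel("Temperature (deg C)")
    plt.ylabel("Sweaters sold")
    plt.title("Temperature vs. sweater sales")
    plt.show()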

Bivariate Statistics

There are various statistical measures to describe the relationship between two variables.

Two important bivariate statistics are Covariance and Correlation.

Covariance

Covariance measures the joint variability of two random variables. It tells you whether an increase in one variable results in an increase or decrease in the other variable.

Mathematically, the covariance between two variables X and Y is defined as:
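For n paired observations with means \bar{x} and \bar{y}, the standard definition is

    \mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

(some texts divide by n − 1 to obtain the unbiased sample covariance).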


Covariance values:

 Positive covariance: As one variable increases, the other variable also increases.
 Negative covariance: As one variable increases, the other variable decreases.
 Zero covariance: No linear relationship between the variables.

Correlation

While covariance measures the direction of the relationship, correlation quantifies the strength of the relationship between two variables.

The most common measure of correlation is the Pearson correlation coefficient:
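    r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

where \sigma_X and \sigma_Y are the standard deviations of X and Y. The coefficient always lies between −1 and +1.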

Unlike covariance, correlation is dimensionless, meaning it is not affected by the units of the variables.
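Both statistics can be computed directly; a minimal sketch with NumPy, reusing the made-up temperature/sales figures from the scatter plot example:

    import numpy as np

    temperature = np.array([5, 8, 12, 15, 18, 22, 25, 28])
    sweater_sales = np.array([95, 88, 70, 62, 50, 35, 22, 15])

    # Covariance matrix: entry [0, 1] is Cov(X, Y)
    cov_xy = np.cov(temperature, sweater_sales)[0, 1]

    # Pearson correlation matrix: entry [0, 1] is r
    r_xy = np.corrcoef(temperature, sweater_sales)[0, 1]

    print(f"Covariance: {cov_xy:.2f}")   # negative: sales fall as temperature rises
    print(f"Correlation: {r_xy:.2f}")    # close to -1: strong negative linear relationship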


Multivariate Statistics

Multivariate data refers to data that involves more than two variables, and in machine learning, most datasets are multivariate.

The goal of multivariate analysis is to understand relationships among multiple variables simultaneously.

This can involve multiple dependent (response) variables, and is often used for analyzing more complex data scenarios.

Multivariate analysis techniques include:

 Regression Analysis
 Principal Component Analysis (PCA)
 Path Analysis

The mean vector is used to represent the mean of multiple variables, and the covariance matrix represents the variance and relationships among all variables.

The mean vector is also known as the centroid, while the covariance matrix is also referred to as the dispersion matrix.
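A short sketch of computing the mean vector (centroid) and covariance (dispersion) matrix with NumPy, assuming a small made-up dataset:

    import numpy as np

    # Rows are instances, columns are variables (features)
    data = np.array([[4.0, 2.0, 0.60],
                     [4.2, 2.1, 0.59],
                     [3.9, 2.0, 0.58],
                     [4.3, 2.1, 0.62],
                     [4.1, 2.2, 0.63]])

    mean_vector = data.mean(axis=0)          # centroid: one mean per column
    cov_matrix = np.cov(data, rowvar=False)  # variances on diagonal, covariances off-diagonal

    print("Mean vector:", mean_vector)
    print("Covariance matrix:\n", cov_matrix)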

Multivariate Analysis Techniques

Regression Analysis:

Used to model the relationship between multiple independent variables and a dependent variable.

Factor Analysis:

A statistical method used to identify underlying relationships between observed variables.


Multivariate Analysis of Variance (MANOVA):

Extends ANOVA to analyze multiple dependent variables simultaneously.

Visualization Techniques for Multivariate Data

Heatmap

A heatmap is a graphical representation of a 2D matrix where values are represented by colors. In a heatmap:

 Darker colors indicate larger values.
 Lighter colors indicate smaller values.

Applications:

Heatmaps are useful for visualizing complex data like traffic patterns or patient health data, where you can easily identify regions of higher or lower values.

Example:


In vehicle traffic data, regions with heavy traffic are highlighted with dark colors, making it easy to spot problem areas.
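A minimal sketch of such a heatmap in Python, assuming seaborn is available; random integers stand in for a real traffic matrix:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    traffic = rng.integers(0, 100, size=(6, 6))  # made-up traffic intensities

    # Darker/lighter shading encodes larger/smaller values
    sns.heatmap(traffic, annot=True, cmap="YlOrRd")
    plt.title("Vehicle traffic intensity by region")
    plt.show()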

Pairplot (or Scatter Matrix)

A pairplot (or scatter matrix) is a matrix of scatter plots that shows relationships between every pair of variables in a multivariate dataset.

This method allows you to visually examine correlations or relationships between variables.

A random matrix of three columns is chosen and the relationships of the columns are plotted as a pairplot (or scatter matrix), as shown in Figure 2.14.

 Visual Layout: Each scatter plot in the matrix shows the relationship between two variables.
 Usefulness: By examining the pairplot, you can easily identify patterns, correlations, or clusters among the variables.
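A short sketch, assuming seaborn and a random three-column matrix as described above:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])

    # One scatter plot for every pair of columns; histograms on the diagonal
    sns.pairplot(df)
    plt.show()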


Essential Mathematics for Multivariate Data

In the realm of machine learning and multivariate data analysis, several mathematical concepts are foundational.

These include concepts from Linear Algebra, Statistics, Probability, and Optimization. Below is an overview of essential mathematical tools that are necessary for understanding and working with multivariate data.

Linear Algebra

Linear algebra is crucial in machine learning as it provides the tools for dealing with data in the form of vectors and matrices. Here's a breakdown of important topics:

 Vectors: A vector is an ordered list of numbers. It can represent data points or features of an observation in a multivariate dataset.
o Dot product and cross product are used to compute projections and angles between vectors.
 Matrices: A matrix is a 2D array of numbers. In machine learning, matrices often represent data where rows are instances and columns are features.
o Matrix multiplication allows the transformation of data and is used in various algorithms like linear regression, neural networks, and more.
 Eigenvalues and Eigenvectors: These are important for dimensionality reduction techniques such as Principal Component Analysis (PCA). They are used to transform data into a new basis that captures the most variance.
 Determinants and Inverses: The determinant of a matrix tells us if the matrix is invertible (non-singular). The inverse of a matrix is used to solve linear systems of equations.
 Singular Value Decomposition (SVD): This is a factorization method used in PCA and other dimensionality reduction techniques to decompose a matrix into singular values and vectors.
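A minimal sketch of these operations in NumPy, using a small made-up matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    # Eigen decomposition (eigh is appropriate for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(A)

    # Determinant and inverse
    det_A = np.linalg.det(A)   # non-zero, so A is invertible
    A_inv = np.linalg.inv(A)

    # Singular Value Decomposition: A = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(A)

    print("Eigenvalues:", eigenvalues)
    print("Determinant:", det_A)
    print("Singular values:", S)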


Statistics

Statistics is key to understanding the relationships between different variables in multivariate data. Key concepts include:

 Mean and Variance: Measures of central tendency (mean) and spread (variance) are essential to understanding the distribution of each variable.
 Covariance: Covariance measures the relationship between two variables. A positive covariance indicates that as one variable increases, the other tends to increase.
 Correlation: Correlation is a normalized measure of covariance that indicates the strength and direction of the relationship between two variables.
 Multivariate Normal Distribution: Many machine learning algorithms assume that the data follows a multivariate normal distribution, which extends the idea of a normal distribution to more than one variable.
 Principal Component Analysis (PCA): PCA is used to reduce the dimensionality of the dataset while retaining as much variance as possible. It uses eigenvectors and eigenvalues to identify the principal components.

Probability

Probability theory underpins the concept of uncertainty, which is inherent in real-world data:

 Random Variables: A random variable represents a quantity whose value is subject to chance. In multivariate data, we deal with vectors of random variables.
 Probability Distributions: These describe the likelihood of various outcomes. Common distributions in machine learning include the normal distribution and the multinomial distribution.


 Bayes' Theorem: This theorem describes the probability of an event based on prior knowledge of related events, via P(A|B) = P(B|A) P(A) / P(B). It's fundamental to algorithms like Naive Bayes and Bayesian Inference.
 Markov Chains: These are used for modeling systems that undergo transitions from one state to another with a certain probability, without memory of previous states.

Optimization

Optimization is key to finding the best model for multivariate data. Many machine learning algorithms are formulated as optimization problems.

 Gradient Descent: An iterative optimization algorithm used to minimize a cost function (such as in linear regression or neural networks); a small sketch follows this list.
 Convex Optimization: Involves minimizing convex functions, and plays a significant role in machine learning, as many cost functions are convex.
 Lagrange Multipliers: Used for optimizing functions subject to constraints, which is often seen in constrained optimization problems in machine learning.
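As referenced above, a minimal sketch of gradient descent minimizing a one-variable cost function; the function and learning rate are illustrative choices:

    # Minimize f(x) = (x - 3)^2 by following the negative gradient f'(x) = 2(x - 3)
    def gradient(x):
        return 2.0 * (x - 3.0)

    x = 0.0             # starting point
    learning_rate = 0.1

    for step in range(100):
        x -= learning_rate * gradient(x)

    print(f"Minimum found near x = {x:.4f}")  # converges toward x = 3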

Multivariate Analysis

 Multivariate Regression: This is the extension of linear regression to predict multiple dependent variables using a set of independent variables.
 Multivariate Analysis of Variance (MANOVA): An extension of ANOVA used when there are two or more dependent variables. It tests for differences between groups.


 Factor Analysis: A method for identifying the underlying relationships between observed variables. It's often used in exploratory data analysis.


Graphical Techniques for Multivariate Data

 Scatter Plots: A scatter plot can be used to visualize the relationship between two variables. For multivariate data, pairplots or scatter matrices are used to examine the relationships between all pairs of variables.
 Heatmaps: Used to visualize correlation matrices or covariance matrices, where color intensity represents the strength of the relationship.

Multivariate Data Models

 Multivariate Normal Distribution: A generalization of the univariate normal distribution to multiple variables, frequently assumed in multivariate statistical analysis.
 Multivariate Linear Models: Models such as multiple regression, where multiple independent variables are used to predict a set of dependent variables.

Dimensionality Reduction

Dimensionality reduction is used to reduce the number of variables in a dataset while maintaining the essential information:

 Principal Component Analysis (PCA): A technique that reduces the dimensionality of the dataset by projecting the data onto a set of orthogonal axes (principal components) that explain the most variance.
 t-SNE: A technique for dimensionality reduction that is well-suited for visualizing high-dimensional data in 2D or 3D space.

Feature Engineering and Dimensionality Reduction Techniques

Feature engineering and dimensionality reduction are critical steps in machine learning workflows.


They ensure that models are not only accurate but also efficient, interpretable, and scalable.

1. Feature Engineering

Feature engineering involves creating, modifying, or selecting features (variables) from raw data to improve the performance of machine learning models.

Techniques in Feature Engineering

1. Feature Creation

2. Feature Transformation
o Normalization: Scaling values to a specific range, typically [0, 1].
o Standardization: Transforming features to have a mean of 0 and a standard deviation of 1.
o Log Transformation: Reducing the impact of large values by applying the log function.
o Power Transformation: Stabilizing variance by applying functions like square root or exponential transformations.
3. Handling Missing Values
o Imputation: Filling missing values with statistical measures (mean, median, mode) or predictions from models.
o Dropping Features or Rows: Removing features or samples with excessive missing data.


4. Encoding Categorical Features
o Label Encoding: Assigning numerical values to categories.
o One-Hot Encoding: Creating binary columns for each category.
o Target Encoding: Replacing categories with the mean of the target variable.
5. Feature Selection
o Filter Methods: Using statistical tests (e.g., correlation, chi-square) to select features.
o Wrapper Methods: Selecting features based on the performance of a model (e.g., recursive feature elimination).
o Embedded Methods: Feature selection integrated into model training (e.g., regularization methods like LASSO).
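A minimal sketch of a few of these techniques with pandas and scikit-learn (the tiny DataFrame is a made-up example; the sparse_output argument assumes scikit-learn 1.2 or newer):

    import pandas as pd
    from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder

    df = pd.DataFrame({
        "age":    [25.0, 32.0, None, 41.0],          # has a missing value
        "income": [30000, 52000, 47000, 90000],
        "city":   ["Hubli", "Mysuru", "Hubli", "Bengaluru"],
    })

    # Imputation: fill the missing age with the column mean
    df["age"] = df["age"].fillna(df["age"].mean())

    # Normalization (to [0, 1]) and standardization (mean 0, std 1)
    df["income_norm"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()
    df["age_std"] = StandardScaler().fit_transform(df[["age"]]).ravel()

    # One-hot encoding: one binary column per category
    onehot = OneHotEncoder(sparse_output=False).fit_transform(df[["city"]])
    print(df)
    print(onehot)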

2. Dimensionality Reduction

Dimensionality reduction aims to reduce the number of features while preserving as much relevant information as possible.

It helps combat issues like overfitting, high computational costs, and the curse of dimensionality.

Techniques for Dimensionality Reduction

1. Principal Component Analysis (PCA)
o Purpose: Identifies directions (principal components) in the data that explain the maximum variance.
o Projects data onto a new coordinate system where each axis represents a principal component.
o Captures the most variance in the first few components.

Applications: Commonly used in image compression, gene expression analysis, and exploratory data analysis. (A short code sketch appears after this list.)


2. Linear Discriminant Analysis (LDA)
o Purpose: Similar to PCA but focuses on maximizing class separability in supervised learning tasks.
o Projects data onto a lower-dimensional space while maintaining class distinction.

Applications: Often used in classification problems.

3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
o Purpose: Reduces high-dimensional data to 2D or 3D for visualization.
o Preserves the local structure of the data while sacrificing global structure.

Applications: Useful for visualizing clusters in high-dimensional data like embeddings.

4. Autoencoders (Deep Learning-Based Reduction)
o Purpose: Learns a compressed representation of the data using neural networks.
o The encoder compresses the data, and the decoder reconstructs it.
o The bottleneck layer represents the reduced dimensions.

Applications: Image compression, anomaly detection, and generative models.

5. Feature Agglomeration
o Purpose: Groups features with similar characteristics (hierarchical clustering for features).
o Combines redundant features into a single representative feature.

Applications: Useful for datasets with many correlated features.


6. Independent Component Analysis (ICA)
o Purpose: Decomposes data into statistically independent components.
o Useful for signals with non-Gaussian distributions.

Applications: Signal processing, such as separating audio signals in the "cocktail party problem."

7. Factor Analysis
o Purpose: Identifies underlying latent variables (factors) that explain observed variables.
o Assumes that observed data is influenced by a smaller number of unobservable factors.

Applications: Psychometrics, finance, and social sciences.

8. Backward Feature Elimination
o Purpose: Iteratively removes features that have the least impact on the target variable.
o Uses a trained model's performance as the criterion.

Applications: Effective for small datasets where computational cost isn't a concern.
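As referenced under PCA above, a minimal sketch with scikit-learn; the Iris dataset is used purely as a convenient built-in example:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

    # Project the data onto the 2 orthogonal axes with the most variance
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                # (150, 2)
    print(pca.explained_variance_ratio_)  # variance captured by each component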

Combining Feature Engineering and Dimensionality Reduction

Pipeline Integration:

Many machine learning frameworks (e.g., scikit-learn) support building pipelines where feature engineering and dimensionality reduction steps are automated.
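A short sketch of such a pipeline in scikit-learn, chaining scaling, PCA, and a classifier; the estimator choices and dataset are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    pipe = Pipeline([
        ("scale", StandardScaler()),    # feature transformation
        ("pca", PCA(n_components=2)),   # dimensionality reduction
        ("clf", LogisticRegression()),  # final model
    ])

    pipe.fit(X, y)
    print(f"Training accuracy: {pipe.score(X, y):.3f}")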

Hybrid Methods: For example:

o Combine PCA with feature selection to reduce noise and retain relevant features.
o Use autoencoders to generate compact features, then apply supervised learning techniques.

Applications

Text Data:

o Use TF-IDF for feature creation and Latent Semantic Analysis (LSA) for dimensionality reduction.

Image Data:

o Apply Convolutional Autoencoders or PCA for reducing pixel-based data dimensions.

Genomic Data:

o Use PCA or t-SNE to visualize high-dimensional gene expression data.

Sensor Data:

o Combine Fourier transforms for feature extraction and PCA for dimensionality reduction.

Best Practices

Understand Data: Always begin with exploratory data analysis (EDA) to understand feature importance and relationships.

Domain Knowledge: Incorporate domain expertise to create meaningful features.


Avoid Over-Reduction: Ensure that dimensionality reduction techniques retain sufficient information to build an accurate model.

Evaluate: Continuously evaluate feature engineering and dimensionality reduction using cross-validation.


Chapter 02

Basic Learning Theory

Design of a Learning System

A learning system is a computational system that uses algorithms to learn from data or experiences to improve its performance over time.

The design of such systems focuses on the following essential steps:

Choosing a Training Experience

The first step in building a learning system is selecting the type of training experience it will use to learn. This involves determining the source of data and how it will be used.

Types of Training Experience:

Direct Experience:

 The system is explicitly provided with examples of board states and their correct moves.
 Example: In a chess game, the system is given specific board states and the optimal moves for those states.

Indirect Experience:

 Instead of explicit guidance, the system is provided with sequences of moves and their results.
 Example: The system observes the outcome (win or loss) of different move sequences and learns to optimize its strategy.


Supervised vs. Unsupervised Training:

In supervised training, a supervisor labels all valid moves for a given board state.

In the absence of a supervisor, the system uses self-play or exploration to learn. For example, a chess agent can play games against itself and identify successful moves.

Training Data Distribution:

o For reliable performance, training samples must cover a wide range of scenarios.
o If the training data and testing data have similar distributions, the system's performance will be better.

Determining the Target Function

The target function represents the knowledge the system needs to learn.

It specifies the goal of the learning system and what it is trying to predict or optimize.


Representation of the Target Function

Once the target function is defined, the next step is deciding how to represent it. The representation depends on the complexity of the problem and the available computational resources.

Common Representations:

Lookup Tables:

 Used for simple problems where all possible states and actions can be enumerated.
 Example: A small chess board with a limited number of moves.

Mathematical Functions:

 Represented using equations or models (e.g., linear regression or polynomial equations).

Machine Learning Models:

 For complex systems, models like neural networks, decision trees, or support vector machines are used to approximate the target function.
 Example: Using a neural network to predict the best chess moves based on board states.

Function Approximation

In most real-world problems, the target function is too complex to be represented exactly. Instead, an approximation of the target function is learned.

Approaches to Approximation:

Parametric Models:

Models with a fixed number of parameters (e.g., linear regression, neural networks).

Non-Parametric Models:

Models that adapt their complexity to the amount of data (e.g., k-nearest neighbors, decision trees).

Learning Algorithms:

o Algorithms like gradient descent, reinforcement learning, or evolutionary algorithms are used to optimize the parameters of the function.
o Example: In a chess game, reinforcement learning allows the agent to learn by trial and error, optimizing its strategy over time.

Practical Example: Designing a Chess Learning System

Training Experience:

Use a combination of self-play (indirect experience) and historical game data (direct experience).

Target Function:

Define the target function as selecting the best move M given the board state B, i.e., f(B) = M.

Representation of the Target Function:

Use a deep neural network to represent the target function, where inputs are board states and outputs are move probabilities.

Function Approximation:


Train the neural network using reinforcement learning, with rewards based on the outcome of games played by the system.

Introduction to Concept Learning

Concept learning is a strategy in machine learning that involves acquiring abstract knowledge or inferring general concepts from the given training data.

It enables the learner to generalize from specific training examples and classify objects or instances based on common, relevant features.

What is Concept Learning?

Concept learning is the process of abstraction and generalization from data, where:

 The learner identifies common features shared by positive examples.
 It uses these features to classify new instances into categories.

It involves:

 Comparing and contrasting categories by analyzing positive and negative examples.
 Simplifying observations from training data into a model or hypothesis.
 Applying this model to classify future data.

This process is also known as learning from experience.

Features of Concept Learning

Categorization:

o Concept learning enables classification of objects based on a set of relevant features.
o For example, humans classify animals like elephants, cats, or dogs based on specific distinguishing features.

Boolean-Valued Function:

o Each concept or category learned is represented as a Boolean function that returns true or false:
 True for positive examples that belong to the category.
 False for negative examples that do not belong to the category.

Example:

o Humans categorize animals by recognizing features such as:
 Size, shape, color, and behavior.
o For example, to identify an elephant:
 Large size, trunk, tusks, and big ears are the specific features.

Formal Definition of Concept Learning

Concept learning is the process of inferring a Boolean-valued function by processing training examples.

The goal is to:

1. Identify a set of specific or common features.
2. Use these features to define a target concept for classifying objects.

Components of Concept Learning

Input:

o A labeled training dataset consisting of:
 Positive examples: Instances that belong to the target concept.
 Negative examples: Instances that do not belong to the target concept.
o The learner uses this past experience to train the model.

Output:

o The Target Concept or Target Function f(x):
 A function f(x) maps input x to output y.
 The output is used to determine the relevant features for classification.
o Example: Identifying an elephant requires a specific set of features such as "has a trunk" and "has tusks."

Testing:

o New instances are provided to test the learned model.
o The system classifies these new instances based on the hypothesis derived during training.

Process of Concept Learning

Training:

o The learner observes a set of labeled examples (positive and negative instances).
o It identifies common, relevant features from the positive examples and contrasts them with negative examples.

Hypothesis Formation:

o The system generates a hypothesis to represent the target concept.
o Example: "An elephant has a trunk and tusks" could be the hypothesis to classify an elephant.


Generalization:

o The hypothesis is generalized to classify new instances correctly.

Testing and Validation:

o The learned model is tested on unseen data to evaluate its performance.

Example: Concept Learning for Animals

Input: Training dataset of animals with labeled features.

o Positive examples: Animals labeled as "elephants."
o Negative examples: Animals not labeled as "elephants."

Output: Target concept for an elephant, e.g., "has a trunk," "has tusks," and "large size."

Testing: New animal instances are classified based on the learned concept.

Applications of Concept Learning

1. Natural Language Processing: Categorizing words or sentences based on grammatical or semantic features.
2. Image Recognition: Identifying objects or patterns in images.
3. Recommendation Systems: Classifying products or services to provide personalized recommendations.
4. Medical Diagnosis: Identifying diseases based on symptoms and medical test results.

Modelling in Machine Learning

A machine learning model abstracts a training dataset and makes predictions on unseen data.


Training: Involves feeding training data into a machine learning algorithm, tuning parameters, and generating a predictive model.

Goals: Selecting the right model, training effectively, reducing training time, and achieving high performance on unseen data.

Types of Parameters:

Model Parameters: Learnable directly from training data (e.g., regression coefficients, decision tree splits, neural network weights).

Hyperparameters: Cannot be learned directly and must be set (e.g., regularization strength, number of trees in random forests).

Evaluation and Error Metrics

Dataset Splitting:

o Training dataset: Used to train the model.
o Test dataset: Used to evaluate the model's ability to generalize.

Error Types:

o Training Error (In-sample Error): Error when the model is tested on training data.
o Test Error (Out-of-sample Error): Error when predicting on unseen test data.

Loss Function: Measures prediction error. Example: Mean Squared Error (MSE), where a smaller value indicates higher accuracy.
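A minimal sketch of MSE on a made-up set of predictions:

    import numpy as np

    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.8, 5.4, 2.9, 6.5])

    # Mean Squared Error: average of squared prediction errors
    mse = np.mean((y_true - y_pred) ** 2)
    print(f"MSE = {mse:.4f}")  # smaller is better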

Steps in the Machine Learning Process

Algorithm Selection: Choose a model suitable for the problem and dataset.

Training: Train the selected algorithm on the dataset.


Tuning: Adjust parameters to improve accuracy.

Evaluation: Validate the model using test data.

Model Selection and Evaluation

Challenges:

Balancing performance (accuracy) and complexity (overfitting or underfitting).

Approaches:

1. Resampling methods like splitting datasets or cross-validation.
2. Calculating accuracy or error metrics.
3. Probabilistic frameworks for scoring model performance.

Resampling Methods

Random Train/Test Splits: Randomly split the data for training and testing.

Cross-Validation: Tune models by splitting data into folds:

o K-fold Cross-Validation: Split data into k parts, train on k-1 folds, and test on the remaining fold.


o Stratified K-fold: Ensures each fold contains a proportionate distribution of class labels.


o Leave-One-Out Cross-Validation (LOOCV): Train on all data except one instance; repeat for every instance.
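A minimal sketch of k-fold cross-validation with scikit-learn; the dataset and estimator are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # 5-fold CV: train on 4 folds, test on the held-out fold, repeated 5 times
    scores = cross_val_score(model, X, y, cv=5)
    print("Per-fold accuracy:", scores)
    print(f"Mean accuracy: {scores.mean():.3f}")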


Visualizing Model Performance

ROC Curve (Receiver Operating Characteristic):

o Plots True Positive Rate vs. False Positive Rate.
o Area Under the Curve (AUC): Measures classifier performance (1.0 = perfect; closer to the diagonal = less accurate).

Precision-Recall Curve:

o Useful for imbalanced datasets to evaluate precision and recall.
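A minimal sketch of computing an ROC curve and AUC with scikit-learn, using synthetic data for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

    fpr, tpr, _ = roc_curve(y_test, scores)     # points on the ROC curve
    print(f"AUC = {roc_auc_score(y_test, scores):.3f}")  # 1.0 = perfect classifier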

Scoring and Complexity Methods

Scoring Models: Combine model performance and complexity into a single score.

Example: Minimum Description Length (MDL):

Selects the simplest model with the fewest bits to represent both data and predictions.
