Machine Learning Unit 2
Data set:
A dataset is a collection of related information or records. The information may be about some entity or some subject area. Each row of a dataset is called a record. Each dataset also has multiple attributes, each of which gives information on a specific characteristic. For example, in a dataset on students, there are four attributes, namely Roll Number, Name, Gender, and Age, each of which understandably describes a specific characteristic of the student entity. An attribute can also be termed a feature, variable, dimension, or field.
Types of DATA:
Data can broadly be divided into following two types:
1. Qualitative data
2. Quantitative data
1. Qualitative data: Qualitative data provides information about the quality of an object, or information which cannot be measured. For example, the quality of performance of students may be described as ‘Good’, ‘Average’, or ‘Poor’. Similarly, the name or roll number of a student is information that cannot be measured using any scale of measurement. Qualitative data is also called categorical data.
Qualitative data can be further subdivided into two types as follows:
Nominal data: has named values
Ordinal data: has named values which can be naturally ordered
Nominal data is one which has no numeric value, but a named value. It is used for assigning named
values to attributes. Nominal values cannot be quantified. Examples of nominal data are
Blood group: A, B, O, AB, etc.
Nationality: Indian, American, British, etc.
Gender: Male, Female, Other
Ordinal data, in addition to possessing the properties of nominal data, can also be naturally ordered. This means ordinal data also assigns named values to attributes but, unlike nominal data, these values can be arranged in a sequence of increasing or decreasing value, so that we can say whether a value is better than or greater than another. Examples of ordinal data are
Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
Grades: A, B, C, etc.
Hardness of metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
Like nominal data, basic counting is possible for ordinal data. Hence, the mode can be identified. Since ordering is possible in the case of ordinal data, the median can be identified in addition. The mean still cannot be calculated.
2.Quantitative data: Quantitative data relates to information about the quantity of an object – hence it can be
measured. There are two types of quantitative data:
Interval data
Ratio data
Interval data: numeric data for which the exact difference between values is known. An ideal example of interval data is Celsius temperature. For example, the difference between 12°C and 18°C is measurable and is 6°C. However, such data do not have a ‘true zero’ value; for example, there is nothing called ‘0 temperature’ or ‘no temperature’. Hence, only addition and subtraction apply to interval data.
For interval data, mathematical operations such as addition and subtraction are possible. For that
reason, for interval data, the central tendency can be measured by mean, median, or mode. Standard
deviation can also be calculated.
Ratio data: numeric data for which exact value can be measured and absolute zero is available.
Examples of ratio data include height, weight, age, salary, etc.
For ratio data, all mathematical operations, including addition, subtraction, multiplication, and division, are possible because an absolute zero exists. As with interval data, the central tendency can be measured by mean, median, or mode, and standard deviation can also be calculated.
Note: Measures of central tendency help to understand the central point of a set of data. Standard
measures of central tendency of data are mean, median, and mode.
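For instance, using Python’s built-in statistics module with a small hypothetical list of student ages, these measures can be computed as follows (a minimal sketch):

from statistics import mean, median, mode

# Hypothetical ratio data: ages of a small group of students
ages = [18, 19, 19, 20, 21, 22, 19]

print("Mean:", mean(ages))      # arithmetic average
print("Median:", median(ages))  # middle value after sorting
print("Mode:", mode(ages))      # most frequently occurring value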
Structure of data:
By now, we understand that in machine learning we have two basic data types: numeric and categorical. So, we need to understand which attributes in a dataset are numeric and which are categorical in nature. This is because the approach for exploring numerical data is different from the approach for exploring categorical data.
Histogram (for exploring numerical data):
A histogram is a type of bar chart that represents the distribution of a dataset. It displays the frequency of
data points within specific intervals, called bins. Each bar in a histogram represents the count (or
frequency) of data points that fall within a certain range.
Key Features of a Histogram:
1. Bins (Intervals): The data is divided into intervals, also called bins. The bins represent a range of
values, and the width of each bin corresponds to the interval size.
2. Bars: Each bar's height represents the number of data points that fall within the corresponding bin's
range. Taller bars indicate higher frequencies, and shorter bars indicate lower frequencies.
3. X-Axis: Represents the values of the data, divided into bins.
4. Y-Axis: Represents the frequency or count of data points in each bin.
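As a quick sketch of how such a chart can be produced (assuming numpy and matplotlib are available, with synthetic scores as hypothetical data):

import numpy as np
import matplotlib.pyplot as plt

# Synthetic numeric attribute: exam scores of 200 students (hypothetical)
scores = np.random.normal(loc=65, scale=12, size=200)

# 10 bins: the x-axis shows score ranges, the y-axis shows counts per bin
plt.hist(scores, bins=10, edgecolor="black")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.title("Histogram of exam scores")
plt.show()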
Exploring Categorical data: Most popularly, we use the scatter plot technique for categorical data.
Scatter plot:
A scatter plot is a type of data visualization that displays the relationship between two continuous
variables. Each point on the plot represents a pair of values from the dataset, with the horizontal axis (x-
axis) representing one variable and the vertical axis (y-axis) representing the other.
Key Features of a Scatter Plot:
1. Data Points: Each point represents an observation in the dataset, plotted based on its values for
the two variables.
2. Axes: The x-axis and y-axis represent the two variables you're comparing.
o X-Axis: One variable, typically independent (e.g., time, age, etc.).
o Y-Axis: The other variable, typically dependent (e.g., height, sales, etc.).
3. Trend or Correlation: A scatter plot can help identify the correlation between the two variables:
o Positive correlation: Points tend to rise from left to right (upward slope).
o Negative correlation: Points tend to fall from left to right (downward slope).
o No correlation: Points are scattered randomly with no discernible pattern.
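A minimal sketch of a scatter plot (again assuming matplotlib, with synthetic, hypothetical data for hours studied and marks obtained):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical pair of variables: hours studied (x) and marks obtained (y)
hours = np.random.uniform(0, 10, size=100)
marks = 35 + 5 * hours + np.random.normal(0, 5, size=100)  # roughly positive correlation

plt.scatter(hours, marks)
plt.xlabel("Hours studied")   # independent variable on the x-axis
plt.ylabel("Marks obtained")  # dependent variable on the y-axis
plt.title("Scatter plot showing a positive correlation")
plt.show()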
Data quality:
Data quality refers to the condition or fitness of data for its intended use. High-quality data is accurate, complete, consistent, timely, and relevant, while poor-quality data can lead to errors, misleading conclusions, and poor decision-making.
The success of machine learning depends largely on the quality of data. Data of the right quality helps achieve better prediction accuracy in the case of supervised learning. However, it is not realistic to expect that the data will be flawless.
Data remediation involves the process of identifying, correcting, and improving data quality issues within a
dataset. It ensures that the data is suitable for analysis, reporting, and decision-making.
Common Data Quality Issues and Remediations:
1. Missing Data: When values for certain fields are missing (e.g., missing customer email addresses).
o Remediation: You can either fill in missing data (imputation), remove incomplete records, or
flag them as "unknown."
2. Duplicate Records: Identical or very similar records appearing multiple times in the dataset.
o Remediation: Use deduplication techniques to identify and merge duplicates or remove
redundant records.
3. Inconsistent Formatting: Data represented in multiple formats, such as date formats
(MM/DD/YYYY vs. DD/MM/YYYY) or inconsistent naming conventions.
o Remediation: Standardize data formats using automated rules, ensuring consistency across
records.
4. Outliers and Invalid Data: Extreme values or data entries that fall outside the expected range or
are clearly erroneous (e.g., a negative value for age).
o Remediation: Identify and investigate outliers to determine whether they are errors. You
may replace, remove, or correct such data points.
5. Inaccurate Data: Data entries that are incorrect or not up-to-date.
o Remediation: Validate data accuracy by cross-referencing with trusted sources or through
manual review, and update as necessary.
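As a small sketch of how some of these remediations can be applied in practice (assuming pandas and a hypothetical customer table):

import pandas as pd

# Hypothetical customer records containing typical quality issues
df = pd.DataFrame({
    "name": ["asha ", "Ravi", "Ravi", "MEENA"],
    "email": ["asha@x.com", None, None, "meena@x.com"],
    "age": [25, 32, 32, -4],   # -4 is clearly an invalid value
})

df = df.drop_duplicates()                         # duplicate records: remove redundant rows
df["email"] = df["email"].fillna("unknown")       # missing data: flag as "unknown"
df["name"] = df["name"].str.strip().str.title()   # inconsistent formatting: standardize names
df["age"] = df["age"].where(df["age"] >= 0)       # invalid data: treat negative ages as missing

print(df)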
Data modeling:
To increase the level of accuracy of the machine learning model, human participation should be added to the machine learning process. For this, we mainly follow four steps:
Step 1 - DATA PRE-PROCESSING:
a. Dimensionality reduction:
Dimensionality reduction in machine learning is the process of reducing the number of input variables
(features) in a dataset while preserving as much of the relevant information as possible. This helps
improve model performance, reduce overfitting, and decrease computational cost.
High-dimensional datasets need a high amount of computational space and time. At the same time, not all features are useful; irrelevant and redundant features degrade the performance of machine learning algorithms. Most machine learning algorithms perform better if the dimensionality of the dataset, i.e. the number of features in the dataset, is reduced. Dimensionality reduction helps in reducing irrelevance and redundancy in features. Also, it is easier to understand a model if the number of features involved in the learning activity is smaller.
Common techniques include:
Principal Component Analysis (PCA): A linear method that transforms the data into a smaller set of
orthogonal components (principal components), capturing the most variance in the data.
Singular Value Decomposition (SVD): A matrix factorization method that decomposes a matrix into
singular values, used in tasks like topic modeling and recommendation systems.
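As a brief sketch of dimensionality reduction with PCA (assuming scikit-learn and its bundled Iris dataset):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

# Project the 4 original features onto 2 orthogonal principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # share of variance captured by each component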
It may seem that a feature subset may lead to loss of useful information, as certain features are going to be excluded from the final set of features used for learning. However, only features which are irrelevant or redundant are selected for elimination; all such features are removed while selecting the final feature subset.
There are three methods to perform feature subset selection, which can be categorized as:
1. Filter Methods
2. Wrapper Methods
3. Embedded Methods
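For instance, a filter method scores each feature independently of any learning model. A minimal sketch using scikit-learn's SelectKBest with the ANOVA F-statistic (one common filter criterion) could look like this:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep only the 2 features with the highest ANOVA F-scores
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)          # (150, 2)
print(selector.get_support())    # boolean mask showing which features were kept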
Step 2 - Learning of the Data Model:
1. Selecting a Model in Machine Learning
Selecting the right model for your machine learning task is a crucial step in building an effective solution.
The model you choose depends on the nature of the data, the problem you're trying to solve, and the
computational resources available.
Choosing the right model is critical for the learning process. Different models have different strengths and
weaknesses, and their suitability depends on the type of problem you're trying to solve.
Types of Machine Learning Models:
Linear Models
Decision Trees
Support Vector Machines (SVM)
K-Nearest Neighbors (KNN)
1. Classification: The classification algorithm is a supervised learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be called targets, labels, or categories.
The main goal of a classification algorithm is to identify the category of a given data point, and these algorithms are mainly used to predict the output for categorical data.
Classification algorithms:
1. Decision tree algorithm
2. Random forest algorithm
3. Support vector machine algorithm
(Figure: classification of Class A and Class B)
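As a small sketch of a classification workflow (assuming scikit-learn and its bundled Iris dataset, with a decision tree as the model):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Learn class boundaries from the labelled training data
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Classify new (test) observations and measure accuracy
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))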
2. Regression:
Regression is a supervised machine learning technique used to predict the value of the dependent variable for new, unseen data. It models the relationship between the input features and the target variable, allowing for the estimation or prediction of numerical values.
A regression problem arises when the output variable is a real or continuous value, such as ‘salary’ or ‘weight’. Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane that goes through the points.
Regression algorithms:
1. Simple linear regression
2. Multiple linear regression
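A minimal sketch of simple linear regression (assuming scikit-learn, with a small hypothetical experience-versus-salary dataset):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience vs. salary (in thousands)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30, 35, 42, 48, 55])

model = LinearRegression()
model.fit(X, y)

# Predict a continuous value (salary) for 6 years of experience
print(model.predict([[6]]))
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line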
Clustering:
The task of grouping data points based on their similarity with each other is called Clustering or Cluster
Analysis. This method is defined under the branch of Unsupervised Learning, which aims at gaining
insights from unlabelled data points, that is, unlike supervised learning we don’t have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates the similarity based on a metric like Euclidean distance, cosine similarity, Manhattan distance, etc., and then groups the points with the highest similarity scores together.
Clustering algorithms:
1. K-means algorithm
2. Hierarchical clustering
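As a brief sketch of k-means clustering (assuming scikit-learn, with synthetic unlabelled 2-D points as hypothetical data):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabelled 2-D points forming two loose groups (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Group the points into 2 clusters based on Euclidean distance
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels[:10])               # cluster assignments of the first 10 points
print(kmeans.cluster_centers_)   # coordinates of the 2 cluster centres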
Model parameter tuning is the process of adjusting the model fitting options. For example, in the popular
classification model k-Nearest Neighbour (kNN), using different values of ‘k’ or the number of nearest
neighbours to be considered, the model can be tuned.
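A quick sketch of tuning k for kNN (assuming scikit-learn's GridSearchCV and the Iris dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k and keep the one with the best cross-validated accuracy
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X, y)

print(grid.best_params_)   # the best value of k found
print(grid.best_score_)    # its mean cross-validation accuracy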
The approach of combining different models with diverse strengths is known as an ensemble. Ensemble methods combine weaker learners to create stronger ones.
One of the earliest and most popular ensemble models is bootstrap aggregating or bagging. Bagging uses
bootstrapping to generate multiple training data sets. These training data sets are used to generate a set of
models using the same learning algorithm.
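A minimal sketch of bagging (assuming a recent scikit-learn, with decision trees as the base learner):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 20 decision trees, each trained on a bootstrap sample of the training data
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=20, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())   # averaged cross-validation accuracy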
Just like bagging, boosting is another key ensemble-based technique. In boosting, weaker learning models
are trained on resampled data and the outcomes are combined using a weighted voting approach based
on the performance of different models.
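Similarly, a sketch of boosting using AdaBoost (again assuming scikit-learn):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Weak learners are trained in sequence and combined by performance-weighted voting
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(boosting, X, y, cv=5).mean())   # averaged cross-validation accuracy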