UNIT 2 ML
VIJAYANAGAR, BANGALORE-40
SUBJECT: MACHINE LEARNING
UNIT-2 NOTES
DATA PREPARATION
YOGESH S N
B.Sc B.Ed, MCA, M.Sc in Mathematics
VASAVI JNANA PEETHA FIRST GRADE COLLEGE
It removes duplicate content, making the data usable across different
applications.
It improves model performance.
b)Exploratory Data Analysis (EDA):
Perform EDA to understand the structure, distribution, and characteristics of
the data. This involves:
Checking for missing values.
Summarizing statistics (mean, median, min, max, etc.).
Creating visualizations (histograms, box plots, scatter plots, etc.) to
understand relationships and distributions.
Identifying outliers and anomalies.
c)Data Cleaning:
i)Handle missing values:
Impute missing values (using mean, median, mode, or more sophisticated
methods), or remove rows/columns with missing data depending on the
amount of missingness and the nature of the problem.
ii)Deal with outliers:
Decide whether to remove outliers or transform them to mitigate their
impact on the model.
iii)Address inconsistencies and errors in data:
This might involve correcting typos, standardizing formats, or resolving
inconsistencies in categorical variables.
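The imputation step above can be sketched in Python with pandas and scikit-learn. A minimal sketch, assuming a tiny invented table with "age" and "salary" columns:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Invented example data with two missing values
df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "salary": [50000, 60000, np.nan, 52000]})

# Mean imputation: each NaN is replaced by the mean of its column
imputer = SimpleImputer(strategy="mean")
df[["age", "salary"]] = imputer.fit_transform(df[["age", "salary"]])
```

Passing `strategy="median"` or `strategy="most_frequent"` to `SimpleImputer` gives the median and mode variants mentioned above.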
d)Feature Engineering:
Create new features: Combine existing features or derive new ones that
might be more informative for the model.
Encode categorical variables: Convert categorical variables into
numerical representations using techniques like one-hot encoding, label
encoding, or embeddings.
Feature scaling: Scale numerical features to a similar range (e.g., using
min-max scaling or standardization) to prevent features with large values
from dominating the model.
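The three feature-engineering tasks above (deriving a feature, encoding a categorical, scaling numerics) can be sketched together; the column names and values here are invented for illustration:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"height_m": [1.6, 1.75, 1.8],
                   "weight_kg": [60.0, 80.0, 90.0],
                   "city": ["Bangalore", "Mysore", "Bangalore"]})

# Derive a new, more informative feature from existing ones (BMI)
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# One-hot encode the categorical column into binary indicator columns
df = pd.get_dummies(df, columns=["city"])

# Min-max scale the numerical columns into the [0, 1] range
num_cols = ["height_m", "weight_kg", "bmi"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```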
e)Data Transformation:
•Standardize the data: Scale the features to have a mean of 0 and a standard
deviation of 1 to improve convergence during training.
•Dimensionality reduction: If dealing with high-dimensional data, use
techniques like Principal Component Analysis (PCA) or feature selection
to reduce the number of features while preserving most of the variance.
f)Data Splitting:
Split the data into training, validation, and test sets to assess model
performance and prevent overfitting.
g)Data Preprocessing Pipeline:
Create a preprocessing pipeline that encapsulates all the data preparation
steps.
This ensures consistency and allows easy application to new data.
h)Iterative Process:
Data preparation is often an iterative process.
You may need to revisit previous steps based on insights gained during
model training and evaluation.
i)Documentation:
Document all the steps taken during data preparation, including any
assumptions made or decisions taken.
This documentation is crucial for reproducibility and collaboration.
a)Understanding Business Objectives:
Data preparation starts with a clear understanding of the business problem
or objectives that the machine learning model aims to address.
This understanding helps in defining the scope of data collection, the
choice of features, and the evaluation metrics for the model.
b)Data Collection:
Acquiring relevant and high-quality data is crucial for the success of any
machine learning project.
This involves identifying data sources, collecting data, and ensuring its
integrity, accuracy, and completeness.
Data may come from various internal or external sources, such as
databases, APIs, sensor data, or web scraping.
c)Data Cleaning and Preprocessing:
Raw data often contains errors, missing values, outliers, and
inconsistencies that need to be addressed before feeding it into a machine
learning model.
Data cleaning involves tasks like imputation of missing values, handling
outliers, removing duplicates, and standardizing formats.
Preprocessing includes feature scaling, encoding categorical variables,
and handling skewness in distributions.
d)Feature Engineering:
Feature engineering is the process of creating new features or
transforming existing ones to enhance the performance of machine
learning models.
This step requires domain knowledge and creativity to extract meaningful
information from the data.
Feature engineering aims to capture relevant patterns, reduce
dimensionality, and improve the model's ability to generalize.
e)Exploratory Data Analysis (EDA):
EDA is an essential step in understanding the underlying structure and
relationships within the data.
It involves visualizations, statistical summaries, and hypothesis testing to
gain insights into the data distribution, correlations between variables,
and potential patterns or trends.
f)Data Transformation and Scaling:
Data transformation techniques like normalization or standardization are
applied to ensure that all features have a similar scale.
This prevents features with larger magnitudes from dominating the model
and helps in achieving faster convergence during training.
g)Data Splitting:
Before training a machine learning model, the dataset is split into
training, validation, and test sets.
This ensures that the model is trained on one set, validated on another set
for hyperparameter tuning, and tested on a separate set to evaluate its
generalization performance.
h)Documentation and Reproducibility:
Documenting the entire data preparation process is crucial for
reproducibility and transparency.
This includes recording data sources, preprocessing steps, feature
engineering techniques, and any assumptions or decisions made during
the process.
i)Iterative Process:
Data preparation is often an iterative process that involves refining data cleaning
procedures, experimenting with different feature engineering techniques, and
optimizing preprocessing steps based on model performance and feedback.
j)Continuous Monitoring and Maintenance:
Once a machine learning model is deployed, it's essential to monitor its
performance over time and update the data preparation pipeline
accordingly.
This ensures that the model remains effective in real-world scenarios and
adapts to changing data patterns or business requirements.
5)Explain Get the Data (Discover and Visualize the Data to Gain Insights)?
Get the Data:
Discover and Visualize the Data to Gain Insights:
Discovering and visualizing the data is a crucial step in data preparation for
machine learning. Here's a guide on how to perform exploratory data analysis
(EDA) to gain insights:
a)Load the Data: Start by loading your dataset into your preferred data analysis
environment such as Python with libraries like Pandas, NumPy, and
Matplotlib/Seaborn for visualization.
b)Basic Data Exploration:
Check the first few rows of the dataset using the .head() function to
understand its structure.
Check the dimensions of the dataset (number of rows and columns) using
the .shape attribute.
Use the .info() function to get a concise summary of the dataset,
including data types and missing values.
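These three calls look as follows on a small invented CSV (the column names are made up for this sketch):

```python
import io
import pandas as pd

# A tiny illustrative dataset loaded the same way a CSV file would be
csv_data = "age,income,city\n25,50000,Bangalore\n32,,Mysore\n41,72000,Bangalore\n"
df = pd.read_csv(io.StringIO(csv_data))

print(df.head())   # first rows: reveals the structure of the data
print(df.shape)    # (rows, columns) of the dataset
df.info()          # dtypes and non-null counts, so missing values show up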
c)Summary Statistics:
Compute summary statistics such as mean, median, standard deviation,
minimum, and maximum values for numerical features using the
.describe() function.
For categorical features, you can use the .value_counts() function to get
the frequency distribution of unique values.
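Both functions in action, on invented marks-and-grades data:

```python
import pandas as pd

df = pd.DataFrame({"marks": [45, 67, 82, 67, 90],
                   "grade": ["B", "A", "A", "A", "O"]})

stats = df["marks"].describe()       # count, mean, std, min, quartiles, max
counts = df["grade"].value_counts()  # frequency of each unique category
```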
d)Data Visualization:
Histograms: Plot histograms to visualize the distribution of numerical
features. This helps in understanding the range and spread of values and
identifying potential outliers.
Box plots: Use box plots to visualize the distribution of numerical
features, identify outliers, and compare distributions across different
categories.
Scatter plots: Plot scatter plots to visualize the relationship between pairs
of numerical features. This helps in identifying patterns, correlations, and
potential trends in the data.
Bar plots: Use bar plots to visualize the frequency distribution of
categorical features. This helps in understanding the distribution of
categories and identifying dominant categories.
Heatmaps: Plot heatmaps to visualize the correlation matrix between
numerical features. This helps in identifying multicollinearity and
understanding the strength and direction of correlations.
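A minimal Matplotlib sketch of the first three plot types, using randomly generated data (the Agg backend is selected so the figure renders off-screen; drop that line to display the plots interactively):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to show plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(50, 10, 200),
                   "y": rng.normal(30, 5, 200)})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(df["x"], bins=20)           # distribution and spread of x
axes[0].set_title("Histogram")
axes[1].boxplot(df["x"])                 # quartiles and outliers of x
axes[1].set_title("Box plot")
axes[2].scatter(df["x"], df["y"], s=10)  # relationship between x and y
axes[2].set_title("Scatter plot")
fig.savefig("eda_plots.png")
```

Seaborn's `barplot` and `heatmap` (e.g. `sns.heatmap(df.corr())`) cover the remaining two plot types in the same way.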
e)Feature Relationships:
Explore relationships between features using scatter plots, pair plots (for
multiple numerical features), and categorical plots (for categorical
features).
Look for patterns, trends, and correlations between features, which can
provide valuable insights for feature selection and engineering.
f)Missing Values and Outliers:
Visualize missing values using heatmaps or bar plots to identify patterns
of missingness across features.
Plot box plots or scatter plots to identify outliers in numerical features.
Decide whether to remove or impute outliers based on domain knowledge
and the impact on the model.
g)Interactive Visualizations:
Consider using interactive visualization libraries like Plotly or Bokeh for
more interactive and dynamic exploration of the data.
h)Iterative Exploration:
Perform iterative data exploration and visualization based on initial
insights and hypotheses generated.
This may involve drilling down into specific subsets of the data or
focusing on particular features of interest.
a)Handling Missing Values:
Impute missing values using methods like mean, median, mode, or more
sophisticated techniques such as K-nearest neighbors (KNN) imputation
or predictive modeling.
For categorical variables, consider adding a new category to represent
missing values if they carry meaningful information.
b)Encoding Categorical Variables:
Convert categorical variables into a numerical format suitable for machine
learning algorithms. Common techniques include:
One-hot encoding: Create binary columns for each category, where 1 indicates
the presence of the category and 0 indicates absence.
Label encoding: Map each category to a unique integer. This is suitable for
ordinal categorical variables with a natural order.
Target encoding: Encode categorical variables based on the target variable's
mean or other aggregated metrics. This can be useful for high-cardinality
categorical variables.
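All three encodings can be sketched with pandas alone; the categories, the hand-chosen ordinal order for "size", and the target values are invented for this example:

```python
import pandas as pd

df = pd.DataFrame({"size": ["S", "M", "L", "M"],
                   "colour": ["red", "blue", "red", "green"],
                   "target": [10, 20, 30, 40]})

# One-hot encoding: one binary indicator column per category
onehot = pd.get_dummies(df["colour"], prefix="colour")

# Label encoding for an ordinal variable (order chosen by hand: S < M < L)
df["size_code"] = df["size"].map({"S": 0, "M": 1, "L": 2})

# Target encoding: replace each category with the mean target of its group
df["colour_te"] = df.groupby("colour")["target"].transform("mean")
```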
c)Feature Scaling:
Scale numerical features to a similar range to prevent features with larger
magnitudes from dominating the model.
Common scaling techniques include:
•Min-max scaling (Normalization): Scale features to a range between 0 and 1.
•Standardization: Transform features to have a mean of 0 and a standard
deviation of 1.
•Robust scaling: Scale features using median and interquartile range to handle
outliers.
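The three scalers are available directly in scikit-learn; a sketch on a made-up column that contains one deliberate outlier:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

mm = MinMaxScaler().fit_transform(X)     # maps values into [0, 1]
std = StandardScaler().fit_transform(X)  # mean 0, std 1; outlier-sensitive
rob = RobustScaler().fit_transform(X)    # centers on median, scales by IQR
```

Because `RobustScaler` uses the median and interquartile range, the outlier barely affects where the typical values land, unlike the other two scalers.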
d)Feature Engineering:
Create new features or transform existing ones to capture meaningful
information and improve the model's performance. Feature engineering
techniques include:
•Polynomial features: Generate polynomial combinations of features to capture
nonlinear relationships.
•Interaction terms: Create new features by taking the product or ratio of
existing features.
•Domain-specific transformations: Apply domain knowledge to create
features relevant to the problem.
e)Dimensionality Reduction:
Reduce the number of features to alleviate the curse of dimensionality and
improve computational efficiency.
Techniques include:
•Principal Component Analysis (PCA): Project data onto a lower-dimensional
subspace while preserving the maximum variance.
•Feature selection: Select a subset of relevant features based on statistical tests,
feature importance scores, or domain knowledge.
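PCA as described above, sketched on scikit-learn's built-in Iris dataset (four numerical features reduced to two components):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # 150 samples, 4 numerical features

pca = PCA(n_components=2)          # project onto a 2-dimensional subspace
X2 = pca.fit_transform(X)

# Fraction of the total variance the two components preserve
explained = pca.explained_variance_ratio_.sum()
```

For feature selection, `sklearn.feature_selection` (e.g. `SelectKBest`) offers the statistical-test route mentioned above.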
f)Data Splitting:
Split the dataset into training, validation, and test sets to evaluate the model's
performance.
Common splits include:
•Training set: Used to train the model.
•Validation set: Used to tune hyperparameters and assess model performance
during training.
•Test set: Held out for final evaluation to estimate the model's generalization
performance on unseen data.
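A common way to get all three sets is to apply `train_test_split` twice; the 60/20/20 proportions below are one typical choice, not a rule:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out the test set, then split the rest into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)
# Resulting split: 60% train, 20% validation, 20% test
```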
g)Data Pipeline:
Create a data preprocessing pipeline that encapsulates all the data
preparation steps.
This ensures consistency and facilitates reproducibility when applying the
preprocessing steps to new data.
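Such a pipeline can be built with scikit-learn's `Pipeline`; the particular steps chosen here (median imputation, standardization, logistic regression) are just one illustrative combination:

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize features
    ("model", LogisticRegression(max_iter=1000)),  # final estimator
])
pipe.fit(X, y)
score = pipe.score(X, y)
```

Because the imputer and scaler are fitted inside the pipeline, calling `pipe.predict` on new data applies exactly the same preparation steps, which is the consistency property described above.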
h)Documentation and Versioning:
Document all data preparation steps, including assumptions,
transformations, and preprocessing techniques applied.
Version control the data preprocessing pipeline to track changes and
ensure reproducibility.
d)Model Evaluation:
Evaluate the trained model's performance using appropriate evaluation
metrics based on the problem type (e.g., accuracy, precision, recall, F1-
score for classification; mean squared error, R-squared for regression).
Calculate the performance metrics on the test set using the predict()
method to generate predictions and compare them with the actual target
labels (y_test).
Visualize the model's performance using relevant plots such as confusion
matrices, ROC curves (for binary classification), or calibration plots.
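The classification metrics listed above are one-liners in scikit-learn; the labels and predictions below are invented purely to show the calls:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_test = [0, 1, 1, 0, 1, 0]   # actual labels (invented for this sketch)
y_pred = [0, 1, 0, 0, 1, 1]   # predictions a model might produce

acc = accuracy_score(y_test, y_pred)  # fraction of correct predictions
cm = confusion_matrix(y_test, y_pred) # rows: actual, columns: predicted
f1 = f1_score(y_test, y_pred)         # harmonic mean of precision/recall
```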
e)Hyperparameter Tuning:
Fine-tune the model's hyperparameters to improve its performance. This
involves searching over a predefined hyperparameter space using
techniques like grid search or random search.
Use cross-validation to estimate the model's performance on different
subsets of the training data and avoid overfitting.
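Grid search with cross-validation, sketched with a k-nearest-neighbors classifier on the Iris dataset (the grid of k values is an arbitrary example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Search a small hyperparameter grid using 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": [1, 3, 5, 7]}, cv=5)
grid.fit(X, y)
best_k = grid.best_params_["n_neighbors"]  # k with the best CV score
```

`RandomizedSearchCV` has the same interface and implements the random-search alternative mentioned above.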
f)Model Selection:
Compare the performance of different models using cross-validation or a
separate validation set.
Select the model with the best performance based on the evaluation
metrics and your specific requirements (e.g., accuracy, interpretability,
computational efficiency).
g)Training Pipeline:
Create a training pipeline that encapsulates the data preparation, model
training, and evaluation steps.
This ensures reproducibility and facilitates experimentation with different
algorithms and hyperparameters.
h)Documentation and Reporting:
Document the model selection process, including the chosen algorithm,
hyperparameters, and evaluation results.
Provide insights into the model's strengths, weaknesses, and potential
areas for improvement.