Machine Learning-1
ASSIGNMENT-1
NAME : RAMAN MAURYA
REGNO : 21DBCAD023
Decision Tree Classifier
A decision tree is a non-parametric supervised learning algorithm that is used
for both classification and regression tasks. It has a hierarchical tree structure
consisting of a root node, branches, internal nodes, and leaf nodes.
Its primary purpose is to create a model that predicts the target variable (a
categorical label in classification problems) based on a set of input features.
It does this by recursively partitioning the input data into subsets, making decisions
at each step, ultimately forming a tree-like structure where the leaves represent the
class labels or predicted values.
Purpose of Decision Tree Classifier
Classification : Decision trees are commonly used for classification tasks, where
the goal is to predict a categorical target variable. This could include tasks like spam
detection, disease diagnosis, sentiment analysis, or customer churn prediction.
Feature Importance: Decision trees can implicitly rank the importance of input
features, allowing you to identify which features are most relevant for making
predictions (see the sketch below).
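A minimal sketch of the feature-importance point, using scikit-learn's bundled iris dataset (the exact scores depend on the fitted tree):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a small tree and inspect the implicit ranking of the input features
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0, max_depth=3).fit(iris.data, iris.target)

for name, score in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")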
Export_text
The export_text function in scikit-learn is used to generate a textual representation
of a decision tree model, which can be useful for understanding how the tree makes
predictions.
This textual representation provides information about the decision rules at each
node in the tree, including feature names, threshold values, and class predictions.
Debugging: When working with decision tree models, especially deep or complex
ones, export_text can be used for debugging. You can visually inspect the tree structure
and decision rules to identify potential issues or sources of misclassification.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the iris dataset; the 150 samples are ordered 50 per species
iris = load_iris()
X = iris['data']
y = ['setosa'] * 50 + ['versicolor'] * 50 + ['virginica'] * 50

# Fit a shallow tree so the printed rules stay readable
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=3)
decision_tree = decision_tree.fit(X, y)

# Print the decision rule, threshold, and class weights at each node
r = export_text(decision_tree, feature_names=iris['feature_names'],
                decimals=0, show_weights=True)
print(r)
Output:
Seaborn
Seaborn is a Python data visualization library based on Matplotlib.
Its primary purpose is to provide a high-level interface for creating attractive and
informative statistical graphics.
Aesthetically Pleasing Plots: Seaborn comes with a set of appealing color palettes
and themes that make it easy to create visually pleasing and publication-quality plots
with minimal customization.
Faceted Data Exploration: Seaborn provides built-in support for faceted data
exploration, allowing you to create multi-plot grids to examine interactions between
variables more easily (see the faceting sketch after the histogram example below).
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="white")
# Draw 100 samples from a standard normal distribution with a fixed seed
rs = np.random.RandomState(10)
d = rs.normal(size=100)
# Histogram with a kernel density estimate overlaid, in magenta
sns.histplot(d, kde=True, color="m")
plt.show()
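A minimal faceting sketch, assuming seaborn's bundled "iris" sample dataset is available through load_dataset; it draws one histogram panel per species:
import matplotlib.pyplot as plt
import seaborn as sns

# One histogram panel per species, laid out side by side
iris_df = sns.load_dataset("iris")
sns.displot(iris_df, x="sepal_length", col="species")
plt.show()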
astype
In Pandas, the astype method is used to cast a Series or DataFrame to a specified
data type. Its primary purpose is to allow you to explicitly specify the data type for
columns in your DataFrame.
It can be useful for data type conversion, data manipulation, and data analysis tasks.
Purpose of astype
Data Type Conversion: The primary purpose of astype is to change the data type of
one or more columns in a DataFrame. This can be helpful when you need to ensure
that a column has the correct data type for your analysis or when you want to convert
data from one type to another (e.g., from a string to a numeric type).
Memory Optimization: By changing data types, you can reduce the memory usage of
your DataFrame. For example, converting integer columns to smaller integer types or
using float32 instead of float64 can lead to significant memory savings, which can be
crucial when dealing with large datasets (a small sketch follows this list).
Data Cleaning: It can be used to clean data by converting erroneous or inconsistent
data into the correct data type. For example, converting string representations of
numbers into actual numeric types.
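A small sketch of the memory-optimization point (exact byte counts vary with the Pandas version and index overhead): downcasting 64-bit integers to 16-bit cuts per-value storage from 8 bytes to 2.
import numpy as np
import pandas as pd

# 1,000 small integers stored as 64-bit, then downcast to 16-bit
df = pd.DataFrame({'n': np.arange(1000, dtype='int64')})
print(df['n'].memory_usage(deep=True))                  # ~8000 bytes plus index
print(df['n'].astype('int16').memory_usage(deep=True))  # ~2000 bytes plus index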
import pandas as pd

# Column A holds numbers as strings; column B holds 64-bit floats
data = {'A': ['1', '2', '3'], 'B': [4.1, 5.2, 6.3]}
df = pd.DataFrame(data)
print("Initial Data Types:")
print(df.dtypes)

# Cast A from string to integer and B from float64 down to float32
df['A'] = df['A'].astype(int)
df['B'] = df['B'].astype('float32')
print("\nUpdated Data Types:")
print(df.dtypes)
Output :
cat.codes
In Pandas, the cat.codes attribute is used to obtain the category codes (or integer
codes) of the values in a categorical or "category" data type column.
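A minimal sketch with made-up city values; note that Pandas assigns the integer codes following the (sorted) order of the categories.
import pandas as pd

# A column of city names stored with the "category" dtype
s = pd.Series(['delhi', 'mumbai', 'delhi', 'chennai'], dtype='category')

print(s.cat.categories)      # Index(['chennai', 'delhi', 'mumbai'], dtype='object')
print(s.cat.codes.tolist())  # [1, 2, 1, 0] -- one integer code per value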
Classification Report
The classification report is a tool in machine learning for evaluating the performance of
a classification model. Its primary purpose is to provide a detailed summary of the
model's performance in terms of various evaluation metrics for each class or category in
a classification problem.
It's particularly useful for understanding how well a model is performing across different
classes and can help in identifying where the model may be making mistakes.
The classification report typically includes metrics such as precision, recall, F1-score,
and support for each class, along with an overall accuracy score. These metrics are
valuable for assessing a model's performance in tasks like binary classification,
multi-class classification, and multi-label classification.
Purpose of Classification Report
Per-Class Evaluation: The report breaks precision, recall, F1-score, and support out
for each class, so you can see which classes the model predicts well and which it
confuses.
Identify Mistakes: Comparing the per-class scores against each class's support helps
pinpoint where the model is making mistakes, for example on under-represented
classes.
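A minimal sketch of a typical classification_report call, reusing the iris data from the decision-tree example with a held-out test split:
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Train on 70% of iris, evaluate on the remaining 30%
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, F1-score, and support for each of the three species
print(classification_report(y_test, y_pred, target_names=iris.target_names))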
Standard Scaler
The Standard Scaler is a preprocessing technique commonly used in machine learning
to scale and center numerical features (variables) in a dataset.
Its primary purpose is to transform the features so that they have a mean of 0 and a
standard deviation of 1.
Standardization (also known as z-score normalization) is particularly useful when
dealing with features that have different scales or units because it ensures that all
features have the same scale.
It can improve the performance of many machine learning algorithms, especially those
sensitive to the scale of features.
Purpose of Standard Scaler
Scale Features: Standard Scaler scales each feature independently, transforming them
to have a mean of 0 and a standard deviation of 1. This scaling makes it easier to
compare and interpret the impact of different features on a model.
Improve Model Performance: Many machine learning algorithms, such as support
vector machines, k-nearest neighbors, and principal component analysis, perform
better when features are standardized (see the pipeline sketch after the example
below). Standardization helps prevent features with larger scales from dominating
the learning process.
Normalize Distributions: Standard Scaler can help normalize the feature distributions,
making the data more suitable for models that assume normality.
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[10.0, 5.0], [20.0, 10.0], [30.0, 15.0]])

# Each column is transformed as z = (x - mean) / std
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print("Original Data:")
print(data)
print("\nScaled Data:")
print(scaled_data)

# Each column of the scaled data now has mean 0 and standard deviation 1
print(scaled_data.mean(axis=0))  # ~[0. 0.]
print(scaled_data.std(axis=0))   # ~[1. 1.]
Output :
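As a sketch of the model-performance point above, StandardScaler is commonly chained with a scale-sensitive estimator (an SVM is one assumed choice) in a scikit-learn pipeline:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scaler and classifier are fitted together, so the test data is scaled
# using statistics learned from the training data only
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
model = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
print(model.score(X_test, y_test))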
Label Encoding
Label encoding is a technique used to convert categorical data (data that consists of
labels or categories) into numerical format.
Its primary purpose is to prepare categorical data for machine learning algorithms that
require numerical inputs.
Label encoding assigns a unique integer value to each category, effectively converting
them into numeric labels.
Purpose of Label Encoding
Preserve Ordinal Information: Label encoding can be useful for ordinal categorical
data, where the order or ranking among categories matters. Note, however, that
scikit-learn's LabelEncoder assigns integers in sorted (alphabetical) order, so the
codes reflect the intended ranking only if the labels happen to sort that way; for an
explicit ordering, an ordered Pandas Categorical can be used instead (see the sketch
below).
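A minimal sketch of an explicitly ordered encoding using Pandas; the size labels are made up for illustration.
import pandas as pd

# 'small' < 'medium' < 'large': declare the order instead of relying on sorting
sizes = pd.Categorical(['small', 'large', 'medium', 'small'],
                       categories=['small', 'medium', 'large'], ordered=True)
print(sizes.codes)  # [0 2 1 0] -- codes follow the declared order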
from sklearn.preprocessing import LabelEncoder

data = ['red', 'green', 'blue', 'green', 'red']

# Classes are sorted alphabetically: blue -> 0, green -> 1, red -> 2
label_encoder = LabelEncoder()
encoded_data = label_encoder.fit_transform(data)
print("Original Data:")
print(data)
print("\nEncoded Data:")
print(encoded_data)  # [2 1 0 1 2]

# inverse_transform maps the integer codes back to the original labels
decoded_data = label_encoder.inverse_transform(encoded_data)
print("\nDecoded Data:")
print(decoded_data)
Output :