ML 2
Assignment-3

1. Explain the different types of regression techniques.
Linear Regression: Predicts the relationship between the dependent and independent
variables by fitting a linear equation. It’s suitable for continuous data with a linear
relationship.
Logistic Regression: Used for binary classification problems. It predicts the
probability that a given input belongs to a particular category, using a logistic
function.
Polynomial Regression: Extends linear regression by using polynomial terms. It’s useful when the relationship between the independent and dependent variables is non-linear.
Ridge Regression: A form of linear regression that includes a penalty term for large
coefficients, helping to avoid overfitting.
Lasso Regression: Similar to ridge regression but with a penalty that can shrink some
coefficients to zero, effectively performing feature selection.
Elastic Net Regression: A combination of ridge and lasso regression, useful for
handling multicollinearity and feature selection.
Support Vector Regression (SVR): Uses Support Vector Machine principles to fit a
model within a margin of tolerance. It's suitable for complex relationships in small
datasets.
Decision Tree Regression: A tree-based model where each decision node represents
a test on an attribute, and each leaf node represents an output value.
Random Forest Regression: An ensemble method that combines multiple decision
trees for improved accuracy and reduced overfitting.
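To make a few of these concrete, here is a minimal sketch (using scikit-learn; the synthetic data and alpha values are illustrative, not from the assignment) that fits plain, ridge, lasso, and elastic net regression to the same data:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Synthetic data: 3 features, the middle one irrelevant to the target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, 0.0, -2.0]) + rng.normal(scale=0.5, size=50)

for model in [LinearRegression(), Ridge(alpha=1.0),
              Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)]:
    model.fit(X, y)
    # Lasso and ElasticNet can shrink the irrelevant coefficient to exactly zero
    print(type(model).__name__, model.coef_.round(3))

Note how the penalized models pull the coefficients toward zero; lasso and elastic net can zero out the irrelevant feature entirely, which is the feature-selection behaviour described above.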
2. Can decision trees be used for regression? Explain.

Yes, decision trees can be used for regression. This is known as Decision Tree Regression.
Explanation:
In decision tree regression, the algorithm splits the data at nodes based on the feature
that minimizes the variance in the target variable within each split.
Unlike classification trees, where each leaf represents a class label, each leaf node in
regression trees represents a continuous value, typically the average of all values in
that node's data subset.
Regression trees are particularly useful for capturing complex, non-linear relationships between the variables.
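As a minimal sketch of this (using scikit-learn’s DecisionTreeRegressor; the one-feature data below is invented for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy non-linear data: y = sin(x) sampled on a grid
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel()

# Splits are chosen to minimize the variance (squared error) within each
# child node; each leaf then predicts the mean of its training targets
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(X, y)
print(tree.predict([[1.5], [4.0]]))  # piecewise-constant predictions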
3. What do you mean by information gain and entropy? How is it used to build decision trees? Illustrate using an example.

Entropy measures the impurity of a set of examples S: Entropy(S) = −Σ pᵢ log₂ pᵢ, where pᵢ is the proportion of examples in S belonging to class i. Information gain is the reduction in entropy obtained by splitting S on an attribute A: Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) × Entropy(Sᵥ), where Sᵥ is the subset of S for which A takes value v. Decision tree algorithms such as ID3 use these quantities as follows:
1. The decision tree algorithm calculates the entropy for each possible split in the
dataset.
2. It chooses the split that maximizes the information gain, thereby reducing the
dataset’s impurity the most.
3. This process continues recursively until each node is pure (or reaches a stopping
condition).
Example: Suppose we want to classify whether a person will buy a car based on their income. At each node we compute the entropy of the current set of examples, evaluate candidate splits (e.g., income above or below some threshold), and grow the tree with the split that maximizes information gain; the sketch below works through the arithmetic.
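Here is a minimal sketch of that computation in Python (the incomes, labels, and the threshold of 32 are invented for illustration):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy data: (income in $1000s, buys_car) -- values invented for illustration
data = [(25, 'no'), (30, 'no'), (45, 'yes'), (50, 'yes'),
        (35, 'no'), (60, 'yes'), (40, 'yes'), (28, 'no')]
labels = [label for _, label in data]

# Candidate split: income <= 32
left = [label for income, label in data if income <= 32]
right = [label for income, label in data if income > 32]

parent = entropy(labels)                      # 1.0 bit (4 'yes', 4 'no')
weighted = (len(left) / len(data)) * entropy(left) \
         + (len(right) / len(data)) * entropy(right)
print('Information gain:', parent - weighted)  # about 0.55 bits

The parent node has entropy 1.0 (a 50/50 class mix); this split produces a pure left child and a mostly-'yes' right child, for a gain of about 0.55 bits. The algorithm would compare this against other candidate splits and keep the best one.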
4. What are issues in decision tree learning? How are they overcome?
Overfitting: Decision trees can easily become too complex, capturing noise in the data.
o Solution: Use techniques like pruning (removing branches that contribute little), setting a minimum number of samples per leaf, or limiting tree depth (illustrated in the sketch after this list).
High variance: Small changes in the data can result in a completely different tree structure.
o Solution: Use ensemble methods like Random Forest, which average the predictions of many trees to reduce variance.
Bias towards features with many levels: Information-gain-based splitting tends to favor features with many distinct values.
o Solution: Use a split criterion that corrects for this, such as the gain ratio used by C4.5, instead of raw information gain.
Sensitivity to class imbalance: Decision trees may favor the majority class in imbalanced datasets.
o Solution: Use resampling techniques like SMOTE, or assign class weights to balance the classes.
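A hedged sketch of how several of these fixes look in scikit-learn (the dataset and parameter values are arbitrary examples, not prescriptions):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy imbalanced dataset (90% / 10% classes) for illustration
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Limit depth and leaf size, and apply cost-complexity pruning (ccp_alpha),
# to keep a single tree from overfitting
tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, ccp_alpha=0.01)
tree.fit(X, y)

# Averaging many trees reduces variance; class_weight='balanced' reweights
# the minority class to counter the imbalance
forest = RandomForestClassifier(n_estimators=200, class_weight='balanced',
                                random_state=0)
forest.fit(X, y)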
5. Use the following data to generate a linear regression model for annual
salary as a function of GPA and number of months worked.
To perform a multiple linear regression, where Annual Salary is predicted based on GPA and
Months Worked, follow these steps:
The model for predicting annual salary (Y) based on GPA (X1) and months worked (X2) is:

Y = β0 + β1·X1 + β2·X2 + ε

where β0 is the intercept, β1 and β2 are the coefficients of GPA and months worked, and ε is the error term. Using a statistical tool (such as Python's statsmodels or sklearn), you can run a multiple linear regression to estimate the coefficients β0, β1, and β2:
import pandas as pd
import statsmodels.api as sm

data = {
    'Annual_Salary': [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500],
    'GPA': [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 4.0, 2.6, 3.2, 3.8],
    'Months_Worked': [48, 24, 24, 24, 48, 36, 24, 48, 36, 12]
}
df = pd.DataFrame(data)

# Predictors (with an added intercept column) and response
X = df[['GPA', 'Months_Worked']]
Y = df['Annual_Salary']
X = sm.add_constant(X)

# Fit the multiple linear regression by ordinary least squares
model = sm.OLS(Y, X).fit()
print(model.summary())
Suppose the estimated coefficients come out to be β0 = 15000, β1 = 2500, and β2 = 100 (illustrative values). The fitted equation is then:

Y = 15000 + 2500·X1 + 100·X2
Using this equation, we can predict the annual salary for any given values of GPA and
months worked. For instance, if someone has a GPA of 3.5 and has worked for 30 months,
we can substitute these values into the equation:
Calculating this:

Y = 15000 + 2500(3.5) + 100(30) = 15000 + 8750 + 3000 = 26750

So the model predicts an annual salary of about $26,750.
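The same prediction can be obtained programmatically; a sketch continuing the statsmodels code above (it assumes the fitted model object from that snippet, and will use whatever coefficients were actually estimated from the data):

import pandas as pd

# New observation: GPA 3.5, 30 months worked; 'const' matches the added intercept
new_X = pd.DataFrame({'const': [1.0], 'GPA': [3.5], 'Months_Worked': [30]})
print(model.predict(new_X))  # predicted annual salary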