supervised_learning

The document provides an overview of machine learning, focusing on supervised learning, which involves training models using labeled data to make predictions. It explains the concepts of features and labels, the types of supervised learning tasks (classification and regression), and various algorithms used for each task. Additionally, it outlines the steps for building a machine learning model, including data collection, preparation, model selection, training, testing, evaluation, and deployment.

Uploaded by

sravanrocks19
Copyright: © All Rights Reserved

Machine learning is like teaching a computer to learn from experience, just like humans do.

FEATURES AND LABELS

Think of features and labels like ingredients and the final dish in a recipe.

• Features are the ingredients: the details that help make a decision. For example, if
you’re predicting whether an email is spam, features could be things like the number of
links, certain words in the subject, or the sender's address.

• Labels are the final dish: the correct answer. In the email example, the label would be
"spam" or "not spam."

In machine learning, we train a model using features (inputs) to predict the correct labels
(outputs).
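As a rough illustration of the spam example above, features and labels can be written as two parallel lists: one row of features per email and one label per row. The three feature columns and all the values below are made up for illustration.

```python
# Each row describes one email; each column is one feature.
# Hypothetical features: [number of links, subject contains "free" (1/0),
#                         sender is a known contact (1/0)]
features = [
    [7, 1, 0],
    [0, 0, 1],
    [3, 1, 0],
    [1, 0, 1],
]

# One label (the correct answer) per email, in the same order as the rows above.
labels = ["spam", "not spam", "spam", "not spam"]

for row, label in zip(features, labels):
    print(f"features={row} -> label={label}")
```

A model is trained on pairs like these and then asked to predict the label for a new row of features.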

SUPERVISED LEARNING

Supervised learning is like learning with a teacher.

Imagine you’re learning to recognize animals, and someone gives you pictures labeled "dog" or
"cat." You study these labeled examples and learn the patterns that distinguish dogs from cats.

Later, when you see a new picture (without a label), you use what you’ve learned to guess if it’s
a dog or a cat.

In technical terms, supervised learning is a type of machine learning where a model is trained
using labeled data—meaning each input has a known correct output. The goal is for the model to
learn patterns so it can correctly predict outcomes for new, unseen data.
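The dog/cat idea can be sketched in a few lines of scikit-learn. This is only a toy: the two measurements (weight and ear length) and their values are invented, and K-Nearest Neighbors is just one convenient choice of model.

```python
from sklearn.neighbors import KNeighborsClassifier

# Labeled examples. Hypothetical features: [weight in kg, ear length in cm]
X_train = [[30, 10], [25, 9], [35, 12], [4, 5], [5, 6], [3, 4]]
y_train = ["dog", "dog", "dog", "cat", "cat", "cat"]

# "Learning with a teacher": the model sees both the inputs and the correct answers.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# A new, unlabeled animal: the model guesses from what it has learned.
new_animal = [[28, 11]]
prediction = model.predict(new_animal)[0]
print(f"Predicted animal: {prediction}")
```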

Classification and regression are the two main types of supervised learning tasks; the difference is in
the kind of answer they predict.

• Classification is about sorting things into categories. Example: "Is this email spam or
not?" (Yes/No), or "What animal is in this picture?" (Cat, Dog, or Bird). The answer is
always a category or label.
• Regression is about predicting numbers. Example: "What will the temperature be
tomorrow?" or "How much will this house sell for?" The answer is always a number.

In short: Classification → Categories; Regression → Numbers.


Supervised learning has two main types:

1. Classification → When the goal is to categorize data into different classes or labels.
   ◦ Examples:
     • Spam detection (Spam or Not Spam)
     • Disease diagnosis (Healthy or Sick)
     • Handwriting recognition (A, B, C, etc.)
2. Regression → When the goal is to predict a continuous numerical value.
   ◦ Examples:
     • Predicting house prices based on size and location
     • Forecasting stock market prices
     • Estimating a person’s weight based on height and age

Both types use labeled data to train the model, but classification predicts categories, while
regression predicts numbers.
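To make the difference concrete, here is a small sketch that trains one regressor and one classifier on the same made-up houses. The sizes, prices, and the "small"/"large" categories are invented for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Made-up data: house size in square feet
sizes = [[500], [700], [900], [1100], [1300], [1500]]

# Regression target: price in $1000s (a number)
prices = [150, 210, 270, 330, 390, 450]
# Classification target: a category for the same houses
categories = ["small", "small", "small", "large", "large", "large"]

regressor = LinearRegression().fit(sizes, prices)
classifier = DecisionTreeClassifier(random_state=0).fit(sizes, categories)

new_house = [[1200]]
predicted_price = regressor.predict(new_house)[0]        # a continuous value
predicted_category = classifier.predict(new_house)[0]    # one of the known labels

print(f"Regression output: {predicted_price:.1f} (in $1000s)")
print(f"Classification output: {predicted_category}")
```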

There are several types of regression in supervised learning, depending on the problem you’re
solving. Here are the main ones in simple words:

1. Linear Regression – The simplest type, where we fit a straight line to the data to predict values.
   ◦ Example: Predicting house prices based on size (bigger house → higher price).

2. Multiple Linear Regression – Like linear regression but with multiple factors (features).
   ◦ Example: Predicting a car’s price based on its age, mileage, and brand.

3. Polynomial Regression – Instead of a straight line, it fits a curve, which suits data that does not follow a line.
   ◦ Example: Predicting population growth over time (which might not be linear).

4. Ridge & Lasso Regression – Linear regression with a penalty on large coefficients, which helps prevent overfitting
(learning too much from training data and failing on new data).
   ◦ Example: Used in financial forecasting where too many factors might confuse the
model.

5. Logistic Regression – Despite its name, this is a classification method: it predicts the probability that an input belongs to a class.
   ◦ Example: Predicting whether a student will pass (Yes/No) based on study hours.

Types of Classification in Machine Learning

1. Binary Classification (Two categories)

• Example: Is an email spam or not? (Yes/No)
• Algorithms: Logistic Regression, Decision Trees, SVM, etc.

2. Multi-Class Classification (More than two categories)

• Example: What type of animal is in the picture? (Cat, Dog, or Bird)
• Algorithms: Random Forest, Neural Networks, Naïve Bayes, etc.

3. Multi-Label Classification (Each item can belong to multiple categories)

• Example: A news article can be about both politics and sports.
• Algorithms: Neural Networks, K-Nearest Neighbors (KNN), etc.

4. Imbalanced Classification (One category is much more common than the others)

• Example: Fraud detection (only a few cases of fraud in millions of transactions).
• Techniques: Oversampling, Undersampling, SMOTE, etc.
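One of the techniques listed for imbalanced data, random oversampling, can be sketched with `sklearn.utils.resample`. The data below is random noise standing in for real transactions, and the 95/5 split is an assumption chosen for illustration. SMOTE, which creates synthetic minority samples rather than copying existing ones, lives in a separate library and is not shown here.

```python
import numpy as np
from sklearn.utils import resample

# Made-up imbalanced data: 95 normal transactions (0) and 5 fraudulent ones (1)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

X_majority = X[y == 0]
X_minority = X[y == 1]

# Random oversampling: draw minority rows with replacement
# until both classes have the same number of rows.
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=0
)

X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.array([0] * len(X_majority) + [1] * len(X_minority_up))

print("Before:", np.bincount(y), "After:", np.bincount(y_balanced))
```

In practice, resampling is usually applied to the training set only, so that the test set still reflects the real class balance.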

Supervised learning models learn from labeled data (where we provide both inputs and correct
outputs) to make predictions. These models are mainly divided into:

1. Classification Models (Predict categories)
2. Regression Models (Predict numbers)

1. Classification Models (For sorting data into groups)

Used when the output is a category or label (e.g., "Spam" or "Not Spam").

• Logistic Regression → Predicts the probability of a class (e.g., will a student pass? Yes/No).
• Decision Tree → A tree-like model that splits data step by step (e.g., diagnosing a disease).
• Random Forest → A collection of decision trees whose predictions are combined, which usually gives better accuracy than a single tree.
• Support Vector Machine (SVM) → Finds the boundary that best separates the categories.
• K-Nearest Neighbors (KNN) → Classifies a data point by the labels of its closest neighbors.
• Naïve Bayes → Uses probability to classify (e.g., spam filters).
• Neural Networks (MLP Classifier) → Layers of simple units, loosely inspired by the human brain, that learn to recognize patterns (e.g.,
face recognition).
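Several of the classifiers listed above share the same `fit` / `score` interface in scikit-learn, so they can be tried side by side. The sketch below uses the built-in Iris flower dataset purely as a convenient stand-in; the choice of dataset, split size, and settings is an assumption, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

# Train each model on the same data and record its accuracy on the test set.
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```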

2. Regression Models (For predicting numbers)

Used when the output is a continuous value (e.g., price, temperature, salary).

• Linear Regression → Fits a straight-line relationship (e.g., house price based on size).
• Multiple Linear Regression → Uses multiple factors to predict (e.g., salary based on experience,
education, location).
• Polynomial Regression → Fits a curve instead of a straight line, for data whose relationship is not linear.
• Ridge & Lasso Regression → Reduce overfitting (when the model memorizes the training data instead of
learning general patterns) by penalizing large coefficients.
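A short sketch of why the curved models exist: on made-up data that follows a parabola, a straight line explains almost nothing, while a degree-2 polynomial fits it. Ridge is shown as the same polynomial model with a penalty added; the degree and `alpha=1.0` are arbitrary choices for illustration, and Lasso could be swapped in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up curved data: y = x^2
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = x.ravel() ** 2

# Straight line
linear = LinearRegression().fit(x, y)

# Polynomial regression = polynomial features + linear regression
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

# Ridge = same features, but large coefficients are penalized
ridge = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0)).fit(x, y)

# R^2 on the training data: 1.0 is a perfect fit, 0.0 is no better than the mean
linear_r2 = linear.score(x, y)
poly_r2 = poly.score(x, y)
ridge_r2 = ridge.score(x, y)

print(f"Linear R^2:     {linear_r2:.3f}")
print(f"Polynomial R^2: {poly_r2:.3f}")
print(f"Ridge R^2:      {ridge_r2:.3f}")
```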

Steps in Machine Learning


Building a machine learning model is like teaching a child how to recognize things. Here are the
main steps:

1. Collect Data 📊

• Gather data from files, databases, or online sources.
• Example: A spreadsheet of house prices with size, location, and number of rooms.

2. Prepare the Data 🧹

• Clean the data (fix or remove missing and incorrect values).
• Organize it into features (inputs) and labels (outputs).
• Example: Features → House size & location, Label → House price.

3. Split the Data ✂️

• Divide the data into:
  ◦ Training Set (used to teach the model).
  ◦ Testing Set (used to check how well it learned).
• Example: 80% for training, 20% for testing.

4. Choose a Model 🤖

• Pick a model based on the problem:
  ◦ Classification (for categories, like "Spam" or "Not Spam").
  ◦ Regression (for numbers, like predicting house prices).

5. Train the Model 🎓

• Feed the training data into the model so it can learn patterns.

6. Test the Model ✅

• Give the model the test data, which it has not seen during training, to see how well it makes predictions.

7. Evaluate the Model 📈

• Check how well it performs using metrics like:
  ◦ Accuracy (for classification problems).
  ◦ Mean Squared Error (for regression problems).

8. Improve the Model 🔄

• Try different models or tweak settings (hyperparameters).
• Add more data or remove unnecessary features.

9. Deploy the Model 🚀

• Use the model in real applications (like a chatbot, recommendation system, or self-driving car).
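The steps above can be sketched end to end in one short script. Everything here is an assumption made for illustration: the data is synthetic rather than collected, the price formula is invented, and saving the model with `pickle` is only a stand-in for real deployment. Step 8 (improving the model) is left out.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 1. Collect: synthetic stand-in for a spreadsheet of 200 houses
rng = np.random.default_rng(42)
size = rng.uniform(500, 2000, 200)        # square feet
rooms = rng.integers(1, 6, 200)           # 1 to 5 rooms
price = 0.3 * size + 10 * rooms + rng.normal(0, 5, 200)  # in $1000s, invented formula

# 2. Prepare: organize into features (inputs) and label (output)
X = np.column_stack([size, rooms])
y = price

# 3. Split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4-5. Choose a regression model and train it
model = LinearRegression()
model.fit(X_train, y_train)

# 6-7. Test on unseen data and evaluate with Mean Squared Error
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Test MSE: {mse:.2f}")

# 9. Deploy (stand-in): serialize the trained model, then load it back
saved_bytes = pickle.dumps(model)
restored = pickle.loads(saved_bytes)
print("Restored model prediction:", restored.predict([[1000, 3]])[0])
```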

Linear Regression Example in Scikit-Learn

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# 1️⃣ Sample Data (House Size in Square Feet vs. Price in $1000s)
house_size = np.array([500, 600, 700, 800, 900, 1000, 1100, 1200]).reshape(-1, 1)
house_price = np.array([150, 180, 210, 240, 270, 300, 330, 360])

# 2️⃣ Split Data into Training & Testing Sets
X_train, X_test, y_train, y_test = train_test_split(
    house_size, house_price, test_size=0.2, random_state=42
)

# 3️⃣ Train the Model
model = LinearRegression()
model.fit(X_train, y_train)

# 4️⃣ Predict Price for a New House (e.g., 1050 sq. ft)
new_house = np.array([[1050]])
predicted_price = model.predict(new_house)
print(f"Predicted Price for 1050 sq.ft house: ${predicted_price[0]*1000:.2f}")

# 5️⃣ Visualize the Results
plt.scatter(house_size, house_price, color="blue", label="Actual Prices")
plt.plot(house_size, model.predict(house_size), color="red", label="Regression Line")
plt.scatter(new_house, predicted_price, color="green", marker="o", label="Prediction (1050 sq.ft)")
plt.xlabel("House Size (sq. ft)")
plt.ylabel("Price ($1000s)")
plt.legend()
plt.show()
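The example above creates a test set but never uses it. A minimal sketch of the missing evaluation step, reusing the same sample data and split, could look like this. Because the sample prices lie exactly on a line (price = 0.3 × size), the error comes out as essentially zero, which would not happen with real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same sample data as the example above
house_size = np.array([500, 600, 700, 800, 900, 1000, 1100, 1200]).reshape(-1, 1)
house_price = np.array([150, 180, 210, 240, 270, 300, 330, 360])

X_train, X_test, y_train, y_test = train_test_split(
    house_size, house_price, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The learned line: price = slope * size + intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"price = {slope:.3f} * size + {intercept:.3f}")

# Evaluate on the held-out test houses
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test Mean Squared Error: {mse:.6f}")
```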

Worked Example of Logistic Regression in Scikit-Learn

Let’s build a Logistic Regression model to predict whether a student will pass or fail an exam
based on the number of study hours.

Steps

1. Import necessary libraries.
2. Create sample data (Study hours vs. Pass/Fail).
3. Train a Logistic Regression model.
4. Predict if a new student (e.g., 5.5 hours of study) will pass.
5. Visualize the results.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 1️⃣ Sample Data (Hours Studied vs. Exam Result: 1 = Pass, 0 = Fail)
study_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
exam_result = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0 = Fail, 1 = Pass

# 2️⃣ Split Data into Training & Testing Sets
X_train, X_test, y_train, y_test = train_test_split(
    study_hours, exam_result, test_size=0.2, random_state=42
)

# 3️⃣ Train the Logistic Regression Model
model = LogisticRegression()
model.fit(X_train, y_train)

# 4️⃣ Predict if a Student Studying 5.5 Hours Will Pass
new_student = np.array([[5.5]])
predicted_result = model.predict(new_student)
print(f"Predicted Outcome for 5.5 Hours of Study: {'Pass' if predicted_result[0] == 1 else 'Fail'}")

# 5️⃣ Visualize the Results
# Column 1 of predict_proba is the probability of class 1 (Pass)
plt.scatter(study_hours, exam_result, color="blue", label="Actual Data")
plt.plot(study_hours, model.predict_proba(study_hours)[:, 1], color="red",
         label="Pass Probability Curve")
plt.scatter(new_student, model.predict_proba(new_student)[:, 1], color="green", marker="o",
            label="Prediction (5.5 hours)")
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.legend()
plt.show()
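Since logistic regression predicts probabilities, it can help to print them instead of only the final Pass/Fail answer. The sketch below reuses the same sample data and split as the example above and also checks accuracy on the two held-out students; with so little data, that accuracy figure says very little and is shown only to illustrate the call.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same sample data as the example above
study_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
exam_result = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0 = Fail, 1 = Pass

X_train, X_test, y_train, y_test = train_test_split(
    study_hours, exam_result, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns one column per class, in the order of model.classes_.
# Here the classes are [0, 1], so column 1 is the probability of passing.
probabilities = model.predict_proba(study_hours)[:, 1]
for hours, p in zip(study_hours.ravel(), probabilities):
    print(f"{hours:>2} hours -> P(pass) = {p:.2f}")

# Accuracy on the held-out students
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2f}")
```

By default, `predict` answers Pass whenever the pass probability is above 0.5.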

Decision Tree in Simple Words 🌳


A Decision Tree is like a flowchart that helps computers make decisions by asking a series of
yes/no questions.

How Does It Work?

Imagine you want to decide how to prepare for going outside based on the weather.

1️⃣ Is it raining? 🌧

• Yes → Take an umbrella. ☔
• No → Go to the next question.

2️⃣ Is it too hot? 🌡

• Yes → Wear sunglasses. 🕶
• No → Just go outside! 😎

A decision tree works the same way: it splits the data step by step based on conditions.
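Written as code, the weather flowchart is just a pair of nested yes/no checks. The function and parameter names below are made up for illustration; a real decision tree learns its questions from data instead of having them written by hand.

```python
def what_to_do(is_raining: bool, is_too_hot: bool) -> str:
    """Follow the weather flowchart one question at a time."""
    # Question 1: Is it raining?
    if is_raining:
        return "Take an umbrella"
    # Question 2 (only reached when it is not raining): Is it too hot?
    if is_too_hot:
        return "Wear sunglasses"
    return "Just go outside"


print(what_to_do(is_raining=True, is_too_hot=False))
print(what_to_do(is_raining=False, is_too_hot=True))
print(what_to_do(is_raining=False, is_too_hot=False))
```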

Where Is It Used?

✅ Medical Diagnosis → Is the patient sick or healthy?
✅ Spam Detection → Is an email spam or not?
✅ Loan Approval → Should a bank approve a loan?

Key Benefits

✔ Easy to understand (like a decision-making chart).
✔ Works well with both numbers & categories.
✔ Fast and efficient for small datasets.

🔹 Summary

A Decision Tree is a simple yet powerful machine learning model that helps make decisions
based on conditions. 🌳

Worked Example of a Decision Tree in Scikit-Learn

Let's build a Decision Tree Classifier to predict if a person will buy a car based on their age
and income.

📌 Steps

1. Import necessary libraries.
2. Create sample data (Age, Income vs. Buy/Not Buy).
3. Train a Decision Tree model.
4. Predict if a new person (e.g., 30 years old, $50K income) will buy a car.
5. Visualize the Decision Tree.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# 1️⃣ Sample Data: [Age, Income ($1000s)] → Target: 1 = Buy, 0 = Not Buy
data = np.array([
    [25, 30, 0],  # Age 25, Income 30K → Not Buy
    [30, 50, 1],  # Age 30, Income 50K → Buy
    [35, 40, 0],
    [40, 60, 1],
    [45, 80, 1],
    [50, 20, 0],
    [55, 90, 1],
    [60, 70, 1],
])

# Splitting Features (X) and Target (y)
X = data[:, :2]  # Age & Income
y = data[:, 2]   # Buy (1) or Not Buy (0)

# 2️⃣ Train Decision Tree Model
model = DecisionTreeClassifier()
model.fit(X, y)

# 3️⃣ Predict if a New Person (Age 30, Income 50K) Will Buy a Car
new_person = np.array([[30, 50]])
prediction = model.predict(new_person)
print(f"Will a 30-year-old with $50K income buy a car? {'Yes' if prediction[0] == 1 else 'No'}")

# 4️⃣ Visualize the Decision Tree
plt.figure(figsize=(8, 5))
plot_tree(model, feature_names=["Age", "Income"], class_names=["Not Buy", "Buy"], filled=True)
plt.show()
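Besides the plot, the learned questions can be printed as plain text with `export_text`, which makes the flowchart idea easy to see. This sketch reuses the same sample data; `random_state=0` is added here only so the result is repeatable.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same sample data as the example above: [Age, Income ($1000s)]
X = np.array([
    [25, 30], [30, 50], [35, 40], [40, 60],
    [45, 80], [50, 20], [55, 90], [60, 70],
])
y = np.array([0, 1, 0, 1, 1, 0, 1, 1])  # 1 = Buy, 0 = Not Buy

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the tree as indented if/else style rules
rules = export_text(model, feature_names=["Age", "Income"])
print(rules)

# Prediction for the person from the example (Age 30, Income 50K)
prediction = model.predict(np.array([[30, 50]]))[0]
print("Prediction:", "Buy" if prediction == 1 else "Not Buy")
```

In this sample, every buyer has a higher income than every non-buyer, so the tree can separate the two groups using income alone.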
