supervised_learning
Think of features and labels like ingredients and the final dish in a recipe.
Features are the ingredients—the details that help make a decision. For example, if
you’re predicting whether an email is spam, features could be things like the number of
links, certain words in the subject, or the sender's address.
Labels are the final dish—the correct answer. In the email example, the label would be
"spam" or "not spam."
In machine learning, we train a model using features (inputs) to predict the correct labels
(outputs).
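As a minimal sketch of the idea above, features and labels are usually stored as two aligned arrays: one row of features per example and one label per row. The feature choices and values below are made up for illustration and are not from a real dataset.

```python
import numpy as np

# Hypothetical features for four emails (illustrative values only):
# [number of links, subject contains "free" (1/0), sender is a known contact (1/0)]
X = np.array([
    [8, 1, 0],
    [0, 0, 1],
    [5, 1, 0],
    [1, 0, 1],
])

# Labels: the correct answer for each row of X (1 = spam, 0 = not spam)
y = np.array([1, 0, 1, 0])

# Every example (row) needs exactly one label
n_examples, n_features = X.shape
print(f"{n_examples} examples, {n_features} features each, {len(y)} labels")
```

By convention the feature table is called X and the label vector is called y, which is why those names appear throughout scikit-learn examples.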
SUPERVISED LEARNING
Imagine you’re learning to recognize animals, and someone gives you pictures labeled "dog" or
"cat." You study these labeled examples and learn the patterns that distinguish dogs from cats.
Later, when you see a new picture (without a label), you use what you’ve learned to guess if it’s
a dog or a cat.
In technical terms, supervised learning is a type of machine learning where a model is trained
using labeled data—meaning each input has a known correct output. The goal is for the model to
learn patterns so it can correctly predict outcomes for new, unseen data.
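The dog-and-cat story can be sketched in a few lines of scikit-learn. This is only an illustration: the measurements are invented, and K-Nearest Neighbors is one arbitrary choice of model among many.

```python
from sklearn.neighbors import KNeighborsClassifier

# Illustrative labeled examples: [weight in kg, shoulder height in cm]
X_train = [[4, 25], [5, 23], [3, 22], [30, 60], [25, 55], [35, 65]]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Training: the model studies the labeled examples
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Prediction: a new animal that comes without a label
new_animal = [[28, 58]]
guess = model.predict(new_animal)[0]
print(f"The model guesses: {guess}")
```

The important point is the two phases: fit() sees inputs together with their correct answers, while predict() sees only inputs.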
Classification and regression are two types of machine learning tasks, and the difference is in
the kind of answers they predict.
Classification is about sorting things into categories. Example: "Is this email spam or
not?" (Yes/No), or "What animal is in this picture?" (Cat, Dog, or Bird). The answer is
always a category or label.
Regression is about predicting numbers. Example: "What will the temperature be
tomorrow?" or "How much will this house sell for?" The answer is always a number.
1. Classification → When the goal is to categorize data into different classes or labels.
   o Examples:
     Spam detection (Spam or Not Spam)
     Disease diagnosis (Healthy or Sick)
     Handwriting recognition (A, B, C, etc.)
2. Regression → When the goal is to predict a continuous numerical value.
   o Examples:
     Predicting house prices based on size and location
     Forecasting stock market prices
     Estimating a person’s weight based on height and age
Both types use labeled data to train the model, but classification predicts categories, while
regression predicts numbers.
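To make the difference concrete, here is a small sketch that uses the same input (house size) for both kinds of task. The data is illustrative, and the two models are arbitrary picks: any regressor and any classifier would show the same contrast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: house size in square feet
sizes = np.array([[600], [800], [1000], [1200], [1400], [1600]])

# Regression target: a number (price in $1000s)
prices = np.array([150, 200, 250, 300, 350, 400])
# Classification target: a category (1 = "large", 0 = "small")
is_large = np.array([0, 0, 0, 1, 1, 1])

regressor = LinearRegression().fit(sizes, prices)
classifier = DecisionTreeClassifier(random_state=0).fit(sizes, is_large)

new_house = np.array([[1500]])
predicted_price = regressor.predict(new_house)[0]   # a continuous value
predicted_class = classifier.predict(new_house)[0]  # one of the known categories

print(f"Regression answer: about ${predicted_price:.0f}K")
print(f"Classification answer: {'large' if predicted_class == 1 else 'small'}")
```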
There are several types of regression in supervised learning, depending on the problem you’re
solving. Here are the main ones in simple words:
1. Linear Regression – The simplest type, where we draw a straight line to predict values.
o Example: Predicting house prices based on size (bigger house → higher price).
2. Multiple Linear Regression – Like linear regression but with multiple factors (features).
o Example: Predicting a car’s price based on its age, mileage, and brand.
3. Polynomial Regression – Instead of a straight line, it uses curves to fit the data better.
o Example: Predicting population growth over time (which might not be linear).
4. Ridge & Lasso Regression – Linear regression with an added penalty on large coefficients
(regularization). The penalty reduces overfitting (fitting the training data so closely that
the model fails on new data).
o Example: Used in financial forecasting, where many related factors might otherwise
confuse the model.
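The effect of Ridge and Lasso is easiest to see on the learned coefficients. The sketch below uses synthetic data in which only the first of five features matters; the data, the random seed, and the alpha values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (an assumption for illustration): 5 features, only the first one matters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the penalty strength (arbitrary value)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Plain:", np.round(plain.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))  # all coefficients shrunk toward zero
print("Lasso:", np.round(lasso.coef_, 2))  # unhelpful coefficients pushed to exactly zero
```

Ridge keeps every feature but makes its influence smaller, while Lasso can drop features entirely, which is why Lasso is sometimes used as a simple form of feature selection.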
4. Imbalanced Classification (One category is much more common than the others)
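A quick sketch shows why imbalanced data needs care. Here a "model" that always predicts the common category looks accurate while never finding the rare one. The 95/5 split is an invented example, and DummyClassifier stands in for a poorly trained model.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Illustrative imbalanced labels: 95 examples of class 0, only 5 of class 1
X = np.zeros((100, 1))  # features are irrelevant for this demonstration
y = np.array([0] * 95 + [1] * 5)

# A baseline that ignores the input and always predicts the most common class
always_majority = DummyClassifier(strategy="most_frequent").fit(X, y)
predictions = always_majority.predict(X)

accuracy = accuracy_score(y, predictions)  # share of all predictions that are correct
recall = recall_score(y, predictions)      # share of the rare class that was found

print(f"Accuracy: {accuracy:.2f}")
print(f"Recall on the rare class: {recall:.2f}")
```

For this reason, metrics such as recall, precision, or F1 are usually reported alongside accuracy when one category is rare.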
Supervised learning models learn from labeled data (where we provide both inputs and correct
outputs) to make predictions. These models are mainly divided into:
1. Classification Models (Predict categories)
2. Regression Models (Predict numbers)
Classification Models
Used when the output is a category or label (e.g., "Spam" or "Not Spam").
Logistic Regression → Predicts probability of a class (e.g., will a student pass? Yes/No).
Decision Tree → A tree-like model that splits data step by step (e.g., diagnosing a disease).
Random Forest → A collection of decision trees for better accuracy.
Support Vector Machine (SVM) → Finds the best boundary between categories.
K-Nearest Neighbors (KNN) → Classifies a data point by the most common label among its
closest neighbors.
Naïve Bayes → Uses probability to classify (e.g., spam filters).
Neural Networks (MLP Classifier) → Layers of connected units, loosely inspired by the human
brain, that learn to recognize patterns (e.g., face recognition).
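In scikit-learn, the classifiers listed above share the same fit/score interface, so they can be tried side by side. The sketch below is one possible comparison, assuming the built-in Iris dataset and mostly default settings; the neural network is left out only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Built-in dataset: 150 flowers, 4 measurements each, 3 species
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Train each model on the same data and measure accuracy on unseen test data
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.2f}")
```

On a small, clean dataset like this the scores tend to be close; differences between models usually matter more on larger, noisier problems.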
Regression Models
Used when the output is a continuous value (e.g., price, temperature, salary).
Linear Regression → Predicts a straight-line relationship (e.g., house price based on size).
Multiple Linear Regression → Uses multiple factors to predict (e.g., salary based on experience,
education, location).
Polynomial Regression → Fits a curve instead of a straight line, for relationships that are not
linear.
Ridge & Lasso Regression → Adds a penalty on large coefficients to reduce overfitting (when the
model memorizes the training data instead of learning general patterns).
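A short sketch of when a curve helps: on data that follows a U shape, a straight line explains almost nothing, while a degree-2 polynomial fits well. The data is synthetic and the degree is an assumption chosen to match how the data was generated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (illustrative): y is roughly x squared plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.2, size=50)

# A straight line versus a degree-2 curve
straight_line = LinearRegression().fit(x, y)
curve = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

# score() returns R^2: 1.0 is a perfect fit, values near 0 mean little is explained
line_r2 = straight_line.score(x, y)
curve_r2 = curve.score(x, y)
print(f"Straight line R^2: {line_r2:.2f}")
print(f"Degree-2 curve R^2: {curve_r2:.2f}")
```

Polynomial regression is still linear regression under the hood: PolynomialFeatures adds x squared as an extra feature, and the linear model does the rest.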
1. Collect Data 📊
4. Choose a Model 🤖
Feed the training data into the model so it can learn patterns.
Use the model in real applications (like a chatbot, recommendation system, or self-driving car).
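The workflow can be sketched end to end in scikit-learn. This is a rough outline, not the exact list of steps from the original: it assumes the built-in Wine dataset as the collected data and a Random Forest as the chosen model.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Collect data (here: a small built-in dataset of wine measurements)
X, y = load_wine(return_X_y=True)

# Hold back part of the data so the model can be checked on examples it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Choose a model
model = RandomForestClassifier(random_state=0)

# Train: feed the training data into the model so it can learn patterns
model.fit(X_train, y_train)

# Evaluate on the held-back data before trusting the model
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Use the model: predict the class of one new sample
new_sample = X_test[:1]
print("Predicted class:", model.predict(new_sample)[0])
```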
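import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# 1️⃣ Sample Data (House Size in Square Feet vs. Price in $1000s); placeholder values, the original data was lost
X_train = np.array([[600], [800], [1000], [1200], [1400]])
y_train = np.array([150, 200, 250, 300, 350])
model = LinearRegression()
model.fit(X_train, y_train)
# 4️⃣ Predict Price for a New House (e.g., 1050 sq. ft)
new_house = np.array([[1050]])
predicted_price = model.predict(new_house)
print(f"Predicted Price for a 1050 sq. ft House: ${predicted_price[0]:.2f}K")
plt.scatter(X_train, y_train, label="Training data")
plt.plot(X_train, model.predict(X_train), color="red", label="Regression line")
plt.xlabel("House Size (sq. ft)")
plt.ylabel("Price ($1000s)")
plt.legend()
plt.show()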
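What the trained linear model has actually learned is just two numbers: a slope and an intercept. The sketch below, using illustrative data that is not from the original, reads them back and reproduces a prediction by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data (not from the original): size in sq. ft, price in $1000s
X_train = np.array([[600], [800], [1000], [1200], [1400]])
y_train = np.array([155, 198, 251, 297, 352])

model = LinearRegression().fit(X_train, y_train)

# The learned straight line: price = slope * size + intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"price = {slope:.3f} * size + {intercept:.2f}")

# Predicting by hand gives the same answer as model.predict()
size = 1050
by_hand = slope * size + intercept
by_model = model.predict(np.array([[size]]))[0]
print(f"By hand: {by_hand:.2f}, by model: {by_model:.2f}")
```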
Worked Example of Logistic Regression in Scikit-Learn
Let’s build a Logistic Regression model to predict whether a student will pass or fail an exam
based on the number of study hours.
Steps
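import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
# 1️⃣ Sample Data (Hours Studied vs. Exam Result: 1 = Pass, 0 = Fail); placeholder values, the original data was lost
X_train = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y_train = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict the outcome for a new student who studied 5.5 hours
new_student = np.array([[5.5]])
predicted_result = model.predict(new_student)
print(f"Predicted Outcome for 5.5 Hours of Study: {'Pass' if predicted_result[0] == 1 else 'Fail'}")
# Plot the data points and the model's predicted probability of passing
hours = np.linspace(0, 11, 100).reshape(-1, 1)
plt.scatter(X_train, y_train, label="Students (1 = Pass, 0 = Fail)")
plt.plot(hours, model.predict_proba(hours)[:, 1], color="red", label="Predicted probability")
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.legend()
plt.show()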
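Behind the scenes, logistic regression computes a straight-line score and squashes it into a probability between 0 and 1 with the sigmoid function. The sketch below checks this by hand against scikit-learn; the study-hours data is illustrative and not from the original.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: hours studied and result (1 = Pass, 0 = Fail)
X_train = np.arange(1, 11).reshape(-1, 1)
y_train = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: linear score, exactly as in linear regression
hours = 5.5
z = model.coef_[0, 0] * hours + model.intercept_[0]

# Step 2: turn the score into a probability of passing
by_hand = sigmoid(z)
by_model = model.predict_proba(np.array([[hours]]))[0, 1]
print(f"By hand: {by_hand:.3f}, by model: {by_model:.3f}")
```

The final Pass/Fail answer from predict() is simply this probability compared against 0.5.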
How It Works?
1️⃣ Is it raining? 🌧
Where is it Used?
Key Benefits
🔹 Summary
A Decision Tree is a simple yet powerful machine learning model that helps make decisions
based on conditions. 🌳
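A decision tree is essentially a chain of yes/no questions, which can be written by hand as nested if statements. In the sketch below only the first question ("Is it raining?") comes from the text; the second question and all of the outcomes are hypothetical, added to show the shape of a tree.

```python
def what_to_do(is_raining: bool, is_windy: bool) -> str:
    """A hand-written decision tree: each `if` is one question (a node)."""
    if is_raining:   # first question, from the text: "Is it raining?"
        return "stay inside"
    if is_windy:     # hypothetical second question
        return "fly a kite"
    return "go for a walk"  # leaf reached when every answer is "no"

print(what_to_do(is_raining=True, is_windy=False))
print(what_to_do(is_raining=False, is_windy=True))
print(what_to_do(is_raining=False, is_windy=False))
```

The difference in machine learning is that nobody writes these questions by hand: the algorithm picks the questions and thresholds from the training data.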
Let's build a Decision Tree Classifier to predict if a person will buy a car based on their age
and income.
📌 Steps
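import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
# 1️⃣ Sample Data: [Age, Income ($1000s)] → Target: 1 = Buy, 0 = Not Buy
# Only the last row survives from the original; the other rows are placeholder values
data = np.array([
[22, 25, 0], [25, 30, 0], [30, 45, 0],
[35, 60, 1], [45, 80, 1], [50, 40, 0],
[60, 70, 1]
])
df = pd.DataFrame(data, columns=["Age", "Income", "Buy"])
X, y = df[["Age", "Income"]], df["Buy"]
# Train the decision tree on all examples
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
# 3️⃣ Predict if a New Person (Age 30, Income 50K) Will Buy a Car
new_person = pd.DataFrame([[30, 50]], columns=["Age", "Income"])
prediction = model.predict(new_person)
print(f"Will a 30-year-old with $50K income buy a car? {'Yes' if prediction[0] == 1 else 'No'}")
# Draw the learned tree
plt.figure(figsize=(8,5))
plot_tree(model, feature_names=["Age", "Income"], class_names=["Not Buy", "Buy"], filled=True)
plt.show()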
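Besides drawing the tree, scikit-learn can print the learned rules as plain text with export_text, which makes the "series of questions" idea visible. The data below is illustrative: apart from the [60, 70, 1] row, the values are placeholders and not from the original.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data: [Age, Income ($1000s), Buy (1) / Not Buy (0)]
data = np.array([
    [22, 25, 0], [25, 30, 0], [30, 45, 0],
    [35, 60, 1], [45, 80, 1], [50, 40, 0],
    [60, 70, 1],
])
X, y = data[:, :2], data[:, 2]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the questions the tree learned, one indented line per node
rules = export_text(model, feature_names=["Age", "Income"])
print(rules)
```

Each indented line is a condition on one feature, and each "class" line is the answer the tree gives once it reaches that leaf.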