mla_unit-5'2 (1)

The document discusses text classification using Naive Bayes and Support Vector Machine (SVM) algorithms, detailing the processes of binary, multiclass, and multi-label classification. It explains the mechanics of Naive Bayes, Bayes theorem, and performance evaluation metrics such as accuracy, precision, recall, and F1 score. Additionally, it covers SVM's functionality in classifying news articles, including the use of TF-IDF for feature extraction and the implementation of SVM for achieving high classification accuracy.

Uploaded by

siri.brdvl

TEXT CLASSIFICATION USING NAIVE BAYES

TEXT CLASSIFICATION

Text classification is the process of assigning predefined categories or labels to text data.
It is a form of supervised machine learning where the input is a piece of text and the output is the class it belongs to.
It can be applied to many types of text, such as documents, emails, reviews, or social media posts.
TYPES OF CLASSIFICATION

Machine learning classification can be categorized into:
• Binary Classification
• Multiclass Classification
• Multi-label Classification

BINARY CLASSIFICATION:
Involves only two categories, such as positive/negative.
It is the simplest form of text classification, where the model decides between two possible outcomes.
MULTI-CLASS CLASSIFICATION:
Text is classified into one category out of more than two possible
classes (e.g., classifying news articles as sports, politics, tech).
Each text input gets only one label, even if others may seem
relevant.
MULTI-LABEL CLASSIFICATION:
A single text can be assigned multiple labels (e.g., a tweet labeled as
sports and health).
It reflects the real-world scenarios where content may belong to more
than one category simultaneously.
NAIVE BAYES

• The Naive Bayes classifier belongs to the family of probabilistic classifiers. It computes the probability of each predictive feature of the data given each class, and from these it predicts a probability distribution over all classes, not only the single most likely class for the data sample.
• Bayes: It maps the probabilities of observing the input features given each class to the probability distribution over classes, using Bayes' theorem.
• Naive: It simplifies the probability computations by assuming that the predictive features are mutually independent given the class.
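The two ideas above (Bayes' theorem plus the independence assumption) can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the tiny training corpus, the function names `fit` and `predict_proba`, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy training data, made up purely for illustration.
train = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

def fit(samples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_proba(text, class_counts, word_counts, vocab):
    """Posterior over classes, treating words as independent given the class."""
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothed likelihood P(word | class)
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        log_scores[label] = score
    # Turn log scores into normalized probabilities
    max_score = max(log_scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}

model = fit(train)
posterior = predict_proba("win free money", *model)
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

The classifier returns the full distribution over classes (`posterior`) as well as the most likely class (`prediction`), which mirrors the description in the first bullet.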
BAYES THEOREM

• Let A and B denote two events. An event can be that it will rain tomorrow, that two kings are drawn from a deck of cards, or that a person has cancer.
• P(A|B), the probability that A occurs given that B is true, can be computed by:
• P(A|B) = P(B|A) P(A) / P(B)
• Here P(B|A) is the probability of observing B given that A occurs, and P(A) and P(B) are the probabilities that A and B occur, respectively.
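A short worked example may help make the formula concrete. The numbers below are hypothetical (a 1% prior, a 90% true positive rate, and a 5% false positive rate for a screening test) and the helper name `bayes_posterior` is an assumption for this sketch. P(B) is expanded with the law of total probability.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) P(A) / P(B).

    P(B) is obtained from the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# A: the person has the disease; B: the test is positive.
# All three numbers are hypothetical.
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(f"P(disease | positive test) = {posterior:.3f}")
```

With these assumed numbers the posterior stays well below the test's 90% sensitivity, because the prior P(A) is small.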
PERFORMANCE EVALUATION

• Confusion Matrix: A table used to describe the performance of a classification algorithm. It visualizes and summarizes the results of a classification in terms of four counts:
• True positive (TP): The model correctly predicts positive when the sample is actually positive.
• True negative (TN): The model correctly predicts negative when the sample is actually negative.
• False positive (FP): The model incorrectly predicts positive when the sample is actually negative.
• False negative (FN): The model incorrectly predicts negative when the sample is actually positive.
• Accuracy: The ratio of the number of correct predictions to the total number of predictions.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision: The proportion of the model's positive predictions that are actually positive.
Precision = TP / (TP + FP)
• Recall: Measures how good the model is at finding all the actual positives. It is also called the true positive rate.
Recall = TP / (TP + FN)
• F1 Score: The harmonic mean of precision and recall. It measures how well a model balances the trade-off between the two.
F1 = 2 * (precision * recall) / (precision + recall)
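The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses made-up counts, and the function name `classification_metrics` is an assumption for illustration; the zero-denominator guards are a design choice, not something the formulas above prescribe.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no positive predictions or no positives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a binary spam classifier
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```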
Email Spam Detection using Naive Bayes

Classification:
• Spam email detection is essentially a machine learning classification problem.
• Classification is one of the main instances of supervised learning in machine learning.
• A classification model is trained by learning from the features and targets of the training samples.
Types of Classification:
• Machine learning classification can be categorized into binary classification, multiclass classification, and multi-label classification.
BINARY CLASSIFICATION:
Binary classification is the problem of classifying observations into one of two possible classes.
One frequently mentioned example is email spam filtering, which identifies email messages as spam or not spam.
Multiclass Classification:
Also called multinomial classification, it allows more than two possible classes, as opposed to only two classes in the binary case.
• Handwritten digit recognition is a common instance, with a long history of research and development since the early 1900s.
Multi-label Classification:
It differs from the first two types, in which the target classes are disjoint: here a single sample can belong to several classes at once.
News Topic Classification with Support Vector Machine
This chapter focuses on classifying news articles by topic using the Support Vector Machine (SVM)
algorithm, building upon the 20 Newsgroups dataset introduced earlier. It offers a practical approach
to understanding machine learning classification with text data.

Key topics covered include:


• TF-IDF (Term Frequency–Inverse Document Frequency): A method to transform text into numerical
features.
• Support Vector Machine (SVM): A robust classifier suitable for high-dimensional data like text.
• SVM Mechanics: The theory behind how SVM separates classes with a decision boundary.
• SVM Implementation: Practical steps to apply SVM in code.
• Multiclass Classification: How SVM handles multiple topic categories.
• Nonlinear Kernels: Using kernels like the Gaussian (RBF) to handle non-linear data.
• Choosing Kernels: When to use linear vs. Gaussian kernels.
• Overfitting: Challenges with model generalization and how to prevent it.
• News Topic Classification Example: Applying SVM to classify real-world news data.
• Hyperparameter Tuning: Using grid search and cross-validation to optimize SVM performance.
In the previous chapter, spam email detection was done using a Naive Bayes classifier on a feature space
represented by term frequency (tf) — counting how often each word appeared in individual documents.

Limitation: This method didn't account for how widely terms appeared across the entire collection. Common
words (like “the”, “get”, “make”) may appear frequently, reducing their usefulness for classification.

Solution: TF-IDF (Term Frequency–Inverse Document Frequency)


TF-IDF improves text feature extraction by assigning a weight to each term that:
•Increases with term frequency within a document.
•Decreases with the number of documents containing that term in the corpus.
IDF formula:

$$\mathrm{idf}(t, D) = \log \frac{n_D}{1 + n_t}$$

where $n_D$ is the total number of documents, $n_t$ is the number of documents containing the term $t$, and the 1 is added to avoid division by zero.
This reduces the influence of common words and highlights rarer, more meaningful terms.

We can test the effectiveness of tf-idf on our existing spam email detection model by simply replacing the tf feature extractor, CountVectorizer, with the tf-idf feature extractor, TfidfVectorizer, from scikit-learn.
The best averaged 10-fold AUC achieved is 0.9943, which outperforms the 0.9856 obtained with tf features.
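To see how the weighting behaves, here is a minimal sketch that computes idf and tf-idf by hand, assuming the form idf(t) = log(n_D / (1 + n_t)) described above. The three documents and the helper names `idf` and `tf_idf` are made up for illustration. Note that scikit-learn's TfidfVectorizer uses a differently smoothed idf and normalizes each document vector, so its numbers will not match these.

```python
import math

# Three toy documents, made up for illustration
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell today",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

def idf(term):
    """Inverse document frequency: log(n_D / (1 + n_t))."""
    n_t = sum(1 for tokens in tokenized if term in tokens)
    return math.log(n_docs / (1 + n_t))

def tf_idf(term, tokens):
    """Raw term count in one document, weighted by idf."""
    return tokens.count(term) * idf(term)

for term in ("the", "cat", "market"):
    print(term, round(idf(term), 3))

# "market" appears in one document, "cat" in two, "the" in all three,
# so their idf values decrease in that order. With this particular
# formula a term present in every document gets a slightly negative idf.
print(tf_idf("market", tokenized[2]))
```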
Support Vector Machine (SVM)
•SVM is a powerful classifier often used for text data classification
tasks.
•In classification, SVM finds an optimal hyperplane that separates data
points from different classes.
•A hyperplane is a decision boundary in an n-dimensional feature space:
• In 2D, it is a line.
• In 3D, it is a plane.
• In n dimensions, it is an (n-1)-dimensional flat subspace.
•The goal is to find the hyperplane that maximizes the margin — the
distance between the hyperplane and the nearest data points from
each class.
•These nearest points to the hyperplane are called support vectors.
•Support vectors are critical because they define the position and
orientation of the hyperplane.
The Mechanics of SVM
•There can be infinitely many hyperplanes that separate data points from different classes.
•The task is to find the optimal separating hyperplane — the one that correctly divides
the data and maximizes the margin.
•Scenario 1: Identifying the Separating Hyperplane
• A valid hyperplane must successfully separate data points based on their labels.
• In an example with hyperplanes A, B, and C:
• Only Hyperplane C correctly separates the classes.
• Hyperplanes A and B fail to segregate them properly.
•Mathematical Definition:
• In 2D, a line (hyperplane) is defined by:
• A slope vector w (a 2D vector)
• An intercept b
• In n-dimensional space, a hyperplane is similarly defined by:
• An n-dimensional vector w
• An intercept b
• Any point x lying on the hyperplane satisfies the equation $w \cdot x + b = 0$
• A hyperplane is a separating hyperplane if:
• For any data point x from one class, it satisfies $w \cdot x + b > 0$
• For any data point x from the other class, it satisfies $w \cdot x + b < 0$

There can be countless possible solutions for w and b. Next, we will learn how to identify the best hyperplane among the possible separating hyperplanes.
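The separating condition can be checked numerically. In this sketch the four 2D points, the labels encoded as +1 and -1, the two candidate (w, b) pairs, and the helper name `is_separating` are all assumptions chosen for illustration. With that label encoding, both inequalities collapse into the single test y * (w · x + b) > 0.

```python
import numpy as np

# Toy 2D data: two points per class, labels encoded as +1 / -1
X = np.array([[2.0, 3.0], [3.0, 3.5], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

def is_separating(w, b, X, y):
    """True if every sample lies strictly on the side matching its label."""
    scores = X @ w + b          # w . x + b for each sample
    return bool(np.all(y * scores > 0))

# Two hypothetical candidate hyperplanes
w_good, b_good = np.array([1.0, 1.0]), 0.0
w_bad, b_bad = np.array([1.0, -1.0]), 0.0

print(is_separating(w_good, b_good, X, y))
print(is_separating(w_bad, b_bad, X, y))
```

Many different (w, b) pairs pass this test for the same data, which is why an extra criterion (the margin) is needed to pick one.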
Scenario 2 — Determining the Optimal Hyperplane
•Among many separating hyperplanes, the optimal hyperplane is the one that maximizes the margin between classes.
•Margin = the sum of:
• The distance from the nearest data point on the positive side to the hyperplane.
• The distance from the nearest data point on the negative side to the hyperplane.
•These nearest points from each class define two additional hyperplanes:
• Positive hyperplane: passes through the closest positive class point(s).
• Negative hyperplane: passes through the closest negative class point(s).
•The perpendicular distance between the positive and negative hyperplanes is called the margin.
•A decision hyperplane is optimal when this margin is maximized.
•The points that lie exactly on the margin boundaries (on the positive or negative hyperplane) are called support vectors.
•Support vectors are the critical data points that influence the position and orientation of the optimal hyperplane.
Once the optimal hyperplane is found, the value $|w \cdot x + b|$ for a sample x can be portrayed as the distance from the data point to the decision hyperplane (up to the scaling factor $\|w\|$), and also interpreted as the confidence of the prediction: the higher the value, the further the point is from the decision boundary and the more certain the prediction.
Before implementing the SVM algorithm, let's take a step back and look at a frequent scenario where the data points are not perfectly linearly separable.
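The margin and support vectors can be inspected on a small example with scikit-learn's SVC. This is a sketch under stated assumptions: the six 2D points are made up, and a very large C is used to approximate a hard margin on this separable data. For a linear kernel, `coef_` and `intercept_` hold w and b, and the margin width between the positive and negative hyperplanes is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C leaves almost no room for margin violations
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]            # normal vector of the decision hyperplane
b = clf.intercept_[0]       # intercept
margin = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin)

# decision_function returns w . x + b; its sign is the predicted side and
# its magnitude grows with the distance from the decision boundary.
scores = clf.decision_function(np.array([[0.0, 0.0], [3.0, 3.2]]))
print("scores:", scores)
```

Only the points nearest the boundary appear in `support_vectors_`; moving any other point (without crossing the margin) would leave the hyperplane unchanged.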
Scenario 3 - handling outliers
When a set of observations contains outliers that make it impossible to linearly separate the entire dataset, we allow such outliers to be misclassified and try to minimize the error this introduces.

The misclassification error (also called hinge loss) for a sample $x^{(i)}$ with label $y^{(i)} \in \{-1, +1\}$ can be expressed as:

$$\zeta^{(i)} = \max\left(0,\; 1 - y^{(i)}\,(w \cdot x^{(i)} + b)\right)$$

For a training set of m samples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$, the quantity to minimize combines the margin term with the average error, for example in the form:

$$\|w\| + C\,\frac{\sum_{i=1}^{m} \zeta^{(i)}}{m}$$

where the parameter C controls the trade-off between the two terms.
When a large value of C is chosen, the penalty for misclassification becomes relatively high, which makes the rule for separating the data stricter and the model prone to overfitting.
An SVM model with a large C has low bias, but it might suffer from high variance.
Conversely, when the value of C is sufficiently small, the influence of misclassification becomes relatively low, which allows more misclassified data points and thus makes the separation less strict. An SVM model with a small C has low variance, but it might suffer from high bias.
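The role of C can be illustrated numerically. The sketch below assumes the standard hinge loss max(0, 1 - y (w · x + b)) and, as one possible form of the objective, ||w|| plus C times the mean hinge loss; the data, the fixed (w, b), and the function names are made up for illustration, and other texts write the objective with (1/2)||w||² and a sum instead of a mean.

```python
import numpy as np

def hinge_losses(w, b, X, y):
    """Per-sample hinge loss: zero for points beyond the margin on the right side."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    """Margin term plus C times the average hinge loss (assumed form)."""
    return np.linalg.norm(w) + C * hinge_losses(w, b, X, y).mean()

# Toy data: the last sample is an outlier on the wrong side of the boundary
X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.5, 0.5], [1.0, 1.0]])
y = np.array([1, -1, 1, -1])
w, b = np.array([0.5, 0.5]), 0.0

losses = hinge_losses(w, b, X, y)
print("hinge losses:", losses)

# The same hyperplane is penalized far more heavily when C is large
objective_small_c = soft_margin_objective(w, b, X, y, C=0.1)
objective_large_c = soft_margin_objective(w, b, X, y, C=10.0)
print(objective_small_c, objective_large_c)
```

The third sample is classified correctly but sits inside the margin, so it still contributes a loss of 0.5; the misclassified outlier contributes 2.0.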
The implementations of SVM
Let's apply SVM right away to news topic classification. We start with a binary case classifying two topics, comp.graphics and sci.space.
First, load the training and testing subsets of the computer graphics and science space news data respectively:
>>> from sklearn.datasets import fetch_20newsgroups
>>> categories = ['comp.graphics', 'sci.space']
>>> data_train = fetch_20newsgroups(subset='train',
...                                 categories=categories, random_state=42)
>>> data_test = fetch_20newsgroups(subset='test',
...                                categories=categories, random_state=42)

Again, don't forget to specify a random state for reproducing experiments.


Clean the text data and retrieve label information (clean_text is a text-cleaning helper assumed to be defined earlier; it is not shown in this section):
>>> cleaned_train = clean_text(data_train.data)
>>> label_train = data_train.target
>>> cleaned_test = clean_text(data_test.data)
>>> label_test = data_test.target
>>> len(label_train), len(label_test)
(1177, 783)

As a good practice, check whether the classes are imbalanced:


>>> from collections import Counter
>>> Counter(label_train)
Counter({1: 593, 0: 584})
>>> Counter(label_test)
Counter({1: 394, 0: 389})

Next, extract tf-idf features using the TfidfVectorizer extractor introduced above:
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> tfidf_vectorizer = TfidfVectorizer(sublinear_tf=True,
...                 max_df=0.5, stop_words='english', max_features=8000)
>>> term_docs_train = tfidf_vectorizer.fit_transform(cleaned_train)
>>> term_docs_test = tfidf_vectorizer.transform(cleaned_test)
Now we can apply our SVM algorithm with features ready. Initialize an SVC model with the
kernel parameter set to linear (we will explain this shortly) and penalty C set to the
default value 1:
>>> from sklearn.svm import SVC
>>> svm = SVC(kernel='linear', C=1.0, random_state=42)

Then fit our model on the training set:


>>> svm.fit(term_docs_train, label_train)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto',
kernel='linear',max_iter=-1, probability=False, random_state=42,
shrinking=True, tol=0.001, verbose=False)

And then predict on the testing set with the trained model and obtain the prediction
accuracy directly:
>>> accuracy = svm.score(term_docs_test, label_test)
>>> print('The accuracy on testing set is: {0:.1f}%'.format(accuracy*100))
The accuracy on testing set is: 96.4%

Our first SVM model works well, achieving 96.4% accuracy.
Scenario 4 - dealing with more than two classes
SVM and many other classifiers can be generalized to the multiple class case by two
common approaches, one-vs-rest (also called one-vs-all) and one-vs-one.
In scikit-learn, classifiers handle multiclass cases internally, and we do not need to write any additional code to enable it. We can see how simple it is in the following example of classifying five topics: comp.graphics, sci.space, alt.atheism, talk.religion.misc, and rec.sport.hockey:
>>> categories = [
... 'alt.atheism',
... 'talk.religion.misc',
... 'comp.graphics',
... 'sci.space',
... 'rec.sport.hockey'
... ]

>>> data_train = fetch_20newsgroups(subset='train',
...                                 categories=categories, random_state=42)
>>> data_test = fetch_20newsgroups(subset='test',
...                                categories=categories, random_state=42)
>>> cleaned_train = clean_text(data_train.data)
>>> label_train = data_train.target
>>> cleaned_test = clean_text(data_test.data)
>>> label_test = data_test.target
>>> term_docs_train = tfidf_vectorizer.fit_transform(cleaned_train)
>>> term_docs_test = tfidf_vectorizer.transform(cleaned_test)
In SVC, multiclass support is implicitly handled according to the one-vs-one scheme:
>>> svm = SVC(kernel='linear', C=1.0, random_state=42)
>>> svm.fit(term_docs_train, label_train)
>>> accuracy = svm.score(term_docs_test, label_test)
>>> print('The accuracy on testing set is: {0:.1f}%'.format(accuracy*100))
The accuracy on testing set is: 88.6%

We check how it performs for individual classes as follows:


>>> from sklearn.metrics import classification_report
>>> prediction = svm.predict(term_docs_test)
>>> report = classification_report(label_test, prediction)
>>> print(report)

             precision    recall  f1-score   support

          0       0.81      0.77      0.79       319
          1       0.91      0.94      0.93       389
          2       0.98      0.96      0.97       399
          3       0.93      0.93      0.93       394
          4       0.73      0.76      0.74       251
avg / total       0.89      0.89      0.89      1752

Not bad! And we could, as usual, tweak the value of the parameters kernel='linear' and
C=1.0 as specified in our SVC model. We discussed that parameter C controls the strictness
of separation, and it can be tuned to achieve the best trade-off between bias and variance.
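The two multiclass strategies mentioned in this scenario, one-vs-rest and one-vs-one, differ in how many binary classifiers they train. A minimal sketch using scikit-learn's explicit wrappers on synthetic data (the blobs dataset and its parameters are assumptions for illustration, not the news data) makes the difference visible: with K classes, one-vs-rest trains K classifiers and one-vs-one trains K(K-1)/2.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 4-class data, used only to illustrate the two strategies
X, y = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# One-vs-rest: one binary SVM per class (class k against all others)
ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)

# One-vs-one: one binary SVM per pair of classes
ovo = OneVsOneClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)

n_ovr = len(ovr.estimators_)
n_ovo = len(ovo.estimators_)
print("one-vs-rest classifiers:", n_ovr)
print("one-vs-one classifiers:", n_ovo)
print("training accuracy:", ovr.score(X, y), ovo.score(X, y))
```

One-vs-one trains more models, but each sees only the samples of two classes, which is one reason it is a common choice for kernel SVMs.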
The kernels of SVM
Scenario 5 - solving linearly non-separable problems
The hyperplanes we have looked at so far are linear, for example, a line in a two-dimensional feature space or a plane in a three-dimensional one.
However, in many frequently seen scenarios, no linear hyperplane can separate the two classes. Kernels such as the Gaussian (RBF) kernel address this by mapping the data to a higher-dimensional space where a linear separation may exist.
Again, the kernel coefficient of the RBF kernel (the gamma parameter in scikit-learn) can be fine-tuned via cross-validation to obtain the best performance.
Some other common kernel functions include the polynomial kernel and the sigmoid kernel.
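A small experiment can show why a nonlinear kernel matters. The sketch below is an illustration on synthetic data (two concentric circles from scikit-learn's make_circles, with assumed noise and gamma values), not the dataset discussed in this chapter. No straight line separates the inner circle from the outer one, so the linear kernel does poorly while the RBF kernel fits the data well.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: a classic linearly non-separable problem
X, y = make_circles(n_samples=400, noise=0.05, factor=0.3, random_state=42)

# Same C for both models; only the kernel differs
linear_acc = SVC(kernel="linear", C=1.0).fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

Training accuracy is used here only to show separability; for model selection, gamma and C would be tuned with cross-validation as described above.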
Choosing between the linear and RBF kernel
The rule of thumb is to use the linear kernel when the data is linearly separable. However, linear separability is usually very difficult to identify unless you have sufficient prior knowledge or the features are of low dimension (1 to 3).
Case 1: both the number of features and the number of instances are large (more than 10^4 or 10^5). As the dimension of the feature space is already high enough, additional features resulting from the RBF transformation will not improve performance but will increase the computational expense. Some examples from the UCI Machine Learning Repository are of this type:
• URL Reputation Data Set: https://archive.ics.uci.edu/ml/datasets/URL+Reputation (number of instances: 2396130, number of features: 3231961) for malicious URL detection based on lexical and host information
• YouTube Multiview Video Games Data Set: https://archive.ics.uci.edu/ml/datasets/YouTube+Multiview+Video+Games+Dataset (number of instances: 120000, number of features: 1000000) for topic classification
Case 2: the number of features is noticeably large compared to the number of training samples. Apart from the reasons stated in Case 1, the RBF kernel is significantly more prone to overfitting. Such a scenario occurs in, for example:
• Dorothea Data Set: https://archive.ics.uci.edu/ml/datasets/Dorothea (number of instances: 1950, number of features: 100000) for drug discovery, classifying chemical compounds as active or inactive by structural molecular features
• Arcene Data Set: https://archive.ics.uci.edu/ml/datasets/Arcene (number of instances: 900, number of features: 10000), a mass-spectrometry dataset for cancer detection
Case 3: the number of instances is significantly large compared to the number of features. For a dataset of low dimension, the RBF kernel will, in general, boost performance by mapping it to a higher-dimensional space. However, due to the training complexity, it usually becomes inefficient on a training set with more than 10^6 or 10^7 samples. Some example datasets include:
• Heterogeneity Activity Recognition Data Set: https://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition (number of instances: 43930257, number of features: 16) for human activity recognition
• HIGGS Data Set: https://archive.ics.uci.edu/ml/datasets/HIGGS (number of instances: 11000000, number of features: 28) for distinguishing between a signal process producing Higgs bosons and a background process
News topic classification with support vector machine

Load and Clean the Data:

from sklearn.datasets import fetch_20newsgroups

# categories=None loads all 20 newsgroups
categories = None
data_train = fetch_20newsgroups(subset='train', categories=categories, random_state=42)
data_test = fetch_20newsgroups(subset='test', categories=categories, random_state=42)
# clean_text and tfidf_vectorizer are reused from the earlier steps
cleaned_train = clean_text(data_train.data)
label_train = data_train.target
cleaned_test = clean_text(data_test.data)
label_test = data_test.target
term_docs_train = tfidf_vectorizer.fit_transform(cleaned_train)
term_docs_test = tfidf_vectorizer.transform(cleaned_test)

Train the SVM Model with Cross-validation:

import timeit
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Candidate values for the penalty parameter C
parameters = {'C': (0.1, 1, 10, 100)}
svc_libsvm = SVC(kernel='linear')
grid_search = GridSearchCV(svc_libsvm, parameters, n_jobs=-1, cv=3)
start_time = timeit.default_timer()  # start timing the grid search
grid_search.fit(term_docs_train, label_train)
print("--- %0.3fs seconds ---" % (timeit.default_timer() - start_time))
print(grid_search.best_params_)
print(grid_search.best_score_)
Evaluate the Model:

svc_libsvm_best = grid_search.best_estimator_
accuracy = svc_libsvm_best.score(term_docs_test, label_test)
print(f'The accuracy on testing set is: {accuracy * 100:.1f}%')

Comparison with LinearSVC:

from sklearn.svm import LinearSVC

svc_linear = LinearSVC()
grid_search = GridSearchCV(svc_linear, parameters, n_jobs=-1, cv=3)
grid_search.fit(term_docs_train, label_train)
print(grid_search.best_params_)
print(grid_search.best_score_)
accuracy = grid_search.best_estimator_.score(term_docs_test, label_test)
print(f'The accuracy on testing set is: {accuracy * 100:.1f}%')
Optimize Using a Pipeline (TF-IDF + LinearSVC):

import timeit
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Chain feature extraction and classification so both are tuned together
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('svc', LinearSVC()),
])
# Keys follow the <step name>__<parameter name> convention
parameters_pipeline = {
    'tfidf__max_df': (0.25, 0.5),
    'tfidf__max_features': (40000, 50000),
    'tfidf__sublinear_tf': (True, False),
    'tfidf__smooth_idf': (True, False),
    'svc__C': (0.1, 1, 10, 100),
}
grid_search = GridSearchCV(pipeline, parameters_pipeline, n_jobs=-1, cv=3)
start_time = timeit.default_timer()  # start timing the grid search
grid_search.fit(cleaned_train, label_train)
print("--- %0.3fs seconds ---" % (timeit.default_timer() - start_time))
print(grid_search.best_params_)
Stock Price Prediction Using Linear Regression
Introduction

• Stock price prediction is a critical task in the financial industry. Using machine learning models like Linear Regression, we can forecast future prices based on historical data, identify trends, and support strategic investment decisions.
Supervised Learning

• Supervised learning is a type of machine learning where the model is trained using labeled data. The algorithm learns the mapping between input features and the output (target) value.
• Linear Regression is a common example of supervised learning.

What is Linear Regression?

• Linear Regression is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables. It fits a straight line through the data to predict future values based on trends.
• Linear regression explores the linear relationship between observations and targets, and represents that relationship as a linear equation, or weighted sum function. Given a data sample x with n features x1, x2, ..., xn (so that x = (x1, x2, ..., xn) is the feature vector), the weights (also called coefficients) of the linear regression model w = (w1, w2, ..., wn), and an intercept w0, the target y is expressed as
Formula: $y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n = w^T x$, where in the last form x is extended with a constant $x_0 = 1$ and w with the intercept $w_0$.
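The equivalence between the explicit weighted sum and the vector form can be checked with NumPy. The weights and the sample below are hypothetical values chosen for illustration; the only trick is prepending a constant 1 to x so that the intercept w0 is absorbed into the dot product.

```python
import numpy as np

# Hypothetical model: intercept w0 followed by weights w1, w2
w = np.array([0.5, 2.0, -1.0])
# One sample with two features x1, x2
x = np.array([3.0, 4.0])

# Explicit weighted sum: w0 + w1*x1 + w2*x2
y_explicit = w[0] + w[1] * x[0] + w[2] * x[1]

# Vector form: prepend x0 = 1 so the intercept is part of the dot product
x_aug = np.concatenate(([1.0], x))
y_dot = w @ x_aug

print(y_explicit, y_dot)
```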
CODE
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the data and strip stray whitespace from the column names
df = pd.read_csv("NSE-TATAGLOBAL.csv")
df.columns = df.columns.str.strip()
print(df.columns)
# Parse the dates and use them as the index
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df.set_index('Date', inplace=True)
df.dropna(inplace=True)

# Plot the closing price over time
plt.figure(figsize=(14, 6))
plt.plot(df['Close'], color='blue')
plt.title('TATAGLOBAL Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (INR)')
plt.grid(True)
plt.show()

# Feature engineering: daily ranges and moving averages
df['Open-Close'] = df['Open'] - df['Close']
df['High-Low'] = df['High'] - df['Low']
df['7day MA'] = df['Close'].rolling(window=7).mean()
df['14day MA'] = df['Close'].rolling(window=14).mean()
df.dropna(inplace=True)

# Target: the next row's closing price
df['Target'] = df['Close'].shift(-1)
df.dropna(inplace=True)
# Assumption: the source does not show the feature list; the engineered
# columns above are used here.
features = ['Open-Close', 'High-Low', '7day MA', '14day MA']
X = df[features]
y = df['Target']
# shuffle=False keeps the chronological order of the time series
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print("Training Set:", X_train.shape)
print("Test Set:", X_test.shape)

# Fit the model and evaluate it on the held-out period
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R2 Score: {r2:.2f}")
print(f"RMSE: {rmse:.2f}")

OUTPUT-
R2 Score: 0.99
RMSE: 0.94
plt.figure(figsize=(14, 6))
plt.plot(y_test.index, y_test, label='Actual', color='blue')
plt.plot(y_test.index, y_pred, label='Predicted', color='orange')
plt.title("Actual vs Predicted Stock Prices")
plt.xlabel("Date")
plt.ylabel("Price (INR)")
plt.legend()
plt.grid(True)
plt.show()
next_day_input = X.tail(1)
next_day_prediction = model.predict(next_day_input)
print(f"Predicted next day's closing price: ₹{next_day_prediction[0]:.2f}")

OUTPUT-
Predicted next day's closing price: ₹155.44
Applications of Stock Prediction using Linear Regression

1. Investment Decision Support
2. Trend Analysis
3. Risk Management
4. Portfolio Optimization
5. Algorithmic Trading
6. Stock Screening
Decision tree regression
After linear regression, the next regression algorithm we will learn is decision tree regression, also called a regression tree. In classification, the decision tree is constructed by recursive binary splitting, growing each node into left and right children. At each partition, it greedily searches for the most significant combination of a feature and its value as the optimal splitting point.
The quality of a separation is measured by the weighted purity of the labels of the two resulting children, specifically via Gini impurity or information gain. In regression, the tree construction process is almost identical to the classification one, with only two differences due to the fact that the target becomes continuous:
• The quality of a splitting point is now measured by the weighted mean squared error (MSE) of the two children; the MSE of a child is equivalent to the variance of all its target values, and the smaller the weighted MSE, the better the split.
• The average value of the targets in a terminal node becomes the leaf value, instead of the majority label as in a classification tree.
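The split criterion for a regression tree can be sketched for a single feature. The data and the helper names `weighted_mse` and `best_split` are made up for illustration, and a real tree would repeat this search over every feature and recurse into each child. The sketch relies on the fact stated above: a child's MSE around its own mean equals the variance of its targets.

```python
import numpy as np

def weighted_mse(y_left, y_right):
    """Size-weighted average of the two children's MSE (variance of targets)."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / n

def best_split(x, y):
    """Greedy search over thresholds of one feature for the lowest weighted MSE."""
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: it would leave the right child empty
    for threshold in np.unique(x)[:-1]:
        left, right = y[x <= threshold], y[x > threshold]
        score = weighted_mse(left, right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy data: targets jump between x = 3 and x = 10
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])

threshold, score = best_split(x, y)
# Leaf values are the mean targets of each child
left_value = y[x <= threshold].mean()
right_value = y[x > threshold].mean()
print(threshold, score, left_value, right_value)
```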
