
SKILL DZIRE

MACHINE LEARNING
INTERNSHIP
An Internship Report Submitted at the end of the seventh semester

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted By
Y SAI PRAJWAL
(21981A05J1)
Under the esteemed guidance of
Srikanth Muppala

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

RAGHU ENGINEERING COLLEGE


(AUTONOMOUS)
Affiliated to JNTU-Gurajada, Vizianagaram
Approved by AICTE, Accredited by NBA and NAAC with ‘A+’ grade
2024-2025
RAGHU ENGINEERING COLLEGE
(AUTONOMOUS)
Affiliated to JNTU-Gurajada, Vizianagaram
Approved by AICTE, Accredited by NBA and NAAC with ‘A+’ grade

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


CERTIFICATE
This is to certify that this project entitled “Machine Learning”, done by Y Sai Prajwal (21981A05J1), a student of B.Tech in the Department of Computer Science and Engineering, Raghu Engineering College, during the period 2021-2025, in partial fulfilment for the award of the degree of Bachelor of Technology in Computer Science and Engineering of Jawaharlal Nehru Technological University Gurajada, Vizianagaram, is a record of bonafide work carried out under my guidance and supervision.

The results embodied in the internship report have not been submitted to any other University or Institute for the award of any Degree.
INTERNAL GUIDE: P Mounika, Assistant Professor, Dept of CSE, Raghu Engineering College, Dakamarri (V), Visakhapatnam.

HEAD OF DEPARTMENT: Dr. R. Sivaranjani, Professor, Dept of CSE, Raghu Engineering College, Dakamarri (V), Visakhapatnam.

EXTERNAL EXAMINER
DISSERTATION APPROVAL SHEET

This is to certify that the dissertation titled MACHINE LEARNING by Y SAI PRAJWAL (21981A05J1) is approved for the degree of Bachelor of Technology.

P Mounika
Project Guide
(Assistant Professor)

Internal Examiner

External Examiner

Dr.R.Sivaranjani
HOD
(Professor)

Date:
DECLARATION
This is to certify that this internship titled "Machine Learning" is a bonafide work done by me, in partial fulfilment of the requirements for the award of the degree of B.Tech, and submitted to the Department of Computer Science and Engineering, Raghu Engineering College, Dakamarri.

I also declare that this internship report is the result of my own effort, has not been copied from anyone, and that I have taken citations only from the sources mentioned in the references.

This work was not submitted earlier at any other University or Institute for the award of any degree.

Date: Y SAI PRAJWAL

Place: (21981A05J1)

CERTIFICATE

ACKNOWLEDGEMENT
I express sincere gratitude to my esteemed institute "Raghu Engineering College", which has provided me an opportunity to fulfil my most cherished desire to reach my goal.

I take this opportunity with great pleasure to put on record our ineffable personal indebtedness to Mr. Raghu Kalidindi, Chairman of Raghu Engineering College, for providing necessary departmental facilities.

I thank the Principal, Dr. Ch. Srinivasu, of “Raghu Engineering College” for providing the requisite facilities to carry out projects on campus.

I sincerely express my deep sense of gratitude to Dr. R. Sivaranjani, Professor and Head of the Department of Computer Science and Engineering, Raghu Engineering College, for her perspicacity, wisdom and sagacity coupled with compassion and patience. It is my great pleasure to submit this work under her wing. I thank her for guiding us to the successful completion of this project work.

I would like to thank the team of Srikanth Muppala for providing the technical guidance to carry out the assigned module. Your expertise in the subject matter and dedication towards our project have been a source of inspiration for all of us.

I extend my deep-hearted thanks to all faculty members of the Computer Science department for imparting valuable theoretical and practical knowledge, which was used in the project.

Regards
Y SAI PRAJWAL

(21981A05J1)

TABLE OF CONTENTS

1. Introduction
2. Module-1: Python Introduction
3. Module-2: OOPs in Python
4. Module-3: Python ML Libraries
5. Module-4: Data Analysis
6. Module-5: Introduction to ML
7. Module-6: Applications of ML
8. Module-7: Types of ML
9. Module-8: Computer Vision, CNN
10. Module-9: Algorithms (YOLO)
11. Module-10: NLP Processing
12. Annexure
13. Conclusion
INTRODUCTION
Machine learning (ML) is a subset of artificial intelligence (AI) focused on
developing algorithms that enable computers to learn from and make decisions based on
data. Unlike traditional programming, where a system follows explicit instructions,
machine learning allows systems to identify patterns, adapt to new data, and improve
performance over time without being explicitly programmed.

The need for machine learning arises from the exponential growth of data and the complexity of extracting meaningful insights from it. In today's data-driven world, organisations generate vast amounts of information from various sources, such as social media, sensors, and transaction logs. Machine learning algorithms can process and analyse this data to uncover hidden patterns, predict future trends, and optimise decision-making processes.

By leveraging machine learning, industries ranging from healthcare to finance can achieve
significant advancements. For instance, ML models can enhance diagnostic accuracy
in medical imaging, detect fraudulent activities in financial transactions, and
personalise customer experiences in ecommerce. As a result, machine learning is
increasingly becoming a critical tool for innovation and competitiveness in the modern
technological landscape.

In this internship, the prominent concepts of machine learning are explained and demonstrated using relevant software tools and the latest algorithms. We fundamentally use Python as the programming language to interact with the data and visualise it. The internship is divided into modules.

MODULE 1: INTRODUCTION TO PYTHON

OBJECTIVES:

1. Introduction to Python

2. Data types of Python

3. Applications of Python

4. Conditional statements

5. Python functions
   a. Lambda functions
   b. Kwargs functions

INTRODUCTION TO PYTHON:
Python is an interpreted, high-level, dynamically typed programming language known for its emphasis on code readability and simplicity. It features a multi-paradigm approach, supporting procedural, object-oriented, and functional programming styles. The reference implementation, CPython, is written in C, and Python utilises automatic memory management with a built-in garbage collector. The language provides extensive standard libraries and frameworks, making it suitable for a wide range of applications, from web development to data analysis and artificial intelligence. Python adheres to the principles outlined in the Zen of Python, promoting simplicity and clarity in coding practices.

The vibrant community contributes to a rich ecosystem of third-party libraries and frameworks, particularly for data science and machine learning. Python's cross-platform compatibility and integration capabilities make it suitable for diverse environments and applications.

DATA TYPES IN PYTHON
1. Integers / Floating-point numbers (real numbers) / Complex numbers
2. Strings
3. Lists
4. Dictionaries
5. Sets
For the development of machine learning models we will particularly be using lists, dictionaries and sets.

Dictionary: It’s a pairwise collection of data made of key-value pairs, e.g. d = {“hi”: ”hello”, 1: ”next”}; keys must be unique and hashable, and values can be of any data type.

Lists: It’s a linear, sequential collection of data of any type, with methods associated with it; a list can be iterated, sorted, and appended to, e.g. l = [2, 3, ”char”].

Sets: It’s a data type that holds only unique values; the set operations of union, intersection and difference can be performed on two sets. Duplicate values are discarded, and the elements are unordered.
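A minimal illustrative sketch of these three types (the variable names d, l, s and t are made up for this example):

d = {"hi": "hello", 1: "next"}         # dictionary: unique, hashable keys
print(d["hi"])                         # hello

l = [2, 3, "char"]                     # list: ordered, mixed types allowed
l.append(5)
l.sort(key=str)                        # sort mixed types by their string form
print(l)                               # [2, 3, 5, 'char']

s = {1, 2, 3}
t = {3, 4, 5}
print(s | t)                           # union: {1, 2, 3, 4, 5}
print(s & t)                           # intersection: {3}
print(s - t)                           # difference: {1, 2}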

Applications of Python: With its cross-platform ability, Python is used to create web applications as well as to analyse data, build machine learning models, and more.

Functions: Python functions take parameters and encapsulate logic. They execute only through function calls, which makes programs modular; each call receives a discrete allocation of memory (its own local scope).

Lambda Functions: These are special anonymous functions defined with the lambda keyword; the logic is written inline and often applied to each element of an iterable, for example via map() or as the key passed to sorted().
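For example, a small sketch of lambdas used inline (the data here is illustrative):

square = lambda x: x * x
print(square(4))                          # 16

nums = [3, 1, 2]
print(sorted(nums, key=lambda x: -x))     # [3, 2, 1]: lambda supplies the sort key
print(list(map(lambda x: 2 * x, nums)))   # [6, 2, 4]: lambda applied to each element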

MODULE 2: OOPS IN PYTHON
Objective:
1. OOPs concept in Python
● Pillars of OOPs and examples
Object-Oriented Programming (OOP) in Python is a programming paradigm centred around objects, which encapsulate data and behaviour. The four fundamental principles of OOP are:

1.Encapsulation: Bundling data (attributes) and methods (functions) that operate on the
data into a single unit, or object. This promotes data hiding and protects the integrity of the
object's state.

2.Abstraction: Simplifying complex systems by exposing only the relevant attributes and
methods, allowing users to interact with objects without needing to understand their
internal workings.

3.Inheritance: Enabling a new class (subclass) to inherit attributes and methods from an existing class (superclass), promoting code reuse and hierarchical relationships.

4.Polymorphism: Allowing different classes to be treated as instances of the same class through a common interface, typically achieved via method overriding or operator overloading.

In Python, classes are defined using the `class` keyword, and objects are instantiated from
these classes. OOP facilitates modularity, code reusability, and easier maintenance in
software development.

OOP concepts can be implemented as in the code below, where a class has functions (methods) and data (attributes) associated with it.

Encapsulation: Encapsulation in Python is the feature of hiding an object's attributes and methods, which controls how they are accessed from outside the class.

Snippet-1: Demonstration of encapsulation.
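The snippet itself is an image in the original report; a minimal reconstruction matching the description below (the Account class and its attribute names are assumptions) could be:

class Account:
    def __init__(self, balance):
        self.__balance = balance    # double underscore makes the attribute private (name-mangled)

    def deposit(self, amount):
        self.__balance += amount    # state changes only through methods

    def get_balance(self):
        return self.__balance       # controlled access through a method

account = Account(1000)
print(account.get_balance())        # 1000
print(account.__balance)            # raises AttributeError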


Here the attribute __balance cannot be accessed directly through the account object because it is a private attribute, and trying to access it raises an AttributeError. This is the encapsulation property.

Inheritance: Inheritance in Python allows a class to inherit attributes and methods from another class, promoting code reuse and establishing a hierarchical relationship between classes. Here’s an overview of the types of inheritance possible in Python.

Single Inheritance: One parent, one child.

Multiple Inheritance: One child, multiple parents.

Multilevel Inheritance: Grandparent → Parent → Child.

Hierarchical Inheritance: One parent, multiple children.

Hybrid Inheritance: Combination of multiple inheritance types.


However, multiple inheritance is ambiguous when the same method exists in several parent classes inherited by the child class (Python resolves such conflicts through its method resolution order), and Python has no access specifiers like private and protected as in C++.

Single or Simple Inheritance: This is when the child class inherits from a single parent class, as demonstrated in the following.

Snippet-2: Demonstration of Inheritance in Python
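The snippet is an image in the original; a minimal single-inheritance sketch (class names assumed) is:

class Parent:
    def greet(self):
        return "Hello from Parent"

class Child(Parent):      # Child inherits everything defined in Parent
    pass

c = Child()
print(c.greet())          # Hello from Parent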

Abstraction: Abstraction is the idea of having a template or blueprint for classes; the abstract base class is inherited, and subclasses implement its abstract methods. In Python we import the abc (Abstract Base Class) module. Abstraction can be explained with the following example.

Snippet-3: Demonstration of Abstraction in Python (expected output: Area:50 Area:154)
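The snippet image is not reproduced here; a minimal reconstruction (a 5×10 rectangle and a radius-7 circle, with pi approximated as 22/7 so the printed areas match the stated output) could be:

from abc import ABC, abstractmethod

class Shape(ABC):                       # abstract base class: a blueprint for shapes
    @abstractmethod
    def area(self):
        pass

    @abstractmethod
    def perimeter(self):
        pass

class Rectangle(Shape):
    def __init__(self, length, breadth):
        self.length, self.breadth = length, breadth

    def area(self):
        return self.length * self.breadth

    def perimeter(self):
        return 2 * (self.length + self.breadth)

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 22 / 7 * self.radius ** 2    # pi approximated as 22/7

    def perimeter(self):
        return 2 * 22 / 7 * self.radius

print("Area:" + str(int(Rectangle(5, 10).area())))   # Area:50
print("Area:" + str(int(Circle(7).area())))          # Area:154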

Shape is the abstract base class; it is inherited by the Rectangle and Circle classes, which implement the abstract methods area and perimeter.

Polymorphism: Polymorphism in Python allows different classes to be treated as instances of the same class through a common interface. It typically manifests in two ways: method overriding and method overloading. However, Python does not support traditional method overloading (as seen in some other languages); you can achieve similar functionality using default arguments or variable arguments.

There are three types of polymorphism:

1. Method overloading

2. Method overriding

3. Polymorphism in collections

Method overloading: Since Python lacks built-in overloading, a single method is written to accept a varying number of arguments (via default or variable-length parameters) and behaves according to the arguments passed.

Snippet-4:Demonstration of Method overloading in Python
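A small sketch of this emulation using a default argument (the Calculator class is an assumed example, not taken from the original snippet):

class Calculator:
    def add(self, a, b, c=0):     # the default argument lets one method accept 2 or 3 values
        return a + b + c

calc = Calculator()
print(calc.add(2, 3))      # 5
print(calc.add(2, 3, 4))   # 9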

Method overriding: This is when we change the logic of an inherited method without changing its name. Here the Animal class's method sound is overridden by the Dog and Cat classes.

Snippet-5: Demonstration of Method Overriding in Python
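A minimal reconstruction of the Animal/Dog/Cat example described above (the exact return strings are assumptions):

class Animal:
    def sound(self):
        return "Some generic sound"

class Dog(Animal):
    def sound(self):          # overrides Animal.sound without changing its name
        return "Woof"

class Cat(Animal):
    def sound(self):          # overrides Animal.sound without changing its name
        return "Meow"

for animal in (Animal(), Dog(), Cat()):
    print(animal.sound())     # each object answers with its own implementation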

MODULE 3: PANDAS, NUMPY LIBRARIES INTRODUCTION
Objective:

1. Introduction to Numpy
2. Introduction to Pandas
3. Applications of these Libraries

NumPy

Introduction:

NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides support for arrays, matrices, and a variety of mathematical functions to operate on these data structures.

Key features include:

○ N-dimensional arrays: NumPy's primary feature is its powerful N-dimensional array object called ndarray, which allows for efficient storage and manipulation of large datasets.
○ Mathematical functions: NumPy offers a wide range of mathematical functions for operations like linear algebra, Fourier transforms, and statistical calculations.

Use in Machine Learning:

Data Manipulation: NumPy arrays are often used to represent datasets, enabling efficient mathematical computations.

Performance: NumPy is optimised for performance and memory efficiency, making it ideal for handling large datasets commonly encountered in machine learning.

Preprocessing: Many machine learning algorithms require data preprocessing (e.g., normalization, standardization), which can be efficiently implemented using NumPy.

A demonstration of this library can be seen here:

Snippet-6: Demonstration of the NumPy library
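Since the snippet is an image, here is a small illustrative sketch of the features described above (the array values are made up):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2-D ndarray
print(a.shape)                         # (2, 3)
print(a * 2)                           # element-wise arithmetic on the whole array
print(a.mean(), a.std())               # built-in statistical functions
print(a.T @ a)                         # matrix product (linear algebra)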

Pandas

Introduction:
Pandas is a powerful data manipulation and analysis library that provides data
structures like Series and DataFrames, which are built on top of NumPy.

Key features include:

○ DataFrames: A two-dimensional labelled data structure, similar to a spreadsheet or SQL table, which allows for easy data manipulation.
○ Data cleaning and preparation: Tools for handling missing data, filtering, and reshaping datasets.

Use in Machine Learning:

Data Handling: Pandas is often used for data loading, cleaning, and preparation,
which are critical steps in the machine learning workflow.

Exploratory Data Analysis (EDA): Provides functionalities to analyze and visualize data, helping to understand patterns and insights before applying machine learning models.

Feature Engineering: Pandas makes it easy to create, transform, and manipulate
features that can be used as inputs to machine learning algorithms.

A demonstration of this library can be seen in the following:

Snippet-7: Demonstration of DataFrames in Pandas
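As the snippet is an image, a small illustrative sketch of a DataFrame with filtering and missing-data handling (the column names and values are made up):

import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "age": [24, 31, 28],
    "city": ["Pune", "Delhi", None],
})
print(df.head())                            # inspect the first rows
print(df.describe())                        # summary statistics for numeric columns
print(df[df["age"] > 25])                   # filter rows by a condition
df["city"] = df["city"].fillna("Unknown")   # handle missing data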

Why Pandas when we already have NumPy?

While NumPy is excellent for numerical computing and is very efficient for operations on
homogeneous data, Pandas is designed for data analysis and manipulation, especially with
heterogeneous, labelled data. If you're working with structured data (like tables) or
performing data analysis tasks, Pandas is often the more convenient and powerful choice.
In practice, you might use both libraries together: leveraging NumPy for numerical
computations and Pandas for data manipulation and analysis.

MODULE 4: DATA ANALYSIS
Objective:

1.Function

2.Formula

3.Charts

4.Pivots
Data analysis is an important primary task for understanding basic statistical parameters, which give us attributes and parameters to consider when predicting the movement of the overall data over any period of time.

Function: In Excel, a function is a predefined formula that performs calculations or operations on data. Functions can take one or more arguments (inputs) and return a single result. They are used to simplify complex calculations and can operate on numbers, text, dates, and other data types.

1. SUM
2. AVERAGE
3. MAX
4. CONCATENATE
Formula:
In Excel, a formula is an expression that performs calculations on values in your
spreadsheet. Formulas can include functions, operators, cell references, and
constants. They allow you to perform a wide variety of mathematical and logical
operations on your data.

The following is a basic demonstration of formulas in Excel:

=A1 + B1 // Adds the values in cells A1 and B1
=C1 - D1 // Subtracts the value in D1 from C1
=E1 * F1 // Multiplies the values in E1 and F1
=G1 / H1 // Divides the value in G1 by H1

Operators and functions are part of a formula.

Charts:

Charts in Excel are powerful tools for visually representing data, making it easier
to identify trends, patterns, and comparisons. Excel offers a variety of chart types, each
suited for different types of data analysis. The types of charts:

1.Column Chart

2.Bar chart

3.Pie Chart

4.Line Chart

5.Scatter Plot

Excel acts as a primary software tool for working with raw data; by manipulating the data with the prominent features mentioned above, useful inferences can be produced to predict the near future. Such prediction falls under the category of machine learning, which is unravelled in the next module.

MODULE 5: INTRODUCTION TO MACHINE LEARNING
Objective:
1.Introduction to machine learning

2.Evolution of machine learning

3.Why Python? Libraries and Frameworks for ML

Machine learning:

Machine learning is a subset of artificial intelligence that focuses on developing algorithms that enable computers to learn from and make predictions or decisions based on data. Instead of being explicitly programmed for a task, machine learning systems improve their performance as they are exposed to more data over time.

Depending on the outcomes and data, machine learning is divided into the following categories:

1.Supervised Learning: The model is trained on labelled data, meaning the input data is paired with the correct output.

Classification: Assigning categories to data points (e.g., email spam detection).

Regression: Predicting continuous values (e.g., house price prediction).

2.Unsupervised Learning: The model is trained on unlabeled data and must find patterns
or structures within the data on its own.

Clustering: Grouping similar data points (e.g., customer segmentation).

Dimensionality Reduction: Reducing the number of features in a dataset (e.g., PCA - Principal Component Analysis).

3.Reinforcement Learning: The model learns by interacting with an environment and receiving feedback in the form of rewards or penalties. It aims to maximize cumulative rewards over time.

Examples:

Training agents to play games (e.g., AlphaGo).

Robotics, where a robot learns to navigate and perform tasks through trial and error.

These types of machine learning cater to different kinds of problems and datasets, each with its unique methodologies and applications.

Evolution of Machine learning:

1. 1950s - Early Concepts:
1950: Alan Turing proposes the Turing Test, introducing ideas about machine intelligence.
1957: Frank Rosenblatt develops the Perceptron, an early neural network model.

2. 1960s - Symbolic AI:
Focus shifts to symbolic approaches and rule-based systems, with less emphasis on learning from data.

3. 1980s - Neural Networks Resurgence:
The backpropagation algorithm is popularised, leading to renewed interest in neural networks and their capabilities.

4. 1990s - The Rise of Statistics:
Statistical approaches gain traction with algorithms like decision trees and support vector machines, emphasising data-driven methods.

5. 2000s - Big Data and Advancements:
Growth of the internet leads to the availability of large datasets. Algorithms improve in efficiency and capability, with techniques like ensemble methods and boosting.

6. 2010s - Deep Learning Boom:
Introduction of deep learning, driven by advancements in neural network architectures and increased computational power (GPUs). Achievements in image and speech recognition, natural language processing, and game-playing AI.

7. 2020s - Widespread Adoption:
Machine learning becomes mainstream across various sectors, including healthcare, finance, and autonomous systems. Focus on explainability, fairness, and ethical considerations in AI.


Tools required to get an understanding of Machine Learning:

1. Mathematics: Statistics, Linear Algebra, Calculus
2. Programming Languages: Python, R
3. Tools and Libraries: PyTorch, TensorFlow, scikit-learn, NumPy, Pandas
4. Data Visualization Tools: Seaborn, Matplotlib, Tableau
5. Version Control: Git

Why Python is chosen, and what libraries and toolkits are used:

Python is generally much preferred because the readability and simplicity of its code make it user-friendly and less intimidating to new learners, and its strong community keeps the language relevant to this field.

The libraries used here are NumPy and Pandas, which were already discussed in Module 3; Matplotlib is the Python library we use for data visualization, and scikit-learn provides machine learning algorithms.
Frameworks like TensorFlow and PyTorch:

TensorFlow: TensorFlow is an open-source machine learning framework developed by Google. It provides a comprehensive ecosystem for building and deploying machine learning models, particularly deep learning models.

Flexibility: TensorFlow supports various machine learning tasks, including supervised learning, unsupervised learning, and reinforcement learning.

High-Level APIs: TensorFlow includes high-level APIs like Keras, which make it easier to build and train neural networks without dealing with low-level details.

Scalability: It can run on various platforms, from mobile devices to large scale distributed
systems, making it suitable for both research and production environments.

Tensor Operations: The core of TensorFlow is based on tensors, which are multidimensional arrays. Tensor operations are optimized for performance, allowing efficient computations.

PyTorch: PyTorch is an open source machine learning library developed by Facebook’s
AI Research lab. It’s particularly popular for its dynamic computation graph and ease of
use, making it a favourite among researchers and practitioners in deep learning. Here are
some of its key features and uses:

Key Features of PyTorch:

1. Dynamic Computation Graphs: Allows for changes to be made on the fly during
model training, making it easier to debug and experiment.
2. Easy to Use: PyTorch's syntax is intuitive and resembles standard Python, making
it accessible for beginners.
3. Rich Ecosystem: Supports a wide range of libraries for tasks like computer vision
(TorchVision) and natural language processing (TorchText).
4. Strong Community Support: A growing community that contributes to a variety of
tutorials, resources, and libraries.

Though both frameworks serve similar purposes, they differ in functionality and learning curve. The trends of these frameworks over the past 5 years are shown below.

Fig-1: Graph demonstrating the trends of PyTorch and TensorFlow

MODULE 6: APPLICATIONS OF MACHINE LEARNING


Objectives:

1.Machine Learning Responsibilities

2.Application of Machine Learning

3.Process of machine learning

4.Popular Algorithms in ML

Machine Learning Responsibilities:

These are the results and responsibilities that are expected of Machine Learning

1. Improved Decision-Making

2. Automation of Tasks

3. Personalization

4. Predictive Analytics

5. Advancements in Healthcare

6. Natural Language Processing

7. Autonomous Systems

8. Ethical and Fair AI

9. Enhanced Security

10. Research and Development

How are the outputs of machine learning interpreted, and how can they be used to make inferences about the subject? Machine learning outputs can be conveyed in various forms, including numbers, graphs, and charts. Here’s how these outputs typically look and how to interpret them:

1. Numerical Outputs

Predictions: For regression tasks, you might receive a single numerical value, such as a
predicted price (e.g., $250,000 for a house).

Classification Probabilities: For classification tasks, outputs may look like:

`Class A: 0.7`

`Class B: 0.3`

This means there’s a 70% probability that the instance belongs to Class A.

For regression, the predicted value can be directly compared to actual values to
assess accuracy.

For classification probabilities, you can set a threshold (e.g., 0.5) to decide which class to
assign.

2. Graphs and Charts

Scatter Plots: Used for regression analysis, displaying actual vs. predicted values.
The closer points are to the diagonal line, the better the model performs. A good fit will
show points clustered near the diagonal, indicating accurate predictions.

Confusion Matrix: For classification problems, a confusion matrix displays true positives,
true negatives, false positives, and false negatives.

From the confusion matrix, you can derive metrics like accuracy, precision, recall, and F1
score:

Accuracy: (TP + TN) / Total

Precision: TP / (TP + FP)

Recall: TP / (TP + FN)
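As a hedged illustration, these metrics can be computed with scikit-learn; the label vectors below are made up for demonstration:

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                    # 3 1 1 3
print(accuracy_score(y_true, y_pred))    # (TP + TN) / Total = 0.75
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.75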

3. Heatmaps

Correlation Matrix: Shows relationships between features.

Strong correlations (close to 1 or -1) can indicate redundant features or potential predictors.


Understanding how to interpret these outputs involves recognizing the context of
the problem, evaluating model performance metrics, and analyzing visualizations to
glean insights from the data. Each output type provides different perspectives on model
accuracy, feature importance, and decision making, guiding you in refining models
or making informed business decisions.

Applications of Machine Learning:

1. Automation: It automates tasks that typically require human intelligence, like data
analysis and decision making.
2. Personalization: Used in recommendation systems (like Netflix or Amazon)
to tailor content and products to individual users.
3. Predictive Analytics: Helps in forecasting trends, such as sales predictions, stock
market trends, and customer behaviour.
4. Image and Speech Recognition: Powers technologies in facial recognition, voice
assistants (like Siri and Alexa), and autonomous vehicles.
5. Healthcare: Assists in diagnosing diseases, predicting patient outcomes, and
personalizing treatment plans.
6. Fraud Detection: Identifies unusual patterns and behaviours in financial
transactions, helping to combat fraud.
7. Natural Language Processing: Enables applications like chatbots, translation
services, and sentiment analysis.

Process of Machine Learning: The process of ML can be divided into 7 stages of refinement and deduction, which can be seen in the figure below.

Fig-2: 7 stages of Machine Learning

Popular Algorithms in ML:


1. Linear Regression

2. Logistic Regression

3. Decision Tree

4. SVM

5. Naive Bayes

6. KNN

7. K-Means

8. Random Forest

9. Dimensionality Reduction Algorithms

10. Gradient Boosting algorithms

MODULE 7: SUPERVISED & UNSUPERVISED LEARNING

Objective:

1.Introduction to types of ML

2.Linear Regression

3.Decision Tree

4.Neural Networks

5.Clustering

Introduction to types of Machine Learning:

As discussed in previous modules, machine learning is broadly classified into supervised and unsupervised learning; this classification is made on the basis of the inputs that are fed into the training data.

Supervised Learning: This is the type of learning where labelled inputs map to defined outputs, so all possible output values (the co-domain) are known in advance. A supervised model produces a single output for each input.

Supervised learning is divided into Regression and Classification.

Classification: These algorithms are used when the output variable is categorical (discrete). The goal is to assign inputs to one of several predefined categories.

Regression: These algorithms are used when the output variable is continuous. The goal is to predict a numerical value.

Common Regression Algorithms:

Linear Regression: Models the relationship between dependent and independent variables using a linear equation.

Polynomial Regression: Extends linear regression by modeling relationships as a polynomial.

Ridge and Lasso Regression: Regularized versions of linear regression to prevent overfitting.

Support Vector Regression (SVR): A regression version of SVM that predicts continuous values.

Decision Tree Regression: A tree-based model for predicting continuous outcomes.

Random Forest Regression: An ensemble of decision trees for regression tasks.

Common Classification Algorithms:

Logistic Regression: Used for binary classification problems.

Decision Trees: Tree-like models that make decisions based on feature values.

Random Forest: An ensemble of decision trees that improves accuracy and reduces overfitting.

Support Vector Machines (SVM): Finds the optimal hyperplane that separates different classes.

K-Nearest Neighbors (KNN): Classifies based on the majority class among the k nearest neighbours in the feature space.

Naive Bayes: A probabilistic classifier based on Bayes' theorem, assuming independence among predictors.

Neural Networks: Used for complex classification tasks, particularly in deep learning.
Unsupervised learning: Unsupervised learning is a type of machine learning where the
model is trained on data without labelled outputs. The goal is to identify patterns,
structures, or relationships within the data. Here are the main types of unsupervised
learning techniques:

1. Clustering: Clustering algorithms group similar data points together based on their features. Common clustering methods include:

K-Means Clustering: Partitions data into k clusters by minimizing the variance within each cluster.

Hierarchical Clustering: Builds a tree of clusters by either merging or splitting existing clusters based on distance metrics.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups together points that are closely packed while marking as outliers points that lie alone in low-density regions.
2. Dimensionality Reduction

These techniques reduce the number of features in a dataset while preserving its essential
characteristics.

3. Association Rule Learning

This technique identifies relationships between variables in large datasets, often used in
market basket analysis. Common algorithms include:

Apriori Algorithm: Finds frequent itemsets and generates association rules based
on support and confidence.

Eclat Algorithm: A more efficient algorithm for finding frequent itemsets that
uses a depth-first search strategy.

4. Anomaly Detection

Anomaly detection algorithms identify outliers or unusual patterns in data. These can be
useful in fraud detection, network security, and quality control. Techniques include:

Isolation Forest: An algorithm that isolates anomalies based on their characteristics.

One-Class SVM: A support vector machine variant that identifies outliers by modelling the normal data points.

5. Self-Organizing Maps (SOM)

These are a type of artificial neural network that uses unsupervised learning to produce a
low-dimensional representation of the input space, helping to visualize and understand the
structure of high-dimensional data.

Linear Regression: Linear regression is one of the simplest and most commonly
used algorithms in machine learning for predictive modelling. It is used to establish
a relationship between one or more independent variables (features) and a
continuous dependent variable (target). Here’s a comprehensive overview:
The basic idea of linear regression is to find a linear relationship between the input features
and the target variable.

The relationship can be represented mathematically as:

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε

Here y is the predicted dependent variable, the xᵢ (i > 0) are the independent variables, the βᵢ are the coefficients learned by the model (β₀ is the intercept), and ε is the error term.

Sample Project: Predicting house prices using linear regression.

Requirements: Python (Jupyter), a PC with 4-8 GB RAM, Excel.

Libraries: NumPy, Pandas, Matplotlib, sklearn, seaborn

References: Modelling House Price Prediction using Regression Analysis and Particle Swarm Optimization, Case Study: Malang, East Java, Indonesia (IJSCSA).

Output: we run the test data (test.csv) to check the prediction accuracy of the linear regression model.

Neural Networks: Neural networks are a class of machine learning models inspired by the
structure and function of the human brain. They are designed to recognize patterns and
make decisions based on input data.

Types of Neural Networks:

1. Feedforward Neural Networks

2. Convolutional Neural Networks (CNNs)

3. Transformers

MODULE 8: COMPUTER VISION AND CNN (CONVOLUTIONAL NEURAL NETWORK)

Objective:
1.Computer Vision

2.CNN

3.Process of CNN

4.Applications of Computer Vision and CNN

Computer Vision:

Computer Vision is a field that enables machines to interpret and understand visual information from the world. In machine learning, computer vision algorithms are used to analyze and make decisions based on images and videos.

OpenCV (Open Source Computer Vision Library) is a powerful library designed specifically for computer vision tasks. It provides numerous tools and functions for image and video processing, including:

● Image manipulation (filtering, transformations).
● Feature detection (edges, corners, key points).
● Object tracking.
● Machine learning functionalities specifically tailored for computer vision.

Key Features:

● Efficiency: Optimized for real-time applications.
● Cross-Platform: Available on multiple operating systems.
● Wide Range of Functions: From basic image processing to advanced computer vision techniques.

Convolutional Neural Networks: CNNs are a specialized type of neural network designed for processing structured grid data, such as images. They have become the go-to architecture for image recognition and computer vision tasks due to their ability to automatically learn spatial hierarchies of features.

A CNN typically consists of several layers, each designed to perform specific functions.
The main components of a CNN include:

1. Convolutional Layer:
○ Function: Applies convolution operations to the input data using filters
(kernels). Each filter detects specific features, such as edges or textures.
○ Output: Produces feature maps that represent the presence of certain
features in the input.
2. Activation Function:
○ Commonly uses the Rectified Linear Unit (ReLU) activation function, which
introduces non-linearity to the model.
3. Pooling Layer:
○ Function: Reduces the dimensionality of the feature maps (downsampling)
while retaining important information. Common pooling methods
include max pooling and average pooling.
○ Benefit: Helps to make the network invariant to small translations in
the input.
4. Fully Connected Layer:
○ After several convolutional and pooling layers, the network typically flattens
the feature maps and connects them to a fully connected layer, where each
neuron is connected to all neurons in the previous layer.
○ Function: Makes the final classification based on the learned features.
5. Output Layer:
○ Produces the final predictions, often using a softmax function for multi-class
classification tasks.
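As a sketch of this layer stack, a minimal CNN in Keras might look like the following; the input shape and layer sizes are illustrative, not taken from the report:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                  # e.g. 28x28 grayscale images
    layers.Conv2D(32, (3, 3), activation="relu"),     # convolutional layer + ReLU activation
    layers.MaxPooling2D((2, 2)),                      # pooling layer (downsampling)
    layers.Flatten(),                                 # flatten the feature maps
    layers.Dense(64, activation="relu"),              # fully connected layer
    layers.Dense(10, activation="softmax"),           # output layer for 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()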

Fig 3: Demonstration of working CNN

CNN Algorithms: While the structure of CNNs can vary, the most common
training algorithm used is backpropagation in conjunction with gradient descent. Here's
a brief outline of the process:

1. Forward Pass: Input data is passed through the layers, and the output is computed.
2. Loss Calculation: The output is compared to the actual label using a loss function
(e.g., cross-entropy loss).
3. Backward Pass: The gradients of the loss with respect to the weights are
computed using backpropagation.
4. Weight Update: The weights are updated based on the computed gradients and a
learning rate, typically using optimization algorithms like Stochastic Gradient
Descent (SGD) or Adam.

Application of CNN:

Image Classification: Classifying images into categories (e.g., identifying objects in photos).

Object Detection: Locating and identifying objects within an image (e.g., bounding boxes around detected objects).

Image Segmentation: Dividing an image into segments to classify each part (e.g., identifying different regions in a medical image).

Facial Recognition: Recognizing and verifying faces in images.

Applications of computer vision:

1. Image Classification: Identifying the category of objects in an image (e.g., classifying images of animals).
2. Object Detection: Locating and identifying multiple objects within an image (e.g.,
detecting cars and pedestrians in a street scene).
3. Image Segmentation: Dividing an image into segments to classify each part (e.g.,
identifying different tissues in a medical image).
4. Facial Recognition: Identifying and verifying individuals based on facial features.
5. Activity Recognition: Understanding and classifying actions in video sequences
(e.g., recognizing gestures or sports activities).

Relation between Computer Vision and CNN:

Computer vision integrates hardware and software solutions to real-world problems, using CNNs as the logic to achieve the goals of recognition and image processing. In a way, CNNs are a subset of computer vision.

MODULE 9: REAL-TIME OBJECT DETECTION USING YOLO, YOLOV3 ALGORITHM
Objective:

1.Computer vision applications

2.Real-time object detection with the YOLO algorithm

3.Process of detection

Computer vision applications: Real-time detection and recognition are quite valuable in business applications; these are applications of computer vision. Certain algorithms achieve this recognition/detection, and the efficiency of a computer vision system lies in the latency it takes to recognise and detect.

Traditional Methods

Haar Cascades: A machine learning object detection method that uses a cascade of
classifiers to identify objects based on features derived from Haar-like
features. Commonly used for face detection.

Sliding Window: A method that involves moving a fixed-size window across the
image to detect objects at different scales. It is computationally expensive and has
largely been replaced by more efficient methods.

Deep Learning Methods

R-CNN (Regions with CNN features): A pioneering approach that combines region proposal methods with CNNs. It generates region proposals and then classifies them using CNNs.

YOLO (You Only Look Once): A real-time object detection system that frames
detection as a single regression problem, predicting bounding boxes and class
probabilities directly from images. It is known for its speed and efficiency.

Bounding Box: A rectangle, specified by coordinates (usually in the form of x, y, width, and height), drawn around an object of interest in an image. It indicates the position and extent of the object.

YOLO (You Only Look Once): Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections. YOLO uses a totally different approach.

It applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region; these bounding boxes are weighted by the predicted probabilities. It looks at the whole image at test time, so its predictions are informed by global context in the image.

YOLO uses the Darknet network. The YOLO network splits the input image into a grid of S×S cells.

Each grid cell predicts B bounding boxes and their objectness scores along with their class predictions. YOLO predicts 4 coordinates for each bounding box (bx, by, bw, bh) with respect to the corresponding grid cell.

Fig-4: Static image. Fig-5: Sectioned image.

The objectness score (P0) indicates the probability of the presence of an object in the given grid cell (rectangle). YOLO is quite fast at recognising objects in captured images. Versions of the YOLO algorithm are used in self-driving cars and in rear cameras that detect parking spaces.
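A minimal sketch of running a pre-trained YOLOv3 network through OpenCV's dnn module; it assumes yolov3.cfg, yolov3.weights, and an input image have been downloaded locally:

import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

img = cv2.imread("street.jpg")   # illustrative input image
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Each output row holds (bx, by, bw, bh, objectness, class scores...)
outputs = net.forward(net.getUnconnectedOutLayersNames())
for detection in outputs[0]:
    objectness = detection[4]
    if objectness > 0.5:                 # keep confident grid-cell predictions
        bx, by, bw, bh = detection[:4]   # coordinates relative to the image size
        print("object at", bx, by, bw, bh, "score", objectness)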

MODULE 10: NATURAL LANGUAGE PROCESSING (NLP)
Objective:
1.Introduction to NLP

2.Subsets of NLP

3.Algorithms of NLP

4.Working of NLP

Introduction to Natural Language Processing (NLP): NLP is a subfield of artificial intelligence and machine learning focused on the interaction between computers and humans through natural language. The goal of NLP is to enable machines to understand, interpret, generate, and respond to human language in a meaningful way. NLP is the whole process of turning unstructured data into structured data.

Fig-6:Subset Structure of NLP

NLP can be expressed as two technologies, where common language is divided into structured logical relationships among sentences, which helps the computer predict sentences and improve its model. The ability to understand natural language paves a path to more inventions such as chatbots, voice recognition, and voice assistants.

Fig-7: Demonstration of NLP

Types of Machine Learning in NLP: It primarily involves supervised and unsupervised learning.

Supervised Learning: Used for tasks where labelled data is available (e.g., text classification, sentiment analysis).

Unsupervised Learning: Used for tasks without labelled data (e.g., topic modelling, clustering).

Natural Language Understanding (NLU)

NLU refers to the ability of a computer system to comprehend and interpret human language in a meaningful way. It involves several tasks, including:

1. Intent Recognition: Understanding the user’s intention behind a piece of text or speech (e.g., identifying whether a user wants to book a flight or check the weather).
2. Entity Recognition: Identifying and classifying key entities within the text (e.g.,
names, dates, locations).
3. Sentiment Analysis: Determining the emotional tone or sentiment expressed in the
text (e.g., positive, negative, neutral).

4. Contextual Understanding: Grasping the context in which words or phrases are
used to derive meaning, often involving disambiguation of words with
multiple meanings.
5. Parsing: Analyzing the grammatical structure of sentences to understand their
components and relationships.

Natural Language Generation (NLG)

NLG focuses on producing human-like text based on given inputs or data. It involves several tasks, including:

1. Text Generation: Creating coherent and contextually relevant text from structured
data or prompts (e.g., generating news articles, summaries, or conversational
responses).
2. Template-Based Generation: Using predefined templates to produce
structured text (e.g., generating reports or product descriptions based on specific
parameters).
3. Dialogue Generation: Crafting responses in a conversational setting, such as
in chatbots or virtual assistants, ensuring the output is contextually relevant
and engaging.
4. Content Creation: Generating creative content such as stories, poems, or
social media posts.

Relationship Between NLU and NLG

Complementary Roles: NLU and NLG often work together in applications like conversational agents. NLU processes the user's input to understand their intent and extract relevant information, while NLG generates appropriate and coherent responses.

Feedback Loop: Insights gained from NLU can inform NLG models,
allowing them to generate more contextually relevant and accurate outputs.

Algorithms Used in NLP

1. Traditional Algorithms:
○ Naive Bayes: Often used for text classification and sentiment analysis.
○ Support Vector Machines (SVM): Effective for classification tasks.
○ Decision Trees: Can be used for text classification.
2. Vectorization Techniques:
○ Bag of Words (BoW): Represents text as a collection of words disregarding
grammar and order.
○ TF-IDF (Term Frequency-Inverse Document Frequency): Weighs the
importance of words based on their frequency in a document relative to
a collection of documents.
3. Deep Learning Models:
○ Recurrent Neural Networks (RNNs): Suitable for sequential data like text;
often used for language modelling and sequence generation.
○ Long Short-Term Memory (LSTM): A type of RNN that handles long-term dependencies better, commonly used for tasks like machine translation.
○ Convolutional Neural Networks (CNNs): Can be adapted for text
classification tasks.
○ Transformers: A powerful architecture used in models like BERT and GPT
that excels at understanding context and relationships in text.
4. Pre-trained Models:
○ BERT (Bidirectional Encoder Representations from Transformers):
Designed to understand the context of words in a sentence.
○ GPT (Generative Pre-trained Transformer): Focused on text generation
and conversation.

In Natural Language Processing (NLP), parsing, syntactic analysis, and semantic analysis are crucial components that help machines understand the structure and meaning of human language. Since computers cannot understand human language directly, we break it down into logical structures, using parsing and semantic analysis similar to those of a compiler.

Example of how a sentence is logically structured: “The cat sat on the mat.”

Step-by-Step Breakdown of NLP Processing

1. Tokenization: The sentence is split into individual words or tokens.
○ Output: ["The", "cat", "sat", "on", "the", "mat"]
2. Part-of-Speech Tagging: Each word is tagged with its part of speech (noun, verb,
etc.).
○ Output:
■ "The" - Determiner (DT)
■ "cat" - Noun (NN)
■ "sat" - Verb (VBD)
■ "on" - Preposition (IN)
■ "the" - Determiner (DT)
■ "mat" - Noun (NN)
3. Syntactic Parsing: The grammatical structure of the sentence is analyzed, often producing a parse tree.
○ Output:
(S
  (NP (DT The) (NN cat))
  (VP (VBD sat)
    (PP (IN on)
      (NP (DT the) (NN mat)))))
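A hedged sketch of the tokenization and part-of-speech tagging steps using NLTK (note that NLTK also returns the final period as a token, and the tokenizer/tagger models must be downloaded first):

import nltk
nltk.download("punkt")                         # tokenizer model
nltk.download("averaged_perceptron_tagger")    # POS tagger model

sentence = "The cat sat on the mat."
tokens = nltk.word_tokenize(sentence)
print(tokens)                # ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
print(nltk.pos_tag(tokens))  # [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ...]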

ANNEXURE

Linear Regression in Machine Learning:

We will be creating a machine learning linear regression model, analysing a house price prediction dataset to find out the price of a house based on different parameters. We will do exploratory data analysis, split the training and testing data, and perform model evaluation and predictions.

Problem Statement: A real estate agent wants help predicting house prices for regions in the USA. He gave you the dataset to work on, and you decided to use the linear regression model. Create a model that will help him estimate what a house would sell for. The dataset contains 7 columns and 5000 rows with a CSV extension. The data contains the following columns: 'Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'.

Requirements: Python (Jupyter), a PC with 4-8 GB RAM, Excel.

Libraries: NumPy, Pandas, Matplotlib, sklearn, seaborn

References: Modelling House Price Prediction using Regression Analysis and Particle Swarm Optimization, Case Study: Malang, East Java, Indonesia (IJSCSA).

Snippet-8: Importing the libraries and loading the data
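Since Snippet-8 is an image, here is a plausible reconstruction of the import-and-load step; the file name USA_Housing.csv is an assumption for the 5000-row dataset described above:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

df = pd.read_csv("USA_Housing.csv")   # assumed file name
df.info()                             # prints the schema of the dataframe
sns.pairplot(df)                      # quick exploratory data analysis
plt.show()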

Schema of the dataframe:

We split the data and check the effectiveness of the division through the coefficients; data with varied information is quite helpful for testing the limits of the model.
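Continuing the sketch above: splitting, fitting, and evaluating (the feature columns come from the problem statement; the test size and random seed are illustrative):

X = df[["Avg. Area Income", "Avg. Area House Age", "Avg. Area Number of Rooms",
        "Avg. Area Number of Bedrooms", "Area Population"]]
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print(pd.Series(model.coef_, index=X.columns))   # inspect the learned coefficients

predictions = model.predict(X_test)
plt.scatter(y_test, predictions)                 # points near the diagonal indicate a good fit
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.show()
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test, predictions)))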

Conclusion: The distribution of the data points, which forms a line shape, indicates that the data performs well with the linear regression model.

CONCLUSION:
This internship has covered all the prominent features of machine learning and its applications in real life. Machine learning is a subset of artificial intelligence, and its idea is built on pattern formation and prediction, which is associated with deep learning. Machine learning is applicable to any form of data, whether image, numerical data, text, audio, or video; with the mathematical relationships within this digital media, it is possible to manipulate data and understand the underlying patterns. Machine learning is one of the earliest-explored concepts, majorly researched in the early 1950s to understand patterns for facial recognition to distinguish between male and female. Machine learning will advance further in its ability to understand information and make better blueprints of the information, without violating the ethics of intelligence and while considering potential biases.
