
Report Internship Chapters

Shree Drashti Infotech LLP, established in 2019, offers embedded and software solutions, including a free online Python internship aimed at bridging the skills gap for engineering students. The internship focuses on hands-on programming exposure, foundational skills in Python, and industry practices in data analysis and machine learning. The document also outlines the use of various libraries such as NumPy, Pandas, and Scikit-learn for data manipulation and machine learning tasks.

Chapter: 1 Introduction to Industrial Internship

Shree Drashti Infotech LLP is a "StartUp Gujarat Awarded" company, established in 2019, that
provides embedded and software solutions.

The main purpose of Drashti Infotech is to provide an extensive range of services and solutions
in the field of embedded system design and programming with Python, .NET, Android, etc.
Shree Drashti Infotech LLP also offers application-oriented training to students, thereby
bridging the gap between industry requirements and students' skill sets.

Python has emerged as a versatile programming language for embedded systems, web
application development, data science, and artificial intelligence. Shree Drashti Infotech LLP
offers a free online Python internship program for engineering students that helps them
launch a career in any of the above-mentioned domains. Rather than focusing on teaching
Python syntax, the internship aims to make participants hands-on Python programmers,
equipping them with strong foundational skills such as algorithms, problem solving, and
object-oriented programming.

GOALS:
• Get hands-on exposure to Python programming the way industry works.
• Build a good project in Python by exploring all Python programming constructs.
• Obtain a Government of India approved internship through AICTE.
• Lay foundations for a long-term career in Data Science / Machine Learning.
• Get exposure to the standard industry practices of a data analyst.

Chapter: 2 Major Component of Internship

2.1 OOPS CONCEPT


2.1.1 Conditions
Python supports the usual logical conditions from mathematics:
• Equals: a == b
• Not equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b
These conditions are used in if statements. In the templates below, "condition" stands for any
expression that evaluates to true or false, and "output" stands for the indented block to run.
if alone
if condition:
    output
if with else
if condition:
    output
else:
    output
if with elif (else if)
if condition:
    output
elif condition:
    output
else:
    output
if with logical operators (and, or)
if condition1 and condition2:
    output
if condition1 or condition2:
    output
Equal
if a == 10:
    output
Not equal
if a != 10:
    output
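The templates above can be combined into a small runnable sketch. The function name classify and the thresholds are hypothetical and chosen only for illustration; they do not come from the internship material.

```python
def classify(a):
    """Return a text label for a using if / elif / else."""
    if a == 10:                      # equality test
        return "equal to 10"
    elif a > 10 and a <= 20:         # both conditions must hold
        return "between 11 and 20"
    elif a < 0 or a > 20:            # either condition may hold
        return "out of range"
    else:                            # everything not matched above
        return "between 0 and 9"


for value in (10, 15, -3, 4):
    print(value, "->", classify(value))
```

Only the first branch whose condition is true runs, so the order of the elif branches matters.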
2.1.2 Loops
Python has two primitive loop commands:
• while loops
• for loops
while
while condition:
    output
while with else
while condition:
    output
else:
    output
while with break
while condition:
    if condition:
        break
    output
for with range
for i in range(end):
    output
for i in range(start, end):
    output
for i in range(start, end, step):
    output
for with a custom list
items = ["item1", "item2", ...]
for i in items:
    output
for with continue
items = ["item1", "item2", "item3"]
for x in items:
    if x == "item2":
        continue
    print(x)
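As a minimal sketch of the loop forms listed above, the following snippet runs each of them once. The variable names (count, evens, kept, countdown) are made up for this example.

```python
# while with break: stop as soon as count reaches 3
count = 0
while True:
    if count == 3:
        break
    count += 1

# for with range(start, end, step): end is excluded
evens = []
for i in range(0, 10, 2):
    evens.append(i)

# for with continue: skip "item2" and keep the rest
kept = []
for x in ["item1", "item2", "item3"]:
    if x == "item2":
        continue
    kept.append(x)

# while with else: the else block runs when the loop ends without break
n = 3
countdown = []
while n > 0:
    countdown.append(n)
    n -= 1
else:
    countdown.append("done")

print(count, evens, kept, countdown)
```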
Operator  Meaning                                                      Example
+         Add two operands, or unary plus                              x + y, +2
-         Subtract right operand from the left, or unary minus         x - y, -2
*         Multiply two operands                                        x * y
/         Divide left operand by the right one (always gives a float)  x / y
%         Modulus: remainder of dividing left operand by the right     x % y (remainder of x/y)
//        Floor division: result is a whole number adjusted to the     x // y
          left on the number line
**        Exponent: left operand raised to the power of the right      x ** y (x to the power y)
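The operators in the table can be checked with a few concrete numbers. This is only an illustrative sketch; the values x = 7 and y = 2 are arbitrary.

```python
x, y = 7, 2

results = {
    "x + y": x + y,       # addition
    "x - y": x - y,       # subtraction
    "x * y": x * y,       # multiplication
    "x / y": x / y,       # true division, always a float
    "x % y": x % y,       # remainder of x / y
    "x // y": x // y,     # floor division
    "-x // y": -x // y,   # floors toward the left on the number line
    "x ** y": x ** y,     # exponent
}

for expression, value in results.items():
    print(expression, "=", value)
```

The -x // y entry shows why floor division is described as "adjusted to the left": -3.5 becomes -4, not -3.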

2.2 NUMPY LIBRARY


What is NumPy?
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. It is open-source software. It contains various
features including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined using NumPy, which allows
NumPy to integrate seamlessly and speedily with a wide variety of databases.
Installation:
NumPy can be installed on Windows, Mac, and Linux with the pip command:
pip install numpy
On Windows, pip is included with the standard Python installer, so a separate pre-built
NumPy installer is not needed.
Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is the rank.
• NumPy’s array class is called ndarray. Arrays are commonly created with the numpy.array()
function.
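The array properties described above can be inspected directly. The following is a minimal sketch with made-up values; the names a, row_offsets, and b are chosen only for this example.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1), all elements of one type
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(type(a).__name__)   # the array class is ndarray
print(a.ndim)             # number of axes (the rank)
print(a.shape)            # size along each axis
print(a.dtype)            # common element type

# Elements are indexed by a tuple of non-negative integers
element = a[1, 2]

# Broadcasting: the 1-D array is applied to every row of a
row_offsets = np.array([10, 20, 30])
b = a + row_offsets
print(b)
```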

2.3 PANDAS LIBRARY


The following is a list of common tasks along with pandas functions.
Utility Functions
Task                                  Function
Extract column names                  df.columns
Select first 2 rows                   df.iloc[:2]
Select first 2 columns                df.iloc[:,:2]
Select columns by name                df.loc[:,["col1","col2"]]
Select n random rows                  df.sample(n = 10)
Select fraction of random rows        df.sample(frac = 0.2)
Rename the variables                  df.rename( )
Selecting a column as index           df.set_index( )
Removing rows or columns              df.drop( )
Sorting values                        df.sort_values( )
Grouping variables                    df.groupby( )
Filtering                             df.query( )
Finding the missing values            df.isnull( )
Dropping the missing values           df.dropna( )
Removing the duplicates               df.drop_duplicates( )
Creating dummies                      pd.get_dummies( )
Ranking                               df.rank( )
Cumulative sum                        df.cumsum( )
Quantiles                             df.quantile( )
Selecting numeric variables           df.select_dtypes( )
Concatenating two dataframes          pd.concat( )
Merging on basis of common variable   pd.merge( )

Importing pandas library


You need to import (load) the pandas library before you can use it. Importing a library means
loading it into memory so that its functions become available. Run the following code to
import the pandas library:

import pandas as pd

The "pd" is an alias (abbreviation) used as a shortcut to access pandas functions. To call a
function from the pandas library, you type pd.function instead of pandas.function each time
you need it.
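A few of the utility functions listed earlier in this section can be tried on a tiny table. The DataFrame below is hypothetical sample data invented for this sketch, not data from the internship.

```python
import pandas as pd

# Small made-up table with one missing value and one duplicated row
df = pd.DataFrame({
    "name": ["a", "b", "c", "c"],
    "group": ["x", "y", "x", "x"],
    "score": [10.0, None, 30.0, 30.0],
})

columns = list(df.columns)                 # extract column names
first_two = df.iloc[:2]                    # select first 2 rows
subset = df.loc[:, ["name", "score"]]      # select columns by name

# Drop rows with missing values, then remove exact duplicate rows
clean = df.dropna().drop_duplicates()

sorted_df = clean.sort_values("score", ascending=False)   # sorting
mean_by_group = clean.groupby("group")["score"].mean()    # grouping
high = clean.query("score > 15")                          # filtering
dummies = pd.get_dummies(df["group"])                     # dummy variables

print(clean)
print(mean_by_group)
```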

2.4 SCIKIT-LEARN LIBRARY


In this section, we introduce the machine learning vocabulary that we use throughout scikit-
learn and give a simple learning example.
The sklearn.preprocessing package provides several common utility functions and transformer
classes to change raw feature vectors into a representation that is more suitable for the
downstream estimators.
Read data:
• Load the Iris data.
• Check unique values.
• Remove null values.
• Convert categorical variables into numerical ones.
• Convert to arrays.
• Model selection.

1) X_train: This includes all the independent variables that will be used to train the model.
Since we have specified test_size = 0.20, 80% of the observations in the complete data will be
used to train/fit the model and the remaining 20% will be used to test it.
2) X_test: This is the remaining 20% of the independent variables, which is not used in the
training phase and is used to make predictions that test the accuracy of the model.
3) y_train: This is the dependent variable that the model needs to predict. It holds the
category labels for the independent variables in X_train and must be specified when
training/fitting the model.
4) y_test: This holds the category labels for the test data. These labels are used to measure
the accuracy of the predicted categories against the actual ones.
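The preparation steps and the 80/20 split described above can be sketched as follows. This is an assumption-laden illustration: it loads the copy of the Iris data bundled with scikit-learn, while the internship may have read the data from a file, and the column name "species" and random_state=42 are choices made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the Iris data as a DataFrame (assumption: scikit-learn's bundled copy)
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Replace the numeric target with species names so there is a categorical column
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns="target")

print(df["species"].unique())   # check unique values
df = df.dropna()                # remove null values (none in this copy)

# Convert the categorical labels into numbers
encoder = LabelEncoder()
df["species"] = encoder.fit_transform(df["species"])

# Convert to arrays: X holds the independent variables, y the dependent one
X = df.drop(columns="species").to_numpy()
y = df["species"].to_numpy()

# test_size=0.20 keeps 80% of the rows for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible; without it, each run selects different rows.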

We will use scikit-learn's support vector classifier (SVC) to train an SVM model on this data.
Support vector machines (SVMs) are powerful yet flexible supervised machine learning
algorithms that are used for both classification and regression, although they are generally
used for classification problems. SVMs are implemented differently from other machine
learning algorithms. They are popular because of their ability to handle multiple continuous
and categorical variables.
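A minimal sketch of training the support vector classifier might look like this. The RBF kernel is scikit-learn's default and is an assumption here, since the report does not state which kernel or parameters were used; random_state and stratify are also choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load features and labels, then hold out 20% of the rows for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Fit the classifier on the training portion only
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# Predict the held-out rows and compare with the true labels
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```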

• Predict on the X_test values.

• Final test of parameters.

• Other models.
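The other models compared later in this report (KNN, DT, RF, NB) follow the same fit and score pattern. The sketch below uses default hyperparameters and GaussianNB for Naive Bayes; both are assumptions, as the report does not specify the settings or the Naive Bayes variant.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# One estimator per abbreviation used in the report
models = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "NB": GaussianNB(),
}

# Every scikit-learn classifier exposes the same fit / score interface
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # accuracy on the test rows

for name, score in scores.items():
    print(name, round(score, 2))
```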

Chapter: 3 Methodology Adopted to carry out an internship

3.1 Machine Learning

3.2 Regression

Chapter: 4 Tools and Technology Used

4.1. REQUIREMENT AND SPECIFICATION


4.1.1. Hardware Requirement
Processor: Intel(R) Processor Minimum 1.80 GHz

RAM: Minimum 4GB

Hard Disk: Minimum 10GB

4.1.2. Software Requirement


OS: Windows 8

System Type: 64 bit

4.1.3. Tools
4.2 SOFTWARE STUDY:
4.2.1 Python 3.7
Python is a high-level, interpreted programming language. It was created by Guido van
Rossum and first released in 1991, and its design philosophy emphasizes code readability
through the use of significant whitespace. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and large projects.
Python is a dynamically typed, garbage-collected programming language. It supports multiple
programming paradigms, including procedural, object-oriented, and functional programming.
Because of its comprehensive standard library, Python is often described as a "batteries
included" language.
Python is a multi-paradigm programming language. Object-oriented and structured
programming are fully supported, and many of its features support functional and
aspect-oriented programming, including metaprogramming and metaobjects (magic methods).
Python uses dynamic typing and a combination of reference counting and a cycle-detecting
garbage collector for memory management. It also features late binding (dynamic name
resolution), which binds method and variable names during program execution.
Python's design offers some support for functional programming in the Lisp tradition. It has
filter and map functions, as well as list comprehensions, dictionaries, sets, and generator
expressions. The standard library has two modules, itertools and functools, that implement
functional tools borrowed from Haskell and Standard ML.
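The functional features mentioned above can be illustrated with a short sketch. The list of numbers is arbitrary sample data chosen for this example.

```python
from functools import reduce
from itertools import accumulate

numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: build a new list from an existing one
squares = [n * n for n in numbers]

# filter and map take a function and an iterable
even_squares = list(filter(lambda n: n % 2 == 0, squares))
doubled = list(map(lambda n: n * 2, numbers))

# Generator expression: values are produced lazily, one at a time
total = sum(n for n in numbers)

# functools.reduce folds the list into a single value
product = reduce(lambda a, b: a * b, numbers)

# itertools.accumulate yields the running totals
running = list(accumulate(numbers))

print(squares, even_squares, doubled, total, product, running)
```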
The language's core philosophy is summarized in the Zen of Python (PEP 20), which includes
aphorisms such as:
• Beautiful is better than ugly.
• Explicit is better than implicit.
• Simple is better than complex.
• Complex is better than complicated.
• Readability counts.
Rather than having all of its functionality built into its core, Python was designed to be
highly extensible. This compact modularity has made it popular as a means of adding
programmable interfaces to existing applications. Van Rossum's vision of a small core
language with a large standard library and an easily extensible interpreter stemmed from his
frustrations with ABC, which took the opposite approach.
4.3 SCIKIT-LEARN
Scikit-learn is a free machine learning library for Python. It features various classification,
regression, and clustering algorithms, and it is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy.
4.4. PANDAS
Pandas is a free, open-source Python library for manipulating data. It is built on top of
NumPy and therefore needs NumPy to function. Pandas makes it easy to create, modify, and
organize data sets in a variety of formats, and it is also a good option for dealing with time
series data.
Pandas is popular among data scientists because of the following advantages: it provides
sophisticated time series tools; it uses Series for one-dimensional data structures and
DataFrame for multi-dimensional data structures; it provides an efficient way to slice the
data; and it provides flexible ways to merge, concatenate, or reshape the data.
Pandas is, in a nutshell, a convenient data analysis library for manipulating and analyzing
information. Its operations on data structures are both powerful and easy to use, and they
run quickly.
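The advantages listed above (Series and DataFrame, slicing, time series tools, combining data) can be seen in a small sketch. The dates, prices, and volumes below are invented for illustration only.

```python
import pandas as pd

# A Series is one-dimensional; here it is indexed by six consecutive days
dates = pd.date_range("2023-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=dates)

# Slicing by date labels includes both end points
first_three = prices["2023-01-01":"2023-01-03"]

# A DataFrame is two-dimensional: several columns sharing one index
frame = pd.DataFrame({
    "price": prices,
    "volume": [100, 110, 120, 130, 140, 150],
})

# Time series tool: average the price over 2-day windows
two_day_mean = frame["price"].resample("2D").mean()

# Combining data: append one more day with pd.concat
extra = pd.DataFrame(
    {"price": [16.0], "volume": [160]},
    index=pd.date_range("2023-01-07", periods=1),
)
combined = pd.concat([frame, extra])

print(two_day_mean)
print(combined.tail(2))
```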

Chapter: 5 Snapshots

5.1 Iris Data:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.2 Heart Disease:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.3 Chronic Kidney Disease:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.4 Bank Additional:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.5 Bike Buyers:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.6 Crop Recommendation:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.7 Instagram Fake User:

Reading data:

Split data:

Training confusion matrix SVM:

Testing Classification Report SVM:

5.8 Adani Share Price:

Reading data:

Split data:

Parameters LR:

5.9 Covid-19 Cases:

Reading data:

Split data:

Parameters LR:

Chapter: 6 Observation

6.1 Iris Data:

KNN:

DT:

RF:

NB:

6.2 Heart Disease:

KNN:

DT:

RF:

NB:

6.3 Chronic Kidney Disease:

KNN:

DT:

RF:

NB:

6.4 Bank Additional:

KNN:

DT:

RF:

NB:

6.5 Bike Buyers:

KNN:

DT:

RF:

NB:

6.6 Crop Recommendation:

KNN:

DT:

RF:

NB:

6.7 Instagram Fake User:

KNN:

DT:

RF:

NB:

Chapter: 7 Results and Discussions

7.1 Iris Data:


Model Accuracy Precision Recall F1-Score
SVM 1.00 1.00 1.00 1.00
KNN 1.00 1.00 1.00 1.00
DT 1.00 1.00 1.00 1.00
RF 1.00 1.00 1.00 1.00
NB 1.00 1.00 1.00 1.00
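
The four columns in these tables can be computed with scikit-learn's metric functions. The sketch below uses hand-made labels (not the internship data) and macro averaging; the averaging mode is an assumption, since the report does not state which one was used.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical true and predicted labels for a 3-class problem
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# Accuracy: share of all predictions that are correct
accuracy = accuracy_score(y_true, y_pred)

# Macro averaging: compute each metric per class, then take the plain mean
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)

print(cm)
print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
```

With weighted instead of macro averaging, classes with more samples count more, so the reported values can differ on imbalanced datasets.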

7.2 Heart Disease:

Model Accuracy Precision Recall F1-Score
SVM 0.74 0.76 0.67 0.71
KNN 0.70 0.68 0.66 0.70
DT 0.59 0.57 0.57 0.60
RF 0.89 0.89 0.87 0.89
NB 0.89 0.89 0.87 0.89

7.3 Chronic Kidney Disease:

Model Accuracy Precision Recall F1-Score
SVM 0.75 0.87 0.60 0.69
KNN 0.75 0.72 0.65 0.73
DT 1.00 1.00 1.00 1.00
RF 1.00 1.00 1.00 1.00
NB 1.00 1.00 1.00 1.00

7.4 Bank Additional:


Model Accuracy Precision Recall F1-Score
SVM 0.90 0.76 0.60 0.87
KNN 0.88 0.68 0.62 0.87
DT 0.84 0.60 0.61 0.84
RF 0.90 0.80 0.58 0.87
NB 0.84 0.64 0.69 0.85

7.5 Bike Buyers:

Model Accuracy Precision Recall F1-Score
SVM 0.66 0.64 0.64 0.66
KNN 0.70 0.69 0.70 0.70
DT 0.69 0.68 0.69 0.69
RF 0.65 0.65 0.65 0.65
NB 0.63 0.62 0.63 0.63

7.6 Crop Recommendation:

Model Accuracy Precision Recall F1-Score
SVM 0.98 0.99 0.98 0.98
KNN 0.98 0.98 0.98 0.98
DT 0.98 0.98 0.98 0.98
RF 0.84 0.84 0.86 0.81
NB 0.99 0.99 0.99 0.99

7.7 Instagram Fake User:


Model Accuracy Precision Recall F1-Score
SVM 0.50 0.74 0.53 0.37
KNN 0.50 0.74 0.53 0.37
DT 0.88 0.88 0.88 0.88
RF 0.97 0.97 0.97 0.97
NB 0.67 0.79 0.69 0.64

7.8 Adani Share Price:


Model MSE RMSE MAE R2-score
LR 4020.12 63.40 48.32 0.72
DT 55.64 7.45 4.95 0.99
KNR 2707.31 52.03 34.12 0.81
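
The regression columns are related to each other: RMSE is the square root of MSE, so the LR row's RMSE of 63.40 follows from its MSE of 4020.12. The sketch below computes all four metrics on made-up values (not the share price data) using scikit-learn's metric functions.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values; every prediction is off by 10
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 190.0, 260.0]

mse = mean_squared_error(y_true, y_pred)     # mean of squared errors
rmse = math.sqrt(mse)                        # same unit as the target
mae = mean_absolute_error(y_true, y_pred)    # mean of absolute errors
r2 = r2_score(y_true, y_pred)                # 1.0 means a perfect fit

print(mse, rmse, mae, round(r2, 3))

# Check against the LR row of the table: sqrt(4020.12) is about 63.40
lr_rmse_from_table = math.sqrt(4020.12)
print(round(lr_rmse_from_table, 2))
```

Lower MSE, RMSE, and MAE mean smaller errors, while a higher R2-score means the model explains more of the variation in the target.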

7.9 Covid-19 cases


Model MSE RMSE MAE R2-score
LR 308007.37 551320.68 308007.37 0.15
DT 284854.26 563366.78 308007.37 0.11
KNR 315891.38 610957.62 308007.37 0.037

Chapter: 8 Conclusion and Future Scope

In this internship I learned about machine learning classification and regression models and
how they work on different datasets. The results show that the Python ML models perform
well on the Iris, heart disease, kidney disease, and other classification datasets. Similarly,
for the regression datasets (ADANIPORTS and COVID-19), the error of the RF model is lower
than that of the other models.
In future, models can be built on larger datasets, and AutoML approaches can be developed
that select the best ML model among all candidates from a single test.

