ML_LAB_MANUAL

The document provides an overview of various Python libraries and techniques for data analysis and machine learning, including statistics, linear regression, decision trees, K-nearest neighbors, and logistic regression. It includes code examples for computing central tendency and dispersion measures, using libraries like NumPy, SciPy, Pandas, and Matplotlib, as well as implementing machine learning models with scikit-learn. Additionally, it covers performance analysis of classification algorithms using metrics such as accuracy and F1-score.

Uploaded by lagishettisuresh
Copyright © All Rights Reserved
1. Compute Central Tendency and Dispersion Measures


import statistics as stats

# Sample Data
data = [10, 20, 20, 40, 50, 50, 50, 80, 90]

# Central Tendency
mean = stats.mean(data)
median = stats.median(data)
mode = stats.mode(data)

# Dispersion
variance = stats.variance(data)
std_dev = stats.stdev(data)

# Display Results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")

OUTPUT:

Mean: 45.55555555555556
Median: 50
Mode: 50
Variance: 727.7777777777778
Standard Deviation: 26.977356760397743
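As a cross-check, the same measures can be computed with NumPy. This is a minimal sketch on the same data. Note that `statistics.variance()` and `statistics.stdev()` use the sample (n - 1) denominator, so NumPy needs `ddof=1` to match, while NumPy's default `ddof=0` gives the population variance that `statistics.pvariance()` returns. The range and interquartile range are added here as two further dispersion measures.

```python
import statistics as stats

import numpy as np

# Same sample data as above
data = [10, 20, 20, 40, 50, 50, 50, 80, 90]
arr = np.array(data)

# Central tendency with NumPy
np_mean = np.mean(arr)
np_median = np.median(arr)

# ddof=1 -> sample variance / std dev (n - 1 denominator),
# matching statistics.variance() and statistics.stdev()
sample_var = np.var(arr, ddof=1)
sample_std = np.std(arr, ddof=1)

# ddof=0 (NumPy's default) -> population variance,
# matching statistics.pvariance()
pop_var = np.var(arr)

# Two more dispersion measures: range and interquartile range
data_range = arr.max() - arr.min()
q1, q3 = np.percentile(arr, [25, 75])
iqr = q3 - q1

print(f"Mean: {np_mean}, Median: {np_median}")
print(f"Sample variance: {sample_var}, Population variance: {pop_var}")
print(f"Sample standard deviation: {sample_std}")
print(f"Range: {data_range}, IQR: {iqr}")
print(f"statistics.pvariance: {stats.pvariance(data)}")
```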
2. Study of Python Basic Libraries such as Statistics, Math, NumPy and SciPy.

Python Math Library

The math module is a standard module in Python and is always available. To use the mathematical functions in this module, you have to import it using import math. It gives access to the underlying C library functions. This module does not support complex datatypes; the cmath module is its complex counterpart.

List of Functions in Python Math Module

Function         Description
ceil(x)          Returns the smallest integer greater than or equal to x
copysign(x, y)   Returns x with the sign of y
fabs(x)          Returns the absolute value of x
factorial(x)     Returns the factorial of x
floor(x)         Returns the largest integer less than or equal to x
fmod(x, y)       Returns the remainder when x is divided by y
frexp(x)         Returns the mantissa and exponent of x as the pair (m, e)
fsum(iterable)   Returns an accurate floating-point sum of values in the iterable
isfinite(x)      Returns True if x is neither an infinity nor a NaN (Not a Number)
isinf(x)         Returns True if x is a positive or negative infinity
isnan(x)         Returns True if x is a NaN
ldexp(x, i)      Returns x * (2**i)
modf(x)          Returns the fractional and integer parts of x
trunc(x)         Returns the truncated integer value of x
exp(x)           Returns e**x
expm1(x)         Returns e**x - 1
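A short sketch exercising several functions from the table on sample inputs chosen only for illustration; the commented values are what these calls return.

```python
import math

# A few functions from the table above, applied to sample inputs
results = {
    "ceil(4.2)": math.ceil(4.2),                # 5
    "floor(4.8)": math.floor(4.8),              # 4
    "copysign(3, -1)": math.copysign(3, -1),    # -3.0
    "fabs(-7.5)": math.fabs(-7.5),              # 7.5
    "factorial(5)": math.factorial(5),          # 120
    "fmod(10, 3)": math.fmod(10, 3),            # 1.0
    "frexp(8.0)": math.frexp(8.0),              # (0.5, 4), since 8.0 == 0.5 * 2**4
    "ldexp(0.5, 4)": math.ldexp(0.5, 4),        # 8.0
    "modf(3.75)": math.modf(3.75),              # (0.75, 3.0)
    "trunc(-3.7)": math.trunc(-3.7),            # -3
    "isnan(nan)": math.isnan(float("nan")),     # True
    "isinf(-inf)": math.isinf(float("-inf")),   # True
    "isfinite(1e308)": math.isfinite(1e308),    # True
}
for name, value in results.items():
    print(f"{name}: {value}")

# fsum tracks partial sums exactly, so it avoids the rounding error
# that the built-in sum() accumulates
tenths = [0.1] * 10
print(sum(tenths), math.fsum(tenths))  # 0.9999999999999999 1.0

# expm1 stays accurate for tiny x, where exp(x) - 1 loses precision
print(math.exp(1e-10) - 1, math.expm1(1e-10))
```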

Program-1
Python SciPy Library

SciPy is an open-source, Python-based library used in mathematics, scientific computing, engineering, and technical computing. SciPy is pronounced "Sigh Pie."

SciPy contains a variety of subpackages that help solve the most common problems in scientific computation.
SciPy is one of the most widely used scientific libraries, playing a role comparable to the GNU Scientific Library for C/C++ or to MATLAB.
It is easy to use and understand, and it offers fast computation.
It operates on arrays from the NumPy library.

NumPy vs SciPy

NumPy:
1. NumPy is written largely in C and is used for mathematical or numerical calculation.
2. Its vectorized array operations are much faster than equivalent pure-Python loops.
3. NumPy is one of the most useful libraries in data science for performing basic calculations.
4. NumPy is centered on its array data type, which supports the most basic operations such as sorting, reshaping, indexing, etc.

SciPy:

1. SciPy is built on top of NumPy.
2. SciPy provides a fully featured version of linear algebra, while NumPy contains only a few linear algebra features.
3. Most new data science features are available in SciPy rather than NumPy.
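To illustrate SciPy working directly on NumPy arrays and adding functionality beyond NumPy, here is a minimal sketch using `scipy.stats.describe` and `scipy.integrate.quad`. The data array reuses the sample from experiment 1, and the integral of sin(x) from 0 to pi is chosen only because its exact value (2) is known.

```python
import numpy as np
from scipy import integrate, stats

# SciPy routines accept NumPy arrays directly
data = np.array([10, 20, 20, 40, 50, 50, 50, 80, 90])

# scipy.stats.describe: count, (min, max), mean, sample variance,
# skewness and kurtosis in a single call
summary = stats.describe(data)
print(summary)

# scipy.integrate.quad: adaptive numerical integration,
# which NumPy itself does not provide
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(area, abs_error)  # area is 2 up to floating-point error
```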

Linear Algebra with SciPy

1. SciPy's linear algebra module (scipy.linalg) is built on the BLAS and LAPACK libraries (such as ATLAS).
2. Because it calls these optimized libraries, its linear algebra routines are very fast.
3. Linear algebra routines accept two-dimensional array objects, and many of them (for example, matrix inversion) also return a two-dimensional array.
4. Now let's do some tests with scipy.linalg.

Calculating the determinant of a two-dimensional matrix:

Program-1
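The program itself is not reproduced here; a minimal sketch of the determinant calculation with `scipy.linalg.det` follows, using an arbitrary 2 x 2 matrix chosen for illustration. It also calls `scipy.linalg.inv` to show a routine whose output is again a two-dimensional array.

```python
import numpy as np
from scipy import linalg

# An arbitrary 2 x 2 matrix stored as a two-dimensional NumPy array
A = np.array([[4, 7],
              [2, 6]])

# Determinant: 4*6 - 7*2 = 10 (up to floating-point rounding)
det_A = linalg.det(A)
print(det_A)

# Inverse: the result is again a two-dimensional array
A_inv = linalg.inv(A)
print(A_inv)

# Check: A @ A_inv should be the identity matrix, up to rounding
print(np.allclose(A @ A_inv, np.eye(2)))
```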
3. Study of Python Basic Libraries such as Pandas and Matplotlib.

The two primary components of pandas are the Series and the DataFrame.
A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
DataFrames and Series are quite similar in that many operations that you can do with one you can do with the other, such as filling in null values and calculating the mean.
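A minimal sketch of a Series and a DataFrame supporting the same operations (`mean()` and `fillna()`). The apple and orange counts are hypothetical values made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical purchase counts, with one missing value
apples = pd.Series([3, 2, np.nan, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# A DataFrame is a collection of Series sharing the same index
df = pd.DataFrame({"apples": apples, "oranges": oranges})

# mean() works on both; NaN is skipped: (3 + 2 + 1) / 3 = 2.0
print(apples.mean())
print(df.mean())  # one mean per column

# fillna() works on both as well
filled_series = apples.fillna(0)
filled_df = df.fillna(0)
print(filled_df)
```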

Reading data from CSVs

With CSV files, all you need is a single line to load the data:
df = pd.read_csv('purchases.csv')
df

Let's load in the IMDB movies dataset to begin:

movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
We're loading this dataset from a CSV and designating the movie titles to be our index.

Viewing your data
The first thing to do when opening a new dataset is print out a few rows to keep as a visual reference. We accomplish this with head():
movies_df.head()

Another fast and useful attribute is .shape, which outputs just a tuple of (rows, columns):
movies_df.shape
Note that .shape has no parentheses and is a simple tuple of format (rows, columns). So we have 1000 rows and 11 columns in our movies DataFrame.
You'll use .shape a lot when cleaning and transforming data. For example, you might filter some rows based on some criteria and then want to know quickly how many rows were removed.
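Since IMDB-Movie-Data.csv is not included here, the sketch below assumes a tiny hypothetical stand-in read from an in-memory string with `io.StringIO`; the titles, genres, years and ratings are made up. It shows `index_col="Title"`, `head()`, `.shape`, and the filter-then-check-shape pattern described above.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for IMDB-Movie-Data.csv (made-up rows)
csv_text = """Title,Genre,Year,Rating
Movie A,Action,2014,8.1
Movie B,Drama,2016,6.9
Movie C,Comedy,2012,7.4
Movie D,Action,2016,5.8
Movie E,Drama,2015,7.9
Movie F,Horror,2013,6.2
"""
movies_df = pd.read_csv(StringIO(csv_text), index_col="Title")

print(movies_df.head())   # first 5 rows by default
print(movies_df.shape)    # (6, 3): Title is the index, not a column

# Filter some rows, then use .shape to see how many were removed
high_rated = movies_df[movies_df["Rating"] >= 7.0]
print(high_rated.shape)
print(f"Rows removed: {movies_df.shape[0] - high_rated.shape[0]}")
```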

Program-1
We haven't defined an index in our example, but we see two columns in our output: the right column contains our data, whereas the left column contains the index. Pandas created a default index starting at 0 and going to 5, which is the length of the data minus 1.

dtype('int64'): the type int64 tells us that each value within this column is stored as a 64-bit integer.

Program-2
We can directly access the index and the values of our Series S:
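The Series used in Program-1 and Program-2 is not reproduced here; a minimal sketch with six hypothetical integer values (so the default index runs from 0 to 5) shows the default index, the int64 dtype, and direct access through `S.index` and `S.values`.

```python
import pandas as pd

# Hypothetical Series of six integers; no index is given,
# so pandas creates a default index 0..5
S = pd.Series([11, 28, 72, 3, 5, 8])
print(S)          # index on the left, values on the right, then the dtype

# Direct access to the index and the values
print(S.index)    # RangeIndex(start=0, stop=6, step=1)
print(S.values)   # [11 28 72  3  5  8]
```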

Matplotlib Library

Pyplot is a module of Matplotlib which provides simple functions to add plot elements like lines, images, text, etc. to the current axes in the current figure.

Make a simple plot


import matplotlib.pyplot as plt
import numpy as np
List of all the methods as they appeared.

plot(x-axis values, y-axis values): plots a simple line graph with x-axis values against y-axis values
show(): displays the graph
title("string"): sets the title of the plot as specified by the string
xlabel("string"): sets the label for the x-axis as specified by the string
ylabel("string"): sets the label for the y-axis as specified by the string
figure(): used to control figure-level attributes
subplot(nrows, ncols, index): adds a subplot to the current figure
suptitle("string"): adds a common title to the figure, specified by the string
subplots(nrows, ncols, figsize): a convenient way to create subplots in a single call; it returns a figure and an axes object (or an array of axes objects)
set_title("string"): an axes-level method used to set the title of the subplots in a figure
bar(categorical variables, values, color): used to create vertical bar graphs
barh(categorical variables, values, color): used to create horizontal bar graphs
legend(loc): used to add a legend to the graph
xticks(index, categorical variables): gets or sets the current tick locations and labels of the x-axis
pie(value, categorical variables): used to create a pie chart
hist(values, number of bins): used to create a histogram
xlim(start value, end value): used to set the limits of the x-axis
ylim(start value, end value): used to set the limits of the y-axis
scatter(x-axis values, y-axis values): plots a scatter plot with x-axis values against y-axis values
axes(): adds an axes to the current figure
set_xlabel("string"): axes-level method used to set the x-label of the plot, specified as a string
set_ylabel("string"): axes-level method used to set the y-label of the plot, specified as a string
scatter3D(x-axis values, y-axis values, z-axis values): plots a three-dimensional scatter plot (a method of 3D axes)
plot3D(x-axis values, y-axis values, z-axis values): plots a three-dimensional line graph (a method of 3D axes)

Here we import Matplotlib's Pyplot module and the NumPy library, since most of the data we will be working with will be in the form of arrays.

We pass two arrays as input arguments to Pyplot's plot() method and use the show() method to display the plot. Note that the first array appears on the x-axis and the second array appears on the y-axis of the plot. Now that our first plot is ready, let us add a title and name the x-axis and y-axis using the methods title(), xlabel() and ylabel(), respectively.
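A minimal sketch of the simple plot described above, assuming the squares of 0 to 9 as the data. It uses the object-oriented `subplots()` interface with the axes-level `set_title()`, `set_xlabel()` and `set_ylabel()` methods from the list; the pyplot functions `title()`, `xlabel()` and `ylabel()` do the same job on the current axes.

```python
import matplotlib.pyplot as plt
import numpy as np

# Two arrays: the first goes on the x-axis, the second on the y-axis
x = np.arange(0, 10)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y)

# Axes-level equivalents of plt.title(), plt.xlabel() and plt.ylabel()
ax.set_title("Square numbers")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

plt.show()
```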
4. Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample Data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 3, 5, 7, 9])

# Linear Regression Model
model = LinearRegression()
model.fit(x, y)

# Prediction
y_pred = model.predict(x)

# Plot
plt.scatter(x, y, color='blue', label='Actual')
plt.plot(x, y_pred, color='red', label='Predicted')
plt.legend()
plt.show()

OUTPUT:
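To show what `LinearRegression` computes in the one-feature case, here is a sketch of the closed-form least-squares slope and intercept on the same five points. Because the sample data lie exactly on y = 2x - 1, both approaches recover slope 2 and intercept -1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as above
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 5, 7, 9])

# Closed-form least-squares estimates for y = b0 + b1 * x:
#   b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   b0 = y_mean - b1 * x_mean
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean
print(f"slope = {b1}, intercept = {b0}")  # slope = 2.0, intercept = -1.0

# scikit-learn should recover the same line
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(model.coef_[0], model.intercept_)
```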
5. Multiple Linear Regression for House Price Prediction
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

data = pd.DataFrame({
"Size": [1400, 1600, 1700, 1875],
"Bedrooms": [3, 3, 4, 3],
"Age": [20, 15, 18, 12],
"Price": [245000, 312000, 279000, 308000]
})

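X = data[["Size", "Bedrooms", "Age"]]   # features
y = data["Price"]                       # target

# Train/Test Split
# No random_state is set, so the split (and the MSE printed below) changes between runs.
# With only 4 rows, test_size=0.2 puts a single house in the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)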

# Model
model = LinearRegression()
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")

OUTPUT:

Mean Squared Error: 1419700880.4400368
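A sketch of the same multiple regression solved directly as a least-squares problem with `numpy.linalg.lstsq`, compared against scikit-learn on all four rows. With four houses and four parameters (an intercept plus three coefficients) the system is exactly determined, so the fitted plane passes through every row; this is one reason the coefficients and the single-house test MSE above should not be read as meaningful estimates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same four houses as above
data = pd.DataFrame({
    "Size": [1400, 1600, 1700, 1875],
    "Bedrooms": [3, 3, 4, 3],
    "Age": [20, 15, 18, 12],
    "Price": [245000, 312000, 279000, 308000],
})
X = data[["Size", "Bedrooms", "Age"]].to_numpy(dtype=float)
y = data["Price"].to_numpy(dtype=float)

# Model: Price = b0 + b1*Size + b2*Bedrooms + b3*Age.
# Prepend a column of ones so the intercept b0 is estimated too,
# then solve the least-squares problem min ||A b - y||^2.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print("Intercept:", beta[0])
print("Coefficients (Size, Bedrooms, Age):", beta[1:])

# scikit-learn's LinearRegression fits the same model
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```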
6. Decision Tree and Parameter Tuning
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
y = [0, 0, 1, 1, 0]

# Decision Tree
clf = DecisionTreeClassifier()
parameters = {"max_depth": [1, 2, 3], "criterion": ["gini", "entropy"]}
grid_search = GridSearchCV(clf, parameters,cv=2)
grid_search.fit(X, y)

print(f"Best Parameters: {grid_search.best_params_}")

OUTPUT:

Best Parameters: {'criterion': 'gini', 'max_depth': 1}
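To see what the `max_depth` parameter controls, a minimal sketch fits trees of depth 1 to 3 on the same toy data and reports training accuracy. Training accuracy alone always favours deeper trees, which is why the grid search above scores each candidate with cross-validation instead. `random_state=0` is an arbitrary choice to make the fits repeatable.

```python
from sklearn.tree import DecisionTreeClassifier

# Same toy data as above
X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
y = [0, 0, 1, 1, 0]

train_scores = {}
for depth in [1, 2, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X, y)
    # score() returns accuracy on the data passed in (here, the training data)
    train_scores[depth] = tree.score(X, y)
    print(f"max_depth={depth}: fitted depth={tree.get_depth()}, "
          f"training accuracy={train_scores[depth]}")
```

With max_depth=3 the fitted tree still stops at depth 2, because both leaves are already pure.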

7. K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier

# Data
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]

# KNN Model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Prediction
print(model.predict([[4]]))

OUTPUT:

[0]
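A from-scratch sketch of the same K-nearest-neighbours idea, assuming Euclidean distance and a simple majority vote; `knn_predict` is a hypothetical helper, not part of scikit-learn. Ties in distance are broken by training-set order here, which may differ from scikit-learn's internal tie handling.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    # Indices of the k smallest distances (stable sort keeps ties in input order)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote among the neighbours' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [4]))    # neighbours 3, 2, 6 -> labels 0, 0, 1 -> 0
print(knn_predict(X, y, [7.5]))
```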
8. Logistic Regression

from sklearn.linear_model import LogisticRegression

# Data
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]

# Logistic Regression
model = LogisticRegression()
model.fit(X, y)

# Prediction
print(model.predict([[4]]))

OUTPUT:
[0]
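Logistic regression models the probability of class 1 as a sigmoid of a linear function of x. A sketch that recomputes the probability for x = 4 from the fitted `coef_` and `intercept_` and locates the decision boundary, where the probability is 0.5. Because this data set is symmetric about x = 4.5, the boundary should land close to 4.5, which is why x = 4 is assigned to class 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X, y)

# Logistic regression: P(y=1 | x) = 1 / (1 + exp(-(w*x + b)))
w = model.coef_[0, 0]
b = model.intercept_[0]
p_manual = 1 / (1 + np.exp(-(w * 4 + b)))
p_sklearn = model.predict_proba([[4]])[0, 1]
print(p_manual, p_sklearn)  # same value; below 0.5, so class 0 is predicted

# The decision boundary is where w*x + b = 0, i.e. P(y=1 | x) = 0.5
boundary = -b / w
print(f"Decision boundary at x = {boundary:.3f}")
```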

9. K-Means Clustering
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])

# K-Means Clustering
kmeans = KMeans(n_clusters=2, n_init=10)  # random initialization: the order of the two centers may vary between runs
kmeans.fit(X)

print(f"Cluster Centers: {kmeans.cluster_centers_}")

OUTPUT:

Cluster Centers: [[ 2.  3.]
 [ 9. 10.]]
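A minimal from-scratch sketch of Lloyd's algorithm, the standard K-means iteration of assigning points to their nearest center and moving each center to the mean of its points. The initialization (first and last points) is a hypothetical choice made to keep the example deterministic; scikit-learn uses k-means++ initialization by default, and this sketch does not handle empty clusters.

```python
import numpy as np

def kmeans_lloyd(X, init_centers, n_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and center update."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, labels

X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]], dtype=float)
# Hypothetical initialization: first and last points as starting centers
centers, labels = kmeans_lloyd(X, init_centers=X[[0, -1]])
print(centers)
print(labels)
```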
10. Performance Analysis of Classification Algorithms (Mini Project)
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Loading Datasets
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize classifiers
models = {
"Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
"KNN": KNeighborsClassifier(n_neighbors=3),
"Logistic Regression": LogisticRegression(max_iter=200, random_state=42)
}

# Train and evaluate each model
results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Evaluate metrics (weighted averages across the three iris classes)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average="weighted")
    recall = recall_score(y_test, y_pred, average="weighted")
    f1 = f1_score(y_test, y_pred, average="weighted")

    results.append({
        "Model": name,
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1
    })

results_df = pd.DataFrame(results)
print(results_df)

OUTPUT:

                 Model  Accuracy  Precision  Recall  F1-Score
0        Decision Tree       1.0        1.0     1.0       1.0
1                  KNN       1.0        1.0     1.0       1.0
2  Logistic Regression       1.0        1.0     1.0       1.0
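A perfect score on a single 30-sample test split says little about how the models differ. A sketch of a more robust comparison, assuming the same three models and using 5-fold cross-validation (`cross_val_score`) with the weighted F1 score on the full iris data:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Logistic Regression": LogisticRegression(max_iter=200, random_state=42),
}

# 5-fold cross-validation: every sample is used for testing exactly once,
# so the comparison does not hinge on one particular test split
rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_weighted")
    rows.append({"Model": name, "Mean F1": scores.mean(), "Std F1": scores.std()})

cv_df = pd.DataFrame(rows)
print(cv_df)
```

The standard deviation across folds gives a rough sense of how much a single split's score can vary.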
