ML_LAB_MANUAL
import statistics as stats
# Sample Data
data = [10, 20, 20, 40, 50, 50, 50, 80, 90]
# Central Tendency
mean = stats.mean(data)
median = stats.median(data)
mode = stats.mode(data)
# Dispersion
variance = stats.variance(data)
std_dev = stats.stdev(data)
# Display Results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")
OUTPUT:
Mean: 45.55555555555556
Median: 50
Mode: 50
Variance: 727.7777777777778
Standard Deviation: 26.977356760397743
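The variance printed above is the sample variance: statistics.variance() divides the sum of squared deviations by n - 1, while statistics.pvariance() divides by n (population variance). The sketch below recomputes both by hand on the same data so the formula behind the output is visible. It is an illustration, not part of the original program.

```python
import math
import statistics as stats

data = [10, 20, 20, 40, 50, 50, 50, 80, 90]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

sample_var = ss / (n - 1)      # what stats.variance() computes
population_var = ss / n        # what stats.pvariance() computes
sample_std = math.sqrt(sample_var)  # what stats.stdev() computes

print(sample_var, stats.variance(data))
print(population_var, stats.pvariance(data))
print(sample_std, stats.stdev(data))
```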
2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy.
The math module is a standard module in Python and is always available. To use the mathematical functions in this module, you have to import it using import math. It gives access to the underlying C library functions. This module does not support complex data types; the cmath module is its complex counterpart.
fabs(x)          Returns the absolute value of x
factorial(x)     Returns the factorial of x
floor(x)         Returns the largest integer less than or equal to x
fmod(x, y)       Returns the remainder when x is divided by y
frexp(x)         Returns the mantissa and exponent of x as the pair (m, e)
fsum(iterable)   Returns an accurate floating-point sum of the values in the iterable
isfinite(x)      Returns True if x is neither an infinity nor a NaN (Not a Number)
isinf(x)         Returns True if x is a positive or negative infinity
isnan(x)         Returns True if x is a NaN
ldexp(x, i)      Returns x * (2**i)
modf(x)          Returns the fractional and integer parts of x
trunc(x)         Returns the truncated integer value of x
exp(x)           Returns e**x
expm1(x)         Returns e**x - 1
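A short sketch calling each function from the table above on small example values (the inputs are chosen only for illustration). The last line shows why cmath exists: math.sqrt(-1) raises ValueError, whereas cmath.sqrt(-1) returns a complex result.

```python
import cmath
import math

abs_val = math.fabs(-3.5)              # absolute value, always a float
fact = math.factorial(5)
low = math.floor(2.7)                  # largest integer <= 2.7
rem = math.fmod(7, 3)                  # remainder of 7 / 3 as a float
m, e = math.frexp(8.0)                 # 8.0 == m * 2**e with 0.5 <= |m| < 1
exact = math.fsum([0.1] * 10)          # compensated (accurate) sum
naive = sum([0.1] * 10)                # plain sum accumulates rounding error
checks = (math.isfinite(1.0), math.isinf(float("inf")), math.isnan(float("nan")))
scaled = math.ldexp(0.5, 4)            # 0.5 * 2**4
frac, whole = math.modf(3.25)          # fractional part, integer part (both floats)
cut = math.trunc(-2.7)                 # truncates toward zero
growth = math.exp(1)                   # e
tiny = math.expm1(1e-10)               # e**x - 1, accurate for very small x

# math.sqrt(-1) raises ValueError; the complex counterpart cmath handles it
root = cmath.sqrt(-1)

print(abs_val, fact, low, rem, (m, e), exact, naive, checks)
print(scaled, frac, whole, cut, growth, tiny, root)
```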
Program-1
Python SciPy Library
SciPy is an open-source, Python-based library used in mathematics, scientific computing, engineering, and technical computing. SciPy is pronounced "Sigh Pie".
SciPy contains a variety of subpackages that help solve the most common problems in scientific computation.
SciPy is one of the most widely used scientific libraries, comparable in scope to the GNU Scientific Library for C/C++ or MATLAB.
It is easy to use and understand, and it offers fast computation.
It operates on NumPy arrays.
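As a sketch of how SciPy's subpackages work on NumPy arrays, the example below uses three of them: integrate for a definite integral, linalg for solving a linear system, and optimize for minimising a function. The functions and numbers are illustrative choices, not taken from the manual.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Definite integral of x**2 from 0 to 1 (exact value 1/3); quad also returns an error estimate
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)

# Solve the linear system A @ v = b, where A and b are NumPy arrays
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = linalg.solve(A, b)

# Minimise a one-variable function; (x - 2)**2 has its minimum at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(area, v, res.x)
```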
NumPy vs. SciPy
NumPy:
1. NumPy is written largely in C and is used for mathematical and numerical calculations.
2. Its vectorised array operations are much faster than equivalent pure-Python loops.
3. NumPy is one of the most useful libraries in data science for performing basic calculations.
4. NumPy is built around the array data type (ndarray), which supports basic operations such as sorting, reshaping, and indexing.
SciPy:
Program-1
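To make the NumPy points in the comparison above concrete, here is a minimal sketch of sorting, reshaping, indexing, and vectorised arithmetic on one array. The values are arbitrary.

```python
import numpy as np

a = np.array([5, 2, 9, 1, 7, 3])

sorted_a = np.sort(a)          # sorting: returns a sorted copy, a is unchanged
grid = a.reshape(2, 3)         # shaping: 2 rows x 3 columns
first_row = grid[0]            # indexing: first row of the 2-D array
big = a[a > 4]                 # boolean indexing: elements greater than 4
doubled = a * 2                # vectorised arithmetic, no Python loop

print(sorted_a, grid.shape, first_row, big, doubled)
```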
3. Study of Python Basic Libraries such as Pandas and Matplotlib.
The two primary components of pandas are the Series and the DataFrame.
A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
DataFrames and Series are similar in that many operations you can do with one you can also do with the other, such as filling in null values and calculating the mean.
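A small sketch of the Series/DataFrame relationship described above, using made-up values: the same fillna() and mean() calls work on both objects, and selecting one column of a DataFrame gives back a Series.

```python
import numpy as np
import pandas as pd

# A Series is a single labelled column
s = pd.Series([1.0, np.nan, 3.0])

# A DataFrame is a two-dimensional table whose columns are Series
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# The same operations work on both objects
print(s.fillna(0))       # Series with NaN replaced by 0
print(df.fillna(0))      # DataFrame with NaN replaced by 0
print(s.mean())          # NaN values are skipped by default
print(df.mean())         # one mean per column
print(type(df["a"]))     # selecting one column gives back a Series
```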
With CSV files, all you need is a single line to load the data:
df = pd.read_csv('purchases.csv')
df
Viewing your data
The first thing to do when opening a new data set is to print out a few rows to keep as a visual reference. We accomplish this with head():
movies_df.head()
Another fast and useful attribute is .shape, which outputs just a tuple of (rows, columns):
movies_df.shape
Note that .shape has no parentheses and is a simple tuple of the form (rows, columns). So we have 1000 rows and 11 columns in our movies DataFrame.
You will use .shape a lot when cleaning and transforming data. For example, you might filter some rows based on some criteria and then want to know quickly how many rows were removed.
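The manual does not include the movies file, so the sketch below uses a small hypothetical CSV held in a string (read through io.StringIO, since read_csv also accepts file-like objects). The column names and values are invented. It shows head(), .shape, and the filter-then-compare-shapes pattern described above.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk
csv_text = """title,year,rating
Movie A,2010,7.5
Movie B,2012,6.1
Movie C,2015,8.2
Movie D,2018,5.9
Movie E,2020,7.0
"""

# read_csv accepts a file path or any file-like object
movies_df = pd.read_csv(io.StringIO(csv_text))

print(movies_df.head(3))        # first three rows
print(movies_df.shape)          # (rows, columns); .shape takes no parentheses

# Filter rows, then compare shapes to see how many rows were removed
good = movies_df[movies_df["rating"] >= 7.0]
removed = movies_df.shape[0] - good.shape[0]
print(good.shape, removed)
```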
Program-1
We haven't defined an index in our example, but we see two columns in our output: the right column contains our data, whereas the left column contains the index. Pandas created a default index starting at 0 and going to 5, which is the length of the data minus 1.
dtype('int64'): the type int64 tells us that pandas is storing each value within this column as a 64-bit integer.
Program-2
We can directly access the index and the values of our Series S:
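The program for Series S is not reproduced in the text, so this sketch uses six hypothetical integer values. With no explicit index, pandas assigns the default index 0 to 5, stores the integers as int64, and exposes the two parts through .index and .values.

```python
import pandas as pd

# Six hypothetical values and no explicit index, so pandas creates a default index 0..5
S = pd.Series([11, 28, 72, 3, 5, 8])

print(S)             # left column: index, right column: data
print(S.index)       # the index object
print(S.values)      # the underlying NumPy array of values
print(S.dtype)       # int64 for integer data
```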
Matplotlib Library
Pyplot is a module of Matplotlib that provides simple functions to add plot elements such as lines, images, and text to the current axes in the current figure.
plot(x-axis values, y-axis values): plots a simple line graph with the x-axis values against the y-axis values
show(): displays the graph
title("string"): sets the title of the plot to the given string
xlabel("string"): sets the label of the x-axis to the given string
ylabel("string"): sets the label of the y-axis to the given string
figure(): creates a new figure (or activates an existing one); used to control figure-level attributes
subplot(nrows, ncols, index): adds a subplot to the current figure
suptitle("string"): adds a common title to the figure
subplots(nrows, ncols, figsize=(width, height)): a convenient way to create subplots in a single call; it returns a figure and its Axes (a single Axes object or an array of them)
set_title("string"): an Axes-level method used to set the title of each subplot in a figure
bar(categorical variables, values, color=...): creates a vertical bar graph
barh(categorical variables, values, color=...): creates a horizontal bar graph
legend(loc=...): adds a legend to the graph at the given location
xticks(index, categorical variables): gets or sets the current tick locations and labels of the x-axis
pie(values, labels=categorical variables): creates a pie chart
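A minimal sketch combining several functions from the list above: subplots() builds one figure with three Axes, and the same-named Axes methods bar(), barh(), pie(), legend(), and set_title() draw into each panel. The data are hypothetical. The Agg backend is selected only so the sketch runs without a display; remove that line and call plt.show() to see the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical example data
categories = ["A", "B", "C"]
values = [5, 3, 7]

# subplots() returns the figure and an array of Axes objects (one per panel)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
fig.suptitle("Same data, three chart types")

# color must be passed by keyword: the third positional argument of bar() is the bar width
axes[0].bar(categories, values, color="steelblue", label="count")
axes[0].set_title("Vertical bar")
axes[0].legend(loc="upper left")

axes[1].barh(categories, values, color="orange")
axes[1].set_title("Horizontal bar")

# labels must be passed by keyword: the second positional argument of pie() is explode
axes[2].pie(values, labels=categories)
axes[2].set_title("Pie")
```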
Here we import Matplotlib's Pyplot module and the NumPy library, since most of the data we will be working with is stored in arrays.
We pass two arrays as input arguments to Pyplot's plot() method and use the show() method to display the plot. Note that the first array appears on the x-axis and the second array on the y-axis of the plot. Now that our first plot is ready, let us add a title and name the x-axis and y-axis using the methods title(), xlabel(), and ylabel(), respectively.
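The steps above can be sketched as follows, with hypothetical data (squares of 1 to 5). The first array goes on the x-axis and the second on the y-axis; title(), xlabel(), and ylabel() then label the plot. The Agg backend line exists only so the sketch runs headless; drop it and uncomment plt.show() in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: first array on the x-axis, second array on the y-axis
x = np.array([1, 2, 3, 4, 5])
y = x ** 2

plt.figure()                # start a fresh figure
plt.plot(x, y)
plt.title("Square numbers")
plt.xlabel("x")
plt.ylabel("x squared")

ax = plt.gca()  # the current Axes, used here only to inspect what was set
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
# plt.show()    # displays the plot in an interactive session
```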
4. Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Sample Data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 3, 5, 7, 9])
# Model
model = LinearRegression()
model.fit(x, y)
# Prediction
y_pred = model.predict(x)
# Plot
plt.scatter(x, y, color='blue', label='Actual')
plt.plot(x, y_pred, color='red', label='Predicted')
plt.legend()
plt.show()
OUTPUT:
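For a single feature, ordinary least squares has a closed form: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2) and intercept = y_mean - slope * x_mean. The sketch below applies it to the same sample data and compares the result with LinearRegression's coef_ and intercept_; the two should agree (here the data lie exactly on y = 2x - 1).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as the program above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 5, 7, 9], dtype=float)

# Closed-form ordinary least squares for one feature
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# scikit-learn fit on the same data, for comparison
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(slope, intercept, model.coef_[0], model.intercept_)
```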
5. Multiple Linear Regression for House Price Prediction
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd
data = pd.DataFrame({
"Size": [1400, 1600, 1700, 1875],
"Bedrooms": [3, 3, 4, 3],
"Age": [20, 15, 18, 12],
"Price": [245000, 312000, 279000, 308000]
})
# Features and target
X = data[["Size", "Bedrooms", "Age"]]
y = data["Price"]
# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model
model = LinearRegression()
model.fit(X_train, y_train)
# Prediction
y_pred = model.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
OUTPUT:
Mean Squared Error: 1419700880.4400368
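Mean squared error is the average of the squared residuals, MSE = (1/n) * sum((y_true - y_pred) ** 2). Note that with only 4 rows and test_size=0.2, the test set above holds a single house, and without a fixed random_state the split (and therefore the MSE) changes between runs. Taking the square root gives the RMSE in price units; the MSE reported above corresponds to an RMSE of roughly 37,680. The sketch below checks the formula against mean_squared_error on hypothetical prices.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices, for illustration only
y_true = np.array([300000.0, 250000.0])
y_hat = np.array([290000.0, 270000.0])

# MSE = mean of squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse_manual)  # back in the original units (price)

print(mse_manual, mse_sklearn, rmse)
```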
6. Decision Tree and Parameter Tuning
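from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
# Data (assumed: the Iris dataset; the original listing does not show where X and y come from)
X, y = load_iris(return_X_y=True)
# Decision Tree
clf = DecisionTreeClassifier()
# Candidate hyperparameters; every combination is tried
parameters = {"max_depth": [1, 2, 3], "criterion": ["gini", "entropy"]}
# 2-fold cross-validation for each of the 3 x 2 = 6 combinations
grid_search = GridSearchCV(clf, parameters, cv=2)
grid_search.fit(X, y)
print(grid_search.best_params_)
print(grid_search.best_score_)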
OUTPUT:
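Conceptually, GridSearchCV tries every combination in the parameter grid, scores each with cross-validation, and keeps the best. The sketch below spells that loop out with ParameterGrid and cross_val_score. It assumes the Iris dataset (the program above does not show its data) and fixes random_state=0 on the tree for repeatability; it is an illustration of the idea, not GridSearchCV's internal code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset
X, y = load_iris(return_X_y=True)

parameters = {"max_depth": [1, 2, 3], "criterion": ["gini", "entropy"]}

best_score, best_params = -1.0, None
for params in ParameterGrid(parameters):  # every combination: 3 x 2 = 6 candidates
    clf = DecisionTreeClassifier(random_state=0, **params)
    score = cross_val_score(clf, X, y, cv=2).mean()  # mean accuracy over 2 folds
    if score > best_score:                           # keep the first best on ties
        best_score, best_params = score, params

print(best_params, best_score)
```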
7. K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
# Data
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
# KNN Model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
# Prediction
print(model.predict([[4]]))
OUTPUT:
[0]
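K-nearest neighbours predicts by majority vote among the k training points closest to the query. The plain-Python sketch below reproduces that idea on the same data; for the query 4 the three nearest points are 3, 2, and 6 (labels 0, 0, 1), so the vote gives 0. Ties in distance are broken here by training order, which is a simplification and may differ from scikit-learn's handling.

```python
from collections import Counter

X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]

def knn_predict(point, X, y, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Euclidean distance from the query point to every training point
    dists = [
        (sum((a - b) ** 2 for a, b in zip(point, row)) ** 0.5, label)
        for row, label in zip(X, y)
    ]
    dists.sort(key=lambda pair: pair[0])  # stable sort: ties keep training order
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

print(knn_predict([4], X, y))
```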
8. Logistic Regression
from sklearn.linear_model import LogisticRegression
# Data
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
# Logistic Regression
model = LogisticRegression()
model.fit(X, y)
# Prediction
print(model.predict([[4]]))
OUTPUT:
[0]
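Logistic regression models the probability of class 1 as the sigmoid of a linear function, P(y = 1 | x) = 1 / (1 + exp(-(w * x + b))), and predict() returns class 1 when that probability exceeds 0.5. The sketch below rebuilds the probability from the fitted coef_ and intercept_ on the same data and compares it with predict_proba().

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X, y)

# Sigmoid of the linear score w * x + b
w, b = model.coef_[0][0], model.intercept_[0]
x_new = 4.0
p_manual = 1.0 / (1.0 + np.exp(-(w * x_new + b)))

# predict_proba returns [P(y=0), P(y=1)] for each row
p_sklearn = model.predict_proba([[x_new]])[0][1]

# Class 1 is predicted only when the probability exceeds 0.5
print(p_manual, p_sklearn, int(p_manual > 0.5))
```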
9. K-Means Clustering
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])
# K-Means Clustering
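kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
# Cluster label of each point and the coordinates of the two centroids
print(kmeans.labels_)
print(kmeans.cluster_centers_)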
OUTPUT:
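K-means alternates two steps: assign each point to its nearest centre, then move each centre to the mean of its assigned points (Lloyd's algorithm). The NumPy sketch below runs those iterations on the same data, starting from the first and fourth points as initial centres (an arbitrary choice made for illustration; scikit-learn uses k-means++ initialisation by default). It converges to centres (2, 3) and (9, 10).

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]], dtype=float)

def lloyd_kmeans(X, init_centers, n_iter=10):
    """Plain Lloyd iterations: assign points to nearest centre, then recompute centres."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n_points, n_centers)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centre = mean of the points assigned to it (assumes no cluster becomes empty)
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

labels, centers = lloyd_kmeans(X, init_centers=[X[0], X[3]])
print(labels, centers)
```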
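import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Loading Datasets
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train/Test Split (test size and seed are assumed; the original split is not shown)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize classifiers
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Logistic Regression": LogisticRegression(max_iter=200, random_state=42)
}
results = []
for name, model in models.items():
    # Train on the training split and predict the held-out test split
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Evaluate metrics ("weighted" averages per-class scores by class support)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average="weighted")
    recall = recall_score(y_test, y_pred, average="weighted")
    f1 = f1_score(y_test, y_pred, average="weighted")
    results.append({
        "Model": name,
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1
    })
results_df = pd.DataFrame(results)
print(results_df)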
OUTPUT:
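With average="weighted", scikit-learn computes each class's precision, TP / (TP + FP), and averages them weighted by how many true samples each class has (its support). The sketch below does this by hand on hypothetical 3-class labels and checks it against precision_score; recall and F1 are averaged the same way.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical labels for a 3-class problem
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])

# Per-class precision = TP / (TP + FP); every class is predicted at least once here
classes = np.unique(y_true)
per_class = np.array([
    np.sum((y_pred == c) & (y_true == c)) / np.sum(y_pred == c) for c in classes
])

# Weight each class by its support (number of true samples of that class)
support = np.array([np.sum(y_true == c) for c in classes])
weighted_manual = np.sum(per_class * support) / support.sum()

weighted_sklearn = precision_score(y_true, y_pred, average="weighted")
print(per_class, weighted_manual, weighted_sklearn)
```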