
Index

S.No  Practical Name                                                                Date  Sign

1   Write a Python Program to plot a Frequency Polygon, Ogive, and Histogram of the Unemployment Rates data provided to you.
2   Write a Python Program to generate some random data and plot Box plots using this random data.
3   Write a Python Program to generate data from the Binomial distribution and show the pdf plot for various values of n and p.
4   Write a Python Program to generate the Poisson distribution and show the pdf plot for various values of Lambda.
5   Write a Python Program to generate data from the Normal distribution and show the pdf plot for various values of µ and σ.
6   Write a Python Program to generate the exponential distribution and show the pdf plot for various values of Lambda.
7   Write a Python Program to import and export data using Pandas. Demonstrate various data pre-processing techniques.
8   Write a Python Program to implement Simple and Multiple Linear Regression.
9   Write a Python Program to implement Logistic Regression on a given dataset.
10  Write a Python Program to implement a decision tree on a given data set.
11  Write a Python Program to implement the K-means clustering algorithm.
12  Write a program for the ANOVA test.
13  Write a program for z-testing and t-testing.


Practical 1
Write a Python Program to plot a Frequency Polygon, Ogive, and Histogram of the
Unemployment Rates data provided to you.

Program:-
import matplotlib.pyplot as plt
import numpy as np

# Unemployment Rates Data
unemployment_rates = [2.4, 2.8, 2.9, 3.0, 3.4, 3.6, 3.6, 3.9, 4.1, 4.4, 4.6, 4.6, 4.7, 4.7, 4.8, 5.4, 5.5, 5.6, 5.9, 5.9,
                      6.0, 6.0, 6.3, 6.3, 6.4, 6.8, 6.8, 6.9, 7.0, 7.1, 7.1, 7.1, 7.2, 7.2, 7.5, 7.5, 7.5, 7.6, 7.6, 7.6,
                      7.7, 7.8, 8.0, 8.1, 8.3, 8.4, 8.8, 9.1, 9.5, 9.6, 9.7, 10.3, 10.4, 10.6, 11.0, 11.2, 11.3, 11.4, 12.0]

# Create a Histogram
plt.hist(unemployment_rates, bins=15, edgecolor='black')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.title('Histogram of Unemployment Rates')
plt.show()

# Create a Frequency Polygon: join the bin midpoints with straight lines
counts, bins = np.histogram(unemployment_rates, bins=15)
midpoints = (bins[:-1] + bins[1:]) / 2
plt.plot(midpoints, counts, marker='o')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.title('Frequency Polygon of Unemployment Rates')
plt.show()

# Create an Ogive: cumulative frequency plotted at the upper bin edges
counts, bins = np.histogram(unemployment_rates, bins=15)
cdf = np.cumsum(counts)
plt.plot(bins[1:], cdf)
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Cumulative Frequency')
plt.title('Ogive of Unemployment Rates')
plt.show()

OUTPUT :-
Practical 2
Write a Python Program to generate some random data. Plot Box plots using this
random data.

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Set a seed for reproducibility
np.random.seed(42)

# Generate random data
data1 = np.random.randn(100)
data2 = np.random.randn(150) * 2 + 10

# Create a box plot
plt.boxplot([data1, data2], labels=['Data Set 1', 'Data Set 2'])

# Customize the plot
plt.title('Box Plot of Random Data')
plt.xlabel('Data Set')
plt.ylabel('Value')

# Display the plot
plt.show()

OUTPUT :-
Practical 3
Write a Python Program to generate data from the Binomial distribution and show the
pdf plot for various values of n and p.

Program:-
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial

# Function to calculate the binomial distribution
def binomial_dist(n, p, x):
    # Binomial coefficient (n choose x)
    binom_coeff = factorial(n) / (factorial(x) * factorial(n - x))
    # PMF: C(n, x) * p^x * (1 - p)^(n - x)
    pmf = binom_coeff * (p ** x) * ((1 - p) ** (n - x))
    return pmf

# Values of n and p for the plots
n_values = [20, 40, 60]
p_values = [0.2, 0.5, 0.8]

# Create a figure with subplots
fig, axs = plt.subplots(len(n_values), len(p_values), figsize=(15, 12))

# Generate the plots for each combination of n and p
for i in range(len(n_values)):
    for j in range(len(p_values)):
        n = n_values[i]
        p = p_values[j]
        x = np.arange(0, n + 1)
        result = binomial_dist(n, p, x)
        axs[i, j].bar(x, result, color='blue')
        axs[i, j].set_title(f'n={n}, p={p}')
        axs[i, j].set_xlabel('Number of Successes')
        axs[i, j].set_ylabel('Probability')

# Adjust the layout
fig.subplots_adjust(left=0.08, right=0.98, bottom=0.05, top=0.9, hspace=0.4, wspace=0.3)

# Show the plots
plt.show()
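
As an optional cross-check (not part of the recorded program), the factorial-based PMF can be verified against scipy.stats.binom, and actual binomial data can be generated with np.random.binomial, which the practical's title asks for. A minimal sketch, assuming binomial_dist from the program above is in scope:

import numpy as np
from scipy import stats

n, p = 20, 0.5
x = np.arange(0, n + 1)

# The library PMF should agree with the factorial-based formula above
assert np.allclose(binomial_dist(n, p, x), stats.binom.pmf(x, n, p))

# Generate data from the Binomial distribution
samples = np.random.binomial(n, p, size=1000)
print("Sample mean:", samples.mean(), "vs. theoretical mean n*p =", n * p)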

OUTPUT:-
Practical 4
Write a Python Program to generate the Poisson distribution and show the pdf plot for
various values of Lambda.

Program:-
import numpy as np
import math
import matplotlib.pyplot as plt

# lam : lambda
def poisson_dist(x, lam):
    # PMF: P(X = k) = lam^k * e^(-lam) / k!
    u = []
    z = []
    for k in x:
        u.append(math.factorial(int(k)))
        z.append(lam ** int(k))
    wz = np.array(z) / np.array(u)  # lambda^k / k!
    prob_density = wz * np.exp(-lam)
    return prob_density

x = np.arange(0, 40, 1)
lambdas = [1, 2, 5, 10, 15, 20]

fig, axs = plt.subplots(2, 3)

# One panel per value of lambda
for ax, lam in zip(axs.flat, lambdas):
    result = poisson_dist(x, lam)
    ax.set_title(f'lam={lam}')
    ax.bar(x, result)
    ax.set_xlabel('x')
    ax.set_ylabel('Probability')

fig.subplots_adjust(left=0.08, right=0.98, bottom=0.05, top=0.9, hspace=0.6, wspace=0.8)
plt.show()
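
The program above plots only the theoretical PMF. To also generate data from the Poisson distribution, as the title asks, one possible sketch using np.random.poisson (assuming poisson_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

lam = 5
samples = np.random.poisson(lam, size=1000)

# Empirical relative frequencies of the generated data vs. the theoretical PMF
values, counts = np.unique(samples, return_counts=True)
plt.bar(values, counts / counts.sum(), alpha=0.5, label='sampled data')
k = np.arange(0, 20)
plt.plot(k, poisson_dist(k, lam), 'ro-', label='theoretical PMF')
plt.legend()
plt.show()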

OUTPUT:-
Practical 5
Write a Python Program to generate data from the Normal distribution and show the pdf
plot for various values of µ and σ.

Program :-
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate the normal distribution PDF
def normal_dist(x, mean, sd):
    prob_density = (1 / (sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / sd) ** 2)
    return prob_density

# Define a list of means and standard deviations to demonstrate
means = [0, 1, -1]
sds = [1, 2, 0.5]

# Create subplots
fig, axs = plt.subplots(len(means), len(sds), figsize=(15, 10))

# Generate and plot the normal distribution PDF for each combination of mean and standard deviation
for i, mean in enumerate(means):
    for j, sd in enumerate(sds):
        x = np.linspace(mean - 3 * sd, mean + 3 * sd, 100)
        y = normal_dist(x, mean, sd)
        axs[i, j].plot(x, y)
        axs[i, j].set_title(f'µ={mean}, σ={sd}')
        axs[i, j].grid(True)

# Add axis labels
for ax in axs.flat:
    ax.set(xlabel='x', ylabel='Probability Density')

# Hide x labels and tick labels for top plots and y ticks for right plots
for ax in axs.flat:
    ax.label_outer()

plt.tight_layout()
plt.show()
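
To actually generate data from the Normal distribution, as the title asks, a short sketch drawing samples with np.random.normal and overlaying the analytic PDF (assuming normal_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 1
samples = np.random.normal(mu, sigma, size=1000)

# Density-normalized histogram of the generated data with the PDF on top
plt.hist(samples, bins=30, density=True, alpha=0.5, edgecolor='black')
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)
plt.plot(x, normal_dist(x, mu, sigma), 'r-')
plt.title('Generated Normal data vs. PDF (µ=0, σ=1)')
plt.show()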
OUTPUT:-
Practical 6
Write a Python Program to generate the exponential distribution and show the pdf plot
for various values of Lambda.

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate the exponential distribution PDF
def exp_dist(x, lam):
    prob_density = lam * np.exp(-lam * x)
    return prob_density

# Generate and plot the exponential distribution for various values of Lambda
def plot_exp_dist(lambdas, x_range):
    fig, axs = plt.subplots(2, 3, figsize=(15, 10))
    axs = axs.flatten()  # Flatten the 2D array of axes for easy iteration
    for i, lam in enumerate(lambdas):
        x = np.arange(0, x_range, 0.1)
        result = exp_dist(x, lam)
        axs[i].scatter(x, result)
        axs[i].set_title(f'λ={lam}')
        axs[i].set_xlabel('x')
        axs[i].set_ylabel('Probability Density')
    # Adjust layout to prevent overlap
    fig.tight_layout()
    plt.show()

# List of Lambda values to demonstrate
lambdas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
x_range = 10  # Range of x values

# Call the function to generate and plot the exponential distribution
plot_exp_dist(lambdas, x_range)
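
A similar sketch to generate exponential data and compare it with the PDF; note that np.random.exponential is parameterized by scale = 1/λ, not by λ itself (assuming exp_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

lam = 1.5
# NumPy uses scale = 1/lambda
samples = np.random.exponential(scale=1 / lam, size=1000)

plt.hist(samples, bins=30, density=True, alpha=0.5, edgecolor='black')
x = np.arange(0, 10, 0.1)
plt.plot(x, exp_dist(x, lam), 'r-')
plt.title('Generated exponential data vs. PDF (λ=1.5)')
plt.show()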

OUTPUT:-
Practical 7
Write a Python Program to import and export data using Pandas. Demonstrate various
data pre-processing techniques.

Program:-
import pandas as pd
import numpy as np

# Importing data from a CSV file
data = pd.read_csv('Salary_Data1.csv')

# Display the first few rows of the dataframe
print("Initial Data:")
print(data.head())

# Exporting data to a CSV file
data.to_csv('exported_data.csv', index=False)

# Data Pre-processing Techniques:

# 1. Handling missing values (here, for the two numeric columns)
numeric_columns = data.select_dtypes(include=[np.number]).columns
data['Age'] = data['Age'].fillna(data['Age'].mean())
data['EstimatedSalary'] = data['EstimatedSalary'].fillna(data['EstimatedSalary'].mean())

# 2. Feature scaling: Min-Max scaling (rescales each column to [0, 1])
data['Age'] = (data['Age'] - data['Age'].min()) / (data['Age'].max() - data['Age'].min())
data['EstimatedSalary'] = (data['EstimatedSalary'] - data['EstimatedSalary'].min()) / \
    (data['EstimatedSalary'].max() - data['EstimatedSalary'].min())

# 3. Standardization (Z-score normalization)
# Note: in practice you would use either Min-Max scaling or standardization,
# not both; they are applied in sequence here purely for demonstration.
data['Age'] = (data['Age'] - data['Age'].mean()) / data['Age'].std()
data['EstimatedSalary'] = (data['EstimatedSalary'] - data['EstimatedSalary'].mean()) / \
    data['EstimatedSalary'].std()

# 4. Removing duplicates
data.drop_duplicates(inplace=True)

# 5. Encoding ordinal features
# Assuming there is an ordinal column named 'Education_Level'
data['Education_Level'] = data['Education_Level'].map({'Bachelor': 1, 'Master': 2, 'PhD': 3})

# 6. One-hot encoding
# Assuming there is a categorical column named 'Gender'
data = pd.get_dummies(data, columns=['Gender'])

# Display the pre-processed data
print("\nPre-processed Data:")
print(data.head())
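
The manual scaling above can also be done with scikit-learn's preprocessing utilities. A minimal sketch on a small made-up frame (the values here are illustrative, not from Salary_Data1.csv); note that StandardScaler uses the population standard deviation (ddof=0) while pandas .std() defaults to the sample standard deviation (ddof=1), so the two standardizations differ slightly:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({'Age': [25, 32, 47, 51],
                   'EstimatedSalary': [30000, 45000, 80000, 95000]})

# Min-Max scaling to the range [0, 1]
df[['Age', 'EstimatedSalary']] = MinMaxScaler().fit_transform(df[['Age', 'EstimatedSalary']])
print(df)

# Standardization (applied here to the already scaled column, mirroring the program above)
df['Age'] = StandardScaler().fit_transform(df[['Age']]).ravel()
print(df)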

OUTPUT:-
Practical 8
Write a Python Program to implement Simple and Multiple Linear Regression.

Simple Linear Regression

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Calculate the slope and intercept (least-squares formulas)
m = (np.mean(x) * np.mean(y) - np.mean(x * y)) / (np.mean(x) ** 2 - np.mean(x ** 2))
b = np.mean(y) - m * np.mean(x)

# Predict values
y_pred = m * x + b

# Plot the data and regression line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.show()
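
A quick way to verify the hand-computed slope and intercept (not part of the recorded program) is np.polyfit, which fits a degree-1 polynomial by least squares:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# polyfit returns [slope, intercept] for degree 1
m, b = np.polyfit(x, y, 1)
print("slope:", m, "intercept:", b)  # should match the m and b computed above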

OUTPUT:-
Multiple Linear Regression

Program:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3  # y = 3 + 1*x1 + 2*x2

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Print the coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Plot the fitted values against the second feature, x2
# (a full regression plane would require a 3D plot)
plt.scatter(X[:, 1], y)
plt.plot(X[:, 1], y_pred, color='red')
plt.xlabel('x2')
plt.ylabel('y')
plt.title('Multiple Linear Regression')
plt.show()
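
As a cross-check on sklearn's result, the same coefficients can be recovered from the least-squares problem with np.linalg.lstsq; a minimal sketch:

import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept:", coef[0], "coefficients:", coef[1:])  # expect about 3 and [1, 2]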

OUTPUT:-
Practical 9
Write a Python Program to implement Logistic Regression on a given dataset.
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# Function to split the dataset
def split_dataset(X, y, test_size, random_state):
    np.random.seed(random_state)
    indices = np.random.permutation(len(X))
    test_set_size = int(len(X) * test_size)
    test_indices = indices[:test_set_size]
    train_indices = indices[test_set_size:]
    return X[train_indices], X[test_indices], y[train_indices], y[test_indices]

# Function to scale features
def scale_features(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X - mean) / std, mean, std

# Logistic Regression Class
class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        # Batch gradient descent on the log-loss
        for _ in range(self.n_iterations):
            model = self.sigmoid(np.dot(X, self.weights) + self.bias)
            error = model - y
            dw = np.dot(X.T, error) / len(X)
            db = np.sum(error) / len(X)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        model = self.sigmoid(np.dot(X, self.weights) + self.bias)
        return model.round()

# Function to create a confusion matrix
def confusion_matrix(y_true, y_pred):
    y_true = y_true.astype(int)
    y_pred = y_pred.astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for i in range(len(y_true)):
        cm[y_true[i]][y_pred[i]] += 1
    return cm

# Function to calculate accuracy score
def accuracy_score(y_true, y_pred):
    correct = np.sum(y_true == y_pred)
    return correct / len(y_true)

# Function to visualize the classification results
def visualize_classification(X_set, y_set, classifier, title, mean_train, std_train):
    X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.1),
                         np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.1))
    Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T)
    Z = Z.reshape(X1.shape)

    plt.contourf(X1, X2, Z, alpha=0.75, cmap=ListedColormap(('red', 'green')))

    ticks_x = np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 1)
    ticks_y = np.arange(X_set[:, 1].min(), X_set[:, 1].max() + 1, 1)
    plt.xticks(ticks_x, (ticks_x * std_train[0] + mean_train[0]).astype(int))
    plt.yticks(ticks_y, (ticks_y * std_train[1] + mean_train[1]).astype(int))

    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    c=np.array([ListedColormap(('red', 'green'))(i)]), label=j)
    plt.title(title)
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

# Load dataset
dataset = pd.read_csv('Program_09.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = split_dataset(X, y, test_size=0.25, random_state=0)

# Feature Scaling
X_train, mean_train, std_train = scale_features(X_train)
X_test = (X_test - mean_train) / std_train

# Logistic Regression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Confusion Matrix and Accuracy Score


cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix\n', cm)
print('Accuracy Score:', accuracy_score(y_test, y_pred))

# Visualising the Training set results


visualize_classification(X_train, y_train, classifier, 'Logistic Regression (Training set)', mean_train, std_train)

# Visualising the Test set results


visualize_classification(X_test, y_test, classifier, 'Logistic Regression (Test set)', mean_train, std_train)
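
For comparison (not part of the recorded program), the same workflow can be run with scikit-learn's built-in implementations; a minimal sketch, assuming Program_09.csv has the layout used above (feature columns first, class label last). If run in the same session as the program above, this import shadows the hand-written LogisticRegression class:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

dataset = pd.read_csv('Program_09.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Confusion Matrix\n', confusion_matrix(y_test, y_pred))
print('Accuracy Score:', accuracy_score(y_test, y_pred))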
Output:
Practical 10
Write a Python Program to implement a decision tree on a given data set.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
from pprint import pprint

# Set plot style
# %matplotlib inline  # uncomment this magic when running in a Jupyter notebook
sns.set_style("darkgrid")

# Load dataset and preprocess
df = pd.read_csv("Program_10.csv")
df = df.drop("Id", axis=1)
df = df.rename(columns={"species": "label"})
print(df.head())

# Train-test split function
def train_test_split(df, test_size):
    if isinstance(test_size, float):
        test_size = round(test_size * len(df))
    indices = df.index.tolist()
    test_indices = random.sample(population=indices, k=test_size)
    test_df = df.loc[test_indices]
    train_df = df.drop(test_indices)
    return train_df, test_df

random.seed(0)
train_df, test_df = train_test_split(df, test_size=20)

# Visualization
sns.FacetGrid(df, hue="label", height=5, aspect=1.5).map(plt.scatter, "sepal_width", "sepal_length").add_legend()
plt.show()

sns.FacetGrid(df, hue="label", height=7, aspect=1.5).map(plt.scatter, "petal_width", "petal_length").add_legend()


max_petal_width = df["petal_width"].max()
plt.xticks(np.arange(0, max_petal_width + 0.2, 0.2))
plt.show()

# Descriptive statistics per class
statistics = ['min', 'max', 'mean', 'median', 'std']
print(df.groupby('label').agg({
    'sepal_length': statistics,
    'sepal_width': statistics,
    'petal_length': statistics,
    'petal_width': statistics
}).round(2).T)

# Helper functions for the decision tree
def check_purity(data):
    label_column = data[:, -1]
    unique_classes = np.unique(label_column)
    return len(unique_classes) == 1

def classify_data(data):
    label_column = data[:, -1]
    unique_classes, counts_unique_classes = np.unique(label_column, return_counts=True)
    index = counts_unique_classes.argmax()
    return unique_classes[index]

def get_potential_splits(data):
    potential_splits = {}
    _, n_columns = data.shape
    for column_index in range(n_columns - 1):
        values = data[:, column_index]
        unique_values = np.unique(values)
        # Candidate splits are the midpoints between consecutive unique values
        potential_splits[column_index] = [(unique_values[i] + unique_values[i - 1]) / 2
                                          for i in range(1, len(unique_values))]
    return potential_splits

def split_data(data, split_column, split_value):
    split_column_values = data[:, split_column]
    data_below = data[split_column_values <= split_value]
    data_above = data[split_column_values > split_value]
    return data_below, data_above

def calculate_entropy(data):
    label_column = data[:, -1]
    _, counts = np.unique(label_column, return_counts=True)
    probabilities = counts / counts.sum()
    return sum(probabilities * -np.log2(probabilities))

def calculate_overall_entropy(data_below, data_above):
    n = len(data_below) + len(data_above)
    p_data_below = len(data_below) / n
    p_data_above = len(data_above) / n
    return p_data_below * calculate_entropy(data_below) + p_data_above * calculate_entropy(data_above)

def determine_best_split(data, potential_splits):
    overall_entropy = float('inf')
    for column_index in potential_splits:
        for value in potential_splits[column_index]:
            data_below, data_above = split_data(data, split_column=column_index, split_value=value)
            current_overall_entropy = calculate_overall_entropy(data_below, data_above)
            if current_overall_entropy < overall_entropy:
                overall_entropy = current_overall_entropy
                best_split_column = column_index
                best_split_value = value
    return best_split_column, best_split_value

# Decision tree algorithm
def decision_tree_algorithm(df, counter=0, min_samples=2, max_depth=5):
    # On the first call the input is a DataFrame; recursive calls pass NumPy arrays
    if counter == 0:
        global COLUMN_HEADERS
        COLUMN_HEADERS = df.columns
        data = df.values
    else:
        data = df

    # Base cases: pure node, too few samples, or maximum depth reached
    if check_purity(data) or len(data) < min_samples or counter == max_depth:
        return classify_data(data)

    else:
        counter += 1
        potential_splits = get_potential_splits(data)
        split_column, split_value = determine_best_split(data, potential_splits)
        data_below, data_above = split_data(data, split_column, split_value)
        feature_name = COLUMN_HEADERS[split_column]
        question = "{} <= {}".format(feature_name, split_value)
        sub_tree = {question: []}

        yes_answer = decision_tree_algorithm(data_below, counter, min_samples, max_depth)
        no_answer = decision_tree_algorithm(data_above, counter, min_samples, max_depth)

        if yes_answer == no_answer:
            return yes_answer
        else:
            sub_tree[question].append(yes_answer)
            sub_tree[question].append(no_answer)
            return sub_tree

def classify_example(example, tree):
    question = list(tree.keys())[0]
    feature_name, _, value = question.split()

    if example[feature_name] <= float(value):
        answer = tree[question][0]
    else:
        answer = tree[question][1]

    # The answer is either a leaf (class label) or a sub-tree to recurse into
    if not isinstance(answer, dict):
        return answer
    else:
        return classify_example(example, answer)

def calculate_accuracy(df, tree):
    df["classification"] = df.apply(classify_example, axis=1, args=(tree,))
    df["classification_correct"] = df["classification"] == df["label"]
    return df["classification_correct"].mean() * 100

# Build and evaluate the decision tree


train_df, test_df = train_test_split(df, test_size=20)
tree = decision_tree_algorithm(train_df, max_depth=3)
accuracy = calculate_accuracy(test_df, tree)
pprint(tree)
print(f"Accuracy: {accuracy}%")
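
For comparison (not part of the recorded program), sklearn's DecisionTreeClassifier with the entropy criterion implements essentially the same algorithm; a minimal sketch, assuming the same Program_10.csv layout as above. The split function is imported under an alias so it does not clash with the hand-written train_test_split:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split as sk_split

df = pd.read_csv("Program_10.csv").drop("Id", axis=1).rename(columns={"species": "label"})
X = df.drop(columns=["label"])
y = df["label"]
X_train, X_test, y_train, y_test = sk_split(X, y, test_size=20, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print(export_text(clf, feature_names=list(X.columns)))
print("Accuracy:", clf.score(X_test, y_test) * 100, "%")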
Output:
Practical 11
Write a Python Program to implement K-means clustering algorithm.
Program:
import numpy as np

def k_means(X, k, max_iters=100):
    # Initialize centroids by picking k distinct data points at random
    centroids = X[np.random.choice(range(len(X)), k, replace=False)]

    for _ in range(max_iters):
        # Assign each data point to the nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, np.newaxis] - centroids, axis=-1), axis=-1)

        # Update centroids; if a cluster becomes empty, keep its old centroid
        new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i] for i in range(k)])

        # Check convergence
        if np.all(centroids == new_centroids):
            break

        centroids = new_centroids

    return labels, centroids

# Generate random data


np.random.seed(42)
X = np.random.rand(100, 2)

# Run k-means clustering
k = 4
labels, centroids = k_means(X, k)

print("Cluster labels:", labels)
print("Centroids:", centroids)
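
Since the data is two-dimensional, the result is easy to inspect visually; a short sketch plotting the clusters and centroids produced above:

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=30)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', s=200)
plt.title('K-means clustering (k=4)')
plt.show()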

Output:
Practical 12
Write a program for ANOVA test.
Program:
import numpy as np
def one_way_anova(*args):
    # Number of groups
    k = len(args)
    # Total number of observations
    N = sum(len(group) for group in args)
    # Group means
    group_means = [np.mean(group) for group in args]
    # Overall (grand) mean
    grand_mean = np.mean([item for group in args for item in group])

    # Between-group sum of squares (SSB)
    SSB = sum(len(group) * ((group_mean - grand_mean) ** 2) for group, group_mean in zip(args, group_means))

    # Within-group sum of squares (SSW)
    SSW = sum(sum((item - group_mean) ** 2 for item in group) for group, group_mean in zip(args, group_means))

    # Between-group and within-group degrees of freedom
    dfB = k - 1
    dfW = N - k

    # Mean squares
    MSB = SSB / dfB
    MSW = SSW / dfW

    # F statistic
    F = MSB / MSW

    return F, dfB, dfW

# Example usage:
# Data for three groups
group1 = [6, 8, 4, 5, 3, 4]
group2 = [8, 12, 9, 11, 6, 8]
group3 = [13, 9, 11, 8, 7, 12]

# Perform one-way ANOVA


F_statistic, df_between, df_within = one_way_anova(group1, group2, group3)
print("\nOne-way ANOVA")
print(f"F-statistic: {F_statistic}")
print(f"Degrees of freedom between groups: {df_between}")
print(f"Degrees of freedom within groups: {df_within}")

# Function to calculate two-way ANOVA
def two_way_anova(data, alpha=0.05):
    # Number of levels for each factor
    a = len(data)
    b = len(data[0])
    # Number of observations per cell
    n = len(data[0][0])
    # Total number of observations
    N = a * b * n

    grand_mean = np.mean(data)

    # Calculate sums of squares
    SS_total = sum(((x - grand_mean) ** 2) for group in data for subgroup in group for x in subgroup)
    SS_factor_A = b * n * sum((np.mean(group) - grand_mean) ** 2 for group in data)
    # Factor B means are taken across all levels of factor A
    SS_factor_B = a * n * sum((np.mean([data[i][j] for i in range(a)]) - grand_mean) ** 2 for j in range(b))
    # No interaction term is modelled, so the A x B interaction is pooled into the error
    SS_error = SS_total - SS_factor_A - SS_factor_B

    # Calculate degrees of freedom
    df_total = N - 1
    df_factor_A = a - 1
    df_factor_B = b - 1
    df_error = df_total - df_factor_A - df_factor_B

    # Calculate mean squares
    MS_factor_A = SS_factor_A / df_factor_A
    MS_factor_B = SS_factor_B / df_factor_B
    MS_error = SS_error / df_error

    # Calculate F-statistics
    F_factor_A = MS_factor_A / MS_error
    F_factor_B = MS_factor_B / MS_error

    return {
        'SS_total': SS_total,
        'SS_factor_A': SS_factor_A,
        'SS_factor_B': SS_factor_B,
        'SS_error': SS_error,
        'df_total': df_total,
        'df_factor_A': df_factor_A,
        'df_factor_B': df_factor_B,
        'df_error': df_error,
        'MS_factor_A': MS_factor_A,
        'MS_factor_B': MS_factor_B,
        'MS_error': MS_error,
        'F_factor_A': F_factor_A,
        'F_factor_B': F_factor_B
    }
# Example data for two-way ANOVA: data[i][j] is the list of observations in cell (i, j)
data = [
    [[3, 2, 1], [4, 5, 6]],
    [[5, 6, 7], [8, 9, 10]],
    [[7, 8, 9], [10, 11, 12]]
]

print("\nTwo-way ANOVA")

# Perform two-way ANOVA
results = two_way_anova(data)
for key, value in results.items():
    print(f"{key}: {value}")
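
Neither function above reports a p-value. As an optional cross-check, scipy can verify the one-way result and convert the hand-computed F-statistic into a p-value via the F distribution's survival function (assuming group1, group2, group3 and the one-way results from above are in scope):

from scipy import stats

# Library cross-check for the one-way ANOVA
F, p = stats.f_oneway(group1, group2, group3)
print("scipy F-statistic:", F, "p-value:", p)

# p-value for the hand-computed F-statistic
p_manual = stats.f.sf(F_statistic, df_between, df_within)
print("manual p-value:", p_manual)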
Output:
Practical 13
Write a program for z-testing and t-testing.
Program:
#Program 13
# Import the necessary libraries
import numpy as np
import scipy.stats as stats

print("\nOne Tailed Test\n")

print("A school claimed that the students who study there are more intelligent than the average school. "
"On calculating the IQ scores of 50 students, the average turns out to be 110. "
"The mean of the population IQ is 100 and the standard deviation is 15. "
"State whether the claim of the principal is right or not at a 5% significance level.")

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

print("Sample Mean:", sample_mean)


print("Population Mean:", population_mean)
print("Population standard deviation:", population_std)
print("Sample size:", sample_size)
print("Level of significance:", alpha)

# Compute the z-score


z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score:', z_score)

print("\nUsing Critical Z-Score\n")

# Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score:', z_critical)

# Hypothesis decision
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

print("\nApproach 2: Using P-value\n")

# P-Value: Probability of getting less than a Z-score


p_value = 1 - stats.norm.cdf(z_score)

print('P-value:', p_value)

# Hypothesis decision
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

print("\nTwo-sampled z-test:\n")
print("There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, "
"while Group B has studied online classes. After the examination, the score of each student comes. "
"Now we want to determine whether the online or offline classes are better.\n"
"Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10\n"
"Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12\n"
"Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the
online and offline classes.")

# Group A (Offline Classes)


n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)


n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0

# Hypothesized difference (under the null hypothesis)
D = 0

# Calculate the test statistic (z-score)


z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Calculate the critical value


z_critical = stats.norm.ppf(1 - alpha/2)
print('Critical Z-Score:', z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant "
          "difference between the online and offline classes.")

# Approach 2: Using P-value


p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value:', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant "
          "difference between the online and offline classes.")

print("\nIndependent samples t-test:\n")

# Sample
sample_A = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 8])
sample_B = np.array([1, 2, 2, 3, 3, 4, 5, 6, 7, 7])

# Perform independent sample t-test


t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)

# Compute the degrees of freedom (df)


df = len(sample_A) + len(sample_B) - 2

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha/2, df)
# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value:')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between the two groups')
else:
    print('No significant difference found between the two groups')

print('With P-value:')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between the two groups')
else:
    print('Reject the null hypothesis: there is a significant difference between the two groups')

print("\nPaired sample t-test\n")

# Create the paired samples


math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Perform the paired sample t-test


t_statistic, p_value = stats.ttest_rel(math1, math2)

# Compute the degrees of freedom (df)


df = len(math2) - 1

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha/2, df)

# Print the results


print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value:')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between math1 and math2')
else:
    print('No significant difference found between math1 and math2')

print('With P-value:')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between math1 and math2')
else:
    print('Reject the null hypothesis: there is a significant difference between math1 and math2')

print("\nOne sample t-test\n")

# Define the population mean weight


population_mean = 45

# Define the sample mean weight and standard deviation


sample_mean = 75
sample_std = 25

# Define the sample size


sample_size = 25

# Calculate the t-statistic


t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Define the degrees of freedom


df = sample_size - 1

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha, df)

# Calculate the p-value


p_value = 1 - stats.t.cdf(t_statistic, df)

# Print the results


print("T-Statistic:", t_statistic)
print("Critical t-value:", critical_t)
print("P-Value:", p_value)

# Decision
print('With T-value:')
if t_statistic > critical_t:
    print("There is a significant difference in weight before and after the camp. The fitness camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp. "
          "The fitness camp did not have a significant effect.")

print('With P-value:')
if p_value < alpha:
    print("There is a significant difference in weight before and after the camp. The fitness camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp. "
          "The fitness camp did not have a significant effect.")

Output:
