
Index

S.No  Practical Name                                                                Date  Sign

1   Write a Python Program to plot a Frequency Polygon, Ogive, and Histogram of the Unemployment Rates data provided to you.
2   Write a Python Program to generate some random data and plot Box plots using this random data.
3   Write a Python Program to generate data from the Binomial distribution and show the pdf plot for various values of n and p.
4   Write a Python Program to generate the Poisson distribution and show the pdf plot for various values of Lambda.
5   Write a Python Program to generate data from the Normal distribution and show the pdf plot for various values of µ and σ.
6   Write a Python Program to generate the exponential distribution and show the pdf plot for various values of Lambda.
7   Write a Python Program to import and export data using Pandas. Demonstrate various data pre-processing techniques.
8   Write a Python Program to implement Simple and Multiple Linear Regression.
9   Write a Python Program to implement Logistic Regression on a given dataset.
10  Write a Python Program to implement a decision tree on a given data set.
11  Write a Python Program to implement the K-means clustering algorithm.
12  Write a program for the ANOVA test.
13  Write a program for z-testing and t-testing.


Practical 1
Write a Python Program to plot a Frequency Polygon, Ogive, and Histogram of the
Unemployment Rates data provided to you.

Program:-
import matplotlib.pyplot as plt
import numpy as np

# Unemployment Rates Data
unemployment_rates = [2.4, 2.8, 2.9, 3.0, 3.4, 3.6, 3.6, 3.9, 4.1, 4.4, 4.6, 4.6, 4.7, 4.7, 4.8, 5.4, 5.5, 5.6, 5.9, 5.9,
                      6.0, 6.0, 6.3, 6.3, 6.4, 6.8, 6.8, 6.9, 7.0, 7.1, 7.1, 7.1, 7.2, 7.2, 7.5, 7.5, 7.5, 7.6, 7.6, 7.6,
                      7.7, 7.8, 8.0, 8.1, 8.3, 8.4, 8.8, 9.1, 9.5, 9.6, 9.7, 10.3, 10.4, 10.6, 11.0, 11.2, 11.3, 11.4, 12.0]

# Create a Histogram
plt.hist(unemployment_rates, bins=15, edgecolor='black')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.title('Histogram of Unemployment Rates')
plt.show()

# Create a Frequency Polygon: join the bin midpoints with straight lines
counts, bins = np.histogram(unemployment_rates, bins=15)
midpoints = (bins[:-1] + bins[1:]) / 2
plt.plot(midpoints, counts, marker='o')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.title('Frequency Polygon of Unemployment Rates')
plt.show()

# Create an Ogive: cumulative frequency plotted at the upper bin edges
counts, bins = np.histogram(unemployment_rates, bins=15)
cdf = np.cumsum(counts)
plt.plot(bins[1:], cdf)
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Cumulative Frequency')
plt.title('Ogive of Unemployment Rates')
plt.show()

OUTPUT :-
Practical 2
Write a Python Program to generate some random data. Plot Box plots using this
random data.

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Set a seed for reproducibility
np.random.seed(42)

# Generate random data
data1 = np.random.randn(100)
data2 = np.random.randn(150) * 2 + 10

# Create a box plot
plt.boxplot([data1, data2], labels=['Data Set 1', 'Data Set 2'])

# Customize the plot
plt.title('Box Plot of Random Data')
plt.xlabel('Data Set')
plt.ylabel('Value')

# Display the plot
plt.show()

OUTPUT :-
Practical 3
Write a Python Program to generate data from the Binomial distribution and show the
pdf plot for various values of n and p.

Program:-
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial

# Function to calculate the binomial distribution
def binomial_dist(n, p, x):
    # Binomial coefficient (n choose x)
    binom_coeff = factorial(n) / (factorial(x) * factorial(n - x))
    # PMF: C(n, x) * p^x * (1 - p)^(n - x)
    pmf = binom_coeff * (p ** x) * ((1 - p) ** (n - x))
    return pmf

# Values of n and p for the plots
n_values = [20, 40, 60]
p_values = [0.2, 0.5, 0.8]

# Create a figure with subplots
fig, axs = plt.subplots(len(n_values), len(p_values), figsize=(15, 12))

# Generate the plots for each combination of n and p
for i in range(len(n_values)):
    for j in range(len(p_values)):
        n = n_values[i]
        p = p_values[j]
        x = np.arange(0, n + 1)
        result = binomial_dist(n, p, x)
        axs[i, j].bar(x, result, color='blue')
        axs[i, j].set_title(f'n={n}, p={p}')
        axs[i, j].set_xlabel('Number of Successes')
        axs[i, j].set_ylabel('Probability')

# Adjust the layout
fig.subplots_adjust(left=0.08, right=0.98, bottom=0.05, top=0.9, hspace=0.4, wspace=0.3)

# Show the plots
plt.show()
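
As an optional cross-check (not part of the recorded program), the factorial-based PMF can be verified against scipy.stats.binom, and actual binomial data can be generated with np.random.binomial, which the practical's title asks for. A minimal sketch, assuming binomial_dist from the program above is in scope:

import numpy as np
from scipy import stats

n, p = 20, 0.5
x = np.arange(0, n + 1)

# The library PMF should agree with the factorial-based formula above
assert np.allclose(binomial_dist(n, p, x), stats.binom.pmf(x, n, p))

# Generate data from the Binomial distribution
samples = np.random.binomial(n, p, size=1000)
print("Sample mean:", samples.mean(), "vs. theoretical mean n*p =", n * p)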

OUTPUT:-
Practical 4
Write a Python Program to generate the Poisson distribution and show the pdf plot for
various values of Lambda.

Program:-
import numpy as np
import math
import matplotlib.pyplot as plt

# lam : lambda
def poisson_dist(x, lam):
    # PMF: P(X = k) = lam^k * e^(-lam) / k!
    u = []
    z = []
    for k in x:
        u.append(math.factorial(int(k)))
        z.append(lam ** int(k))
    wz = np.array(z) / np.array(u)  # lambda^k / k!
    prob_density = wz * np.exp(-lam)
    return prob_density

x = np.arange(0, 40, 1)
lambdas = [1, 2, 5, 10, 15, 20]

fig, axs = plt.subplots(2, 3)

# One panel per value of lambda
for ax, lam in zip(axs.flat, lambdas):
    result = poisson_dist(x, lam)
    ax.set_title(f'lam={lam}')
    ax.bar(x, result)
    ax.set_xlabel('x')
    ax.set_ylabel('Probability')

fig.subplots_adjust(left=0.08, right=0.98, bottom=0.05, top=0.9, hspace=0.6, wspace=0.8)
plt.show()
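
The program above plots only the theoretical PMF. To also generate data from the Poisson distribution, as the title asks, one possible sketch using np.random.poisson (assuming poisson_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

lam = 5
samples = np.random.poisson(lam, size=1000)

# Empirical relative frequencies of the generated data vs. the theoretical PMF
values, counts = np.unique(samples, return_counts=True)
plt.bar(values, counts / counts.sum(), alpha=0.5, label='sampled data')
k = np.arange(0, 20)
plt.plot(k, poisson_dist(k, lam), 'ro-', label='theoretical PMF')
plt.legend()
plt.show()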

OUTPUT:-
Practical 5
Write a Python Program to generate data from the Normal distribution and show the pdf
plot for various values of µ and σ.

Program :-
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate the normal distribution PDF
def normal_dist(x, mean, sd):
    prob_density = (1 / (sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / sd) ** 2)
    return prob_density

# Define a list of means and standard deviations to demonstrate
means = [0, 1, -1]
sds = [1, 2, 0.5]

# Create subplots
fig, axs = plt.subplots(len(means), len(sds), figsize=(15, 10))

# Generate and plot the normal distribution PDF for each combination of mean and standard deviation
for i, mean in enumerate(means):
    for j, sd in enumerate(sds):
        x = np.linspace(mean - 3 * sd, mean + 3 * sd, 100)
        y = normal_dist(x, mean, sd)
        axs[i, j].plot(x, y)
        axs[i, j].set_title(f'µ={mean}, σ={sd}')
        axs[i, j].grid(True)

# Add axis labels
for ax in axs.flat:
    ax.set(xlabel='x', ylabel='Probability Density')

# Hide x labels and tick labels for top plots and y ticks for right plots
for ax in axs.flat:
    ax.label_outer()

plt.tight_layout()
plt.show()
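
To actually generate data from the Normal distribution, as the title asks, a short sketch drawing samples with np.random.normal and overlaying the analytic PDF (assuming normal_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 1
samples = np.random.normal(mu, sigma, size=1000)

# Density-normalized histogram of the generated data with the PDF on top
plt.hist(samples, bins=30, density=True, alpha=0.5, edgecolor='black')
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)
plt.plot(x, normal_dist(x, mu, sigma), 'r-')
plt.title('Generated Normal data vs. PDF (µ=0, σ=1)')
plt.show()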
OUTPUT:-
Practical 6
Write a Python Program to generate the exponential distribution and show the pdf plot
for various values of Lambda.

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate the exponential distribution PDF
def exp_dist(x, lam):
    prob_density = lam * np.exp(-lam * x)
    return prob_density

# Generate and plot the exponential distribution for various values of Lambda
def plot_exp_dist(lambdas, x_range):
    fig, axs = plt.subplots(2, 3, figsize=(15, 10))
    axs = axs.flatten()  # Flatten the 2D array of axes for easy iteration
    for i, lam in enumerate(lambdas):
        x = np.arange(0, x_range, 0.1)
        result = exp_dist(x, lam)
        axs[i].scatter(x, result)
        axs[i].set_title(f'λ={lam}')
        axs[i].set_xlabel('x')
        axs[i].set_ylabel('Probability Density')
    # Adjust layout to prevent overlap
    fig.tight_layout()
    plt.show()

# List of Lambda values to demonstrate
lambdas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
x_range = 10  # Range of x values

# Call the function to generate and plot the exponential distribution
plot_exp_dist(lambdas, x_range)
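
A similar sketch to generate exponential data and compare it with the PDF; note that np.random.exponential is parameterized by scale = 1/λ, not by λ itself (assuming exp_dist from the program above is in scope):

import numpy as np
import matplotlib.pyplot as plt

lam = 1.5
# NumPy uses scale = 1/lambda
samples = np.random.exponential(scale=1 / lam, size=1000)

plt.hist(samples, bins=30, density=True, alpha=0.5, edgecolor='black')
x = np.arange(0, 10, 0.1)
plt.plot(x, exp_dist(x, lam), 'r-')
plt.title('Generated exponential data vs. PDF (λ=1.5)')
plt.show()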

OUTPUT:-
Practical 7
Write a Python Program to import and export data using Pandas. Demonstrate various
data pre-processing techniques.

Program:-
import pandas as pd
import numpy as np

# Importing data from a CSV file
data = pd.read_csv('Salary_Data1.csv')

# Display the first few rows of the dataframe
print("Initial Data:")
print(data.head())

# Exporting data to a CSV file
data.to_csv('exported_data.csv', index=False)

# Data Pre-processing Techniques:

# 1. Handling missing values (here, for the two numeric columns)
numeric_columns = data.select_dtypes(include=[np.number]).columns
data['Age'] = data['Age'].fillna(data['Age'].mean())
data['EstimatedSalary'] = data['EstimatedSalary'].fillna(data['EstimatedSalary'].mean())

# 2. Feature scaling: Min-Max scaling (rescales each column to [0, 1])
data['Age'] = (data['Age'] - data['Age'].min()) / (data['Age'].max() - data['Age'].min())
data['EstimatedSalary'] = (data['EstimatedSalary'] - data['EstimatedSalary'].min()) / \
    (data['EstimatedSalary'].max() - data['EstimatedSalary'].min())

# 3. Standardization (Z-score normalization)
# Note: in practice you would use either Min-Max scaling or standardization,
# not both; they are applied in sequence here purely for demonstration.
data['Age'] = (data['Age'] - data['Age'].mean()) / data['Age'].std()
data['EstimatedSalary'] = (data['EstimatedSalary'] - data['EstimatedSalary'].mean()) / \
    data['EstimatedSalary'].std()

# 4. Removing duplicates
data.drop_duplicates(inplace=True)

# 5. Encoding ordinal features
# Assuming there is an ordinal column named 'Education_Level'
data['Education_Level'] = data['Education_Level'].map({'Bachelor': 1, 'Master': 2, 'PhD': 3})

# 6. One-hot encoding
# Assuming there is a categorical column named 'Gender'
data = pd.get_dummies(data, columns=['Gender'])

# Display the pre-processed data
print("\nPre-processed Data:")
print(data.head())
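
The manual scaling above can also be done with scikit-learn's preprocessing utilities. A minimal sketch on a small made-up frame (the values here are illustrative, not from Salary_Data1.csv); note that StandardScaler uses the population standard deviation (ddof=0) while pandas .std() defaults to the sample standard deviation (ddof=1), so the two standardizations differ slightly:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({'Age': [25, 32, 47, 51],
                   'EstimatedSalary': [30000, 45000, 80000, 95000]})

# Min-Max scaling to the range [0, 1]
df[['Age', 'EstimatedSalary']] = MinMaxScaler().fit_transform(df[['Age', 'EstimatedSalary']])
print(df)

# Standardization (applied here to the already scaled column, mirroring the program above)
df['Age'] = StandardScaler().fit_transform(df[['Age']]).ravel()
print(df)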

OUTPUT:-
Practical 8
Write a Python Program to implement Simple and Multiple Linear Regression.

Simple Linear Regression

Program:-
import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Calculate the slope and intercept (least-squares formulas)
m = (np.mean(x) * np.mean(y) - np.mean(x * y)) / (np.mean(x) ** 2 - np.mean(x ** 2))
b = np.mean(y) - m * np.mean(x)

# Predict values
y_pred = m * x + b

# Plot the data and regression line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.show()
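
A quick way to verify the hand-computed slope and intercept (not part of the recorded program) is np.polyfit, which fits a degree-1 polynomial by least squares:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# polyfit returns [slope, intercept] for degree 1
m, b = np.polyfit(x, y, 1)
print("slope:", m, "intercept:", b)  # should match the m and b computed above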

OUTPUT:-
Multiple Linear Regression

Program:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3  # y = 3 + 1*x1 + 2*x2

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Print the coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Plot the fitted values against the second feature, x2
# (a full regression plane would require a 3D plot)
plt.scatter(X[:, 1], y)
plt.plot(X[:, 1], y_pred, color='red')
plt.xlabel('x2')
plt.ylabel('y')
plt.title('Multiple Linear Regression')
plt.show()
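
As a cross-check on sklearn's result, the same coefficients can be recovered from the least-squares problem with np.linalg.lstsq; a minimal sketch:

import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept:", coef[0], "coefficients:", coef[1:])  # expect about 3 and [1, 2]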

OUTPUT:-
Practical 9
Write a Python Program to implement Logistic Regression on a given dataset.
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# Function to split the dataset
def split_dataset(X, y, test_size, random_state):
    np.random.seed(random_state)
    indices = np.random.permutation(len(X))
    test_set_size = int(len(X) * test_size)
    test_indices = indices[:test_set_size]
    train_indices = indices[test_set_size:]
    return X[train_indices], X[test_indices], y[train_indices], y[test_indices]

# Function to scale features
def scale_features(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X - mean) / std, mean, std

# Logistic Regression Class
class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        # Batch gradient descent on the log-loss
        for _ in range(self.n_iterations):
            model = self.sigmoid(np.dot(X, self.weights) + self.bias)
            error = model - y
            dw = np.dot(X.T, error) / len(X)
            db = np.sum(error) / len(X)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        model = self.sigmoid(np.dot(X, self.weights) + self.bias)
        return model.round()

# Function to create a confusion matrix
def confusion_matrix(y_true, y_pred):
    y_true = y_true.astype(int)
    y_pred = y_pred.astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for i in range(len(y_true)):
        cm[y_true[i]][y_pred[i]] += 1
    return cm

# Function to calculate accuracy score
def accuracy_score(y_true, y_pred):
    correct = np.sum(y_true == y_pred)
    return correct / len(y_true)

# Function to visualize the classification results
def visualize_classification(X_set, y_set, classifier, title, mean_train, std_train):
    X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.1),
                         np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.1))
    Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T)
    Z = Z.reshape(X1.shape)

    plt.contourf(X1, X2, Z, alpha=0.75, cmap=ListedColormap(('red', 'green')))

    ticks_x = np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 1)
    ticks_y = np.arange(X_set[:, 1].min(), X_set[:, 1].max() + 1, 1)
    plt.xticks(ticks_x, (ticks_x * std_train[0] + mean_train[0]).astype(int))
    plt.yticks(ticks_y, (ticks_y * std_train[1] + mean_train[1]).astype(int))

    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    c=np.array([ListedColormap(('red', 'green'))(i)]), label=j)
    plt.title(title)
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

# Load dataset
dataset = pd.read_csv('Program_09.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = split_dataset(X, y, test_size=0.25, random_state=0)

# Feature Scaling
X_train, mean_train, std_train = scale_features(X_train)
X_test = (X_test - mean_train) / std_train

# Logistic Regression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Confusion Matrix and Accuracy Score


cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix\n', cm)
print('Accuracy Score:', accuracy_score(y_test, y_pred))

# Visualising the Training set results


visualize_classification(X_train, y_train, classifier, 'Logistic Regression (Training set)', mean_train, std_train)

# Visualising the Test set results


visualize_classification(X_test, y_test, classifier, 'Logistic Regression (Test set)', mean_train, std_train)
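
For comparison (not part of the recorded program), the same workflow can be run with scikit-learn's built-in implementations; a minimal sketch, assuming Program_09.csv has the layout used above (feature columns first, class label last). If run in the same session as the program above, this import shadows the hand-written LogisticRegression class:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

dataset = pd.read_csv('Program_09.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Confusion Matrix\n', confusion_matrix(y_test, y_pred))
print('Accuracy Score:', accuracy_score(y_test, y_pred))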
Output:
Practical 10
Write a Python Program to implement a decision tree on a given data set.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
from pprint import pprint

# Set plot style
# %matplotlib inline  # uncomment this magic when running in a Jupyter notebook
sns.set_style("darkgrid")

# Load dataset and preprocess
df = pd.read_csv("Program_10.csv")
df = df.drop("Id", axis=1)
df = df.rename(columns={"species": "label"})
print(df.head())

# Train-test split function
def train_test_split(df, test_size):
    if isinstance(test_size, float):
        test_size = round(test_size * len(df))
    indices = df.index.tolist()
    test_indices = random.sample(population=indices, k=test_size)
    test_df = df.loc[test_indices]
    train_df = df.drop(test_indices)
    return train_df, test_df

random.seed(0)
train_df, test_df = train_test_split(df, test_size=20)

# Visualization
sns.FacetGrid(df, hue="label", height=5, aspect=1.5).map(plt.scatter, "sepal_width", "sepal_length").add_legend()
plt.show()

sns.FacetGrid(df, hue="label", height=7, aspect=1.5).map(plt.scatter, "petal_width", "petal_length").add_legend()


max_petal_width = df["petal_width"].max()
plt.xticks(np.arange(0, max_petal_width + 0.2, 0.2))
plt.show()

# Descriptive statistics per class
statistics = ['min', 'max', 'mean', 'median', 'std']
print(df.groupby('label').agg({
    'sepal_length': statistics,
    'sepal_width': statistics,
    'petal_length': statistics,
    'petal_width': statistics
}).round(2).T)

# Helper functions for the decision tree
def check_purity(data):
    label_column = data[:, -1]
    unique_classes = np.unique(label_column)
    return len(unique_classes) == 1

def classify_data(data):
    label_column = data[:, -1]
    unique_classes, counts_unique_classes = np.unique(label_column, return_counts=True)
    index = counts_unique_classes.argmax()
    return unique_classes[index]

def get_potential_splits(data):
    potential_splits = {}
    _, n_columns = data.shape
    for column_index in range(n_columns - 1):
        values = data[:, column_index]
        unique_values = np.unique(values)
        # Candidate splits are the midpoints between consecutive unique values
        potential_splits[column_index] = [(unique_values[i] + unique_values[i - 1]) / 2
                                          for i in range(1, len(unique_values))]
    return potential_splits

def split_data(data, split_column, split_value):
    split_column_values = data[:, split_column]
    data_below = data[split_column_values <= split_value]
    data_above = data[split_column_values > split_value]
    return data_below, data_above

def calculate_entropy(data):
    label_column = data[:, -1]
    _, counts = np.unique(label_column, return_counts=True)
    probabilities = counts / counts.sum()
    return sum(probabilities * -np.log2(probabilities))

def calculate_overall_entropy(data_below, data_above):
    n = len(data_below) + len(data_above)
    p_data_below = len(data_below) / n
    p_data_above = len(data_above) / n
    return p_data_below * calculate_entropy(data_below) + p_data_above * calculate_entropy(data_above)

def determine_best_split(data, potential_splits):
    overall_entropy = float('inf')
    for column_index in potential_splits:
        for value in potential_splits[column_index]:
            data_below, data_above = split_data(data, split_column=column_index, split_value=value)
            current_overall_entropy = calculate_overall_entropy(data_below, data_above)
            if current_overall_entropy < overall_entropy:
                overall_entropy = current_overall_entropy
                best_split_column = column_index
                best_split_value = value
    return best_split_column, best_split_value

# Decision tree algorithm
def decision_tree_algorithm(df, counter=0, min_samples=2, max_depth=5):
    # On the first call the input is a DataFrame; recursive calls pass NumPy arrays
    if counter == 0:
        global COLUMN_HEADERS
        COLUMN_HEADERS = df.columns
        data = df.values
    else:
        data = df

    # Base cases: pure node, too few samples, or maximum depth reached
    if check_purity(data) or len(data) < min_samples or counter == max_depth:
        return classify_data(data)

    else:
        counter += 1
        potential_splits = get_potential_splits(data)
        split_column, split_value = determine_best_split(data, potential_splits)
        data_below, data_above = split_data(data, split_column, split_value)
        feature_name = COLUMN_HEADERS[split_column]
        question = "{} <= {}".format(feature_name, split_value)
        sub_tree = {question: []}

        yes_answer = decision_tree_algorithm(data_below, counter, min_samples, max_depth)
        no_answer = decision_tree_algorithm(data_above, counter, min_samples, max_depth)

        if yes_answer == no_answer:
            return yes_answer
        else:
            sub_tree[question].append(yes_answer)
            sub_tree[question].append(no_answer)
            return sub_tree

def classify_example(example, tree):
    question = list(tree.keys())[0]
    feature_name, _, value = question.split()

    if example[feature_name] <= float(value):
        answer = tree[question][0]
    else:
        answer = tree[question][1]

    # The answer is either a leaf (class label) or a sub-tree to recurse into
    if not isinstance(answer, dict):
        return answer
    else:
        return classify_example(example, answer)

def calculate_accuracy(df, tree):
    df["classification"] = df.apply(classify_example, axis=1, args=(tree,))
    df["classification_correct"] = df["classification"] == df["label"]
    return df["classification_correct"].mean() * 100

# Build and evaluate the decision tree


train_df, test_df = train_test_split(df, test_size=20)
tree = decision_tree_algorithm(train_df, max_depth=3)
accuracy = calculate_accuracy(test_df, tree)
pprint(tree)
print(f"Accuracy: {accuracy}%")
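
For comparison (not part of the recorded program), sklearn's DecisionTreeClassifier with the entropy criterion implements essentially the same algorithm; a minimal sketch, assuming the same Program_10.csv layout as above. The split function is imported under an alias so it does not clash with the hand-written train_test_split:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split as sk_split

df = pd.read_csv("Program_10.csv").drop("Id", axis=1).rename(columns={"species": "label"})
X = df.drop(columns=["label"])
y = df["label"]
X_train, X_test, y_train, y_test = sk_split(X, y, test_size=20, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print(export_text(clf, feature_names=list(X.columns)))
print("Accuracy:", clf.score(X_test, y_test) * 100, "%")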
Output:
Practical 11
Write a Python Program to implement K-means clustering algorithm.
Program:
import numpy as np

def k_means(X, k, max_iters=100):
    # Initialize centroids by picking k distinct data points at random
    centroids = X[np.random.choice(range(len(X)), k, replace=False)]

    for _ in range(max_iters):
        # Assign each data point to the nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, np.newaxis] - centroids, axis=-1), axis=-1)

        # Update centroids; if a cluster becomes empty, keep its old centroid
        new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i] for i in range(k)])

        # Check convergence
        if np.all(centroids == new_centroids):
            break

        centroids = new_centroids

    return labels, centroids

# Generate random data


np.random.seed(42)
X = np.random.rand(100, 2)

# Run k-means clustering
k = 4
labels, centroids = k_means(X, k)

print("Cluster labels:", labels)
print("Centroids:", centroids)
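
Since the data is two-dimensional, the result is easy to inspect visually; a short sketch plotting the clusters and centroids produced above:

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=30)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', s=200)
plt.title('K-means clustering (k=4)')
plt.show()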

Output:
Practical 12
Write a program for ANOVA test.
Program:
import numpy as np
def one_way_anova(*args):
    # Number of groups
    k = len(args)
    # Total number of observations
    N = sum(len(group) for group in args)
    # Group means
    group_means = [np.mean(group) for group in args]
    # Overall (grand) mean
    grand_mean = np.mean([item for group in args for item in group])

    # Between-group sum of squares (SSB)
    SSB = sum(len(group) * ((group_mean - grand_mean) ** 2) for group, group_mean in zip(args, group_means))

    # Within-group sum of squares (SSW)
    SSW = sum(sum((item - group_mean) ** 2 for item in group) for group, group_mean in zip(args, group_means))

    # Between-group and within-group degrees of freedom
    dfB = k - 1
    dfW = N - k

    # Mean squares
    MSB = SSB / dfB
    MSW = SSW / dfW

    # F statistic
    F = MSB / MSW

    return F, dfB, dfW

# Example usage:
# Data for three groups
group1 = [6, 8, 4, 5, 3, 4]
group2 = [8, 12, 9, 11, 6, 8]
group3 = [13, 9, 11, 8, 7, 12]

# Perform one-way ANOVA


F_statistic, df_between, df_within = one_way_anova(group1, group2, group3)
print("\nOne-way ANOVA")
print(f"F-statistic: {F_statistic}")
print(f"Degrees of freedom between groups: {df_between}")
print(f"Degrees of freedom within groups: {df_within}")

# Function to calculate two-way ANOVA
def two_way_anova(data, alpha=0.05):
    # Number of levels for each factor
    a = len(data)
    b = len(data[0])
    # Number of observations per cell
    n = len(data[0][0])
    # Total number of observations
    N = a * b * n

    grand_mean = np.mean(data)

    # Calculate sums of squares
    SS_total = sum(((x - grand_mean) ** 2) for group in data for subgroup in group for x in subgroup)
    SS_factor_A = b * n * sum((np.mean(group) - grand_mean) ** 2 for group in data)
    # Factor B means are taken across all levels of factor A
    SS_factor_B = a * n * sum((np.mean([data[i][j] for i in range(a)]) - grand_mean) ** 2 for j in range(b))
    # No interaction term is modelled, so the A x B interaction is pooled into the error
    SS_error = SS_total - SS_factor_A - SS_factor_B

    # Calculate degrees of freedom
    df_total = N - 1
    df_factor_A = a - 1
    df_factor_B = b - 1
    df_error = df_total - df_factor_A - df_factor_B

    # Calculate mean squares
    MS_factor_A = SS_factor_A / df_factor_A
    MS_factor_B = SS_factor_B / df_factor_B
    MS_error = SS_error / df_error

    # Calculate F-statistics
    F_factor_A = MS_factor_A / MS_error
    F_factor_B = MS_factor_B / MS_error

    return {
        'SS_total': SS_total,
        'SS_factor_A': SS_factor_A,
        'SS_factor_B': SS_factor_B,
        'SS_error': SS_error,
        'df_total': df_total,
        'df_factor_A': df_factor_A,
        'df_factor_B': df_factor_B,
        'df_error': df_error,
        'MS_factor_A': MS_factor_A,
        'MS_factor_B': MS_factor_B,
        'MS_error': MS_error,
        'F_factor_A': F_factor_A,
        'F_factor_B': F_factor_B
    }
# Example data for two-way ANOVA: data[i][j] is the list of observations in cell (i, j)
data = [
    [[3, 2, 1], [4, 5, 6]],
    [[5, 6, 7], [8, 9, 10]],
    [[7, 8, 9], [10, 11, 12]]
]

print("\nTwo-way ANOVA")

# Perform two-way ANOVA
results = two_way_anova(data)
for key, value in results.items():
    print(f"{key}: {value}")
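
Neither function above reports a p-value. As an optional cross-check, scipy can verify the one-way result and convert the hand-computed F-statistic into a p-value via the F distribution's survival function (assuming group1, group2, group3 and the one-way results from above are in scope):

from scipy import stats

# Library cross-check for the one-way ANOVA
F, p = stats.f_oneway(group1, group2, group3)
print("scipy F-statistic:", F, "p-value:", p)

# p-value for the hand-computed F-statistic
p_manual = stats.f.sf(F_statistic, df_between, df_within)
print("manual p-value:", p_manual)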
Output:
Practical 13
Write a program for z-testing and t-testing.
Program:
#Program 13
# Import the necessary libraries
import numpy as np
import scipy.stats as stats

print("\nOne Tailed Test\n")

print("A school claimed that the students who study there are more intelligent than the average school. "
"On calculating the IQ scores of 50 students, the average turns out to be 110. "
"The mean of the population IQ is 100 and the standard deviation is 15. "
"State whether the claim of the principal is right or not at a 5% significance level.")

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

print("Sample Mean:", sample_mean)


print("Population Mean:", population_mean)
print("Population standard deviation:", population_std)
print("Sample size:", sample_size)
print("Level of significance:", alpha)

# Compute the z-score


z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score:', z_score)

print("\nUsing Critical Z-Score\n")

# Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score:', z_critical)

# Hypothesis decision
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

print("\nApproach 2: Using P-value\n")

# P-Value: Probability of getting less than a Z-score


p_value = 1 - stats.norm.cdf(z_score)

print('P-value:', p_value)

# Hypothesis decision
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

print("\nTwo-sampled z-test:\n")
print("There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, "
"while Group B has studied online classes. After the examination, the score of each student comes. "
"Now we want to determine whether the online or offline classes are better.\n"
"Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10\n"
"Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12\n"
"Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the
online and offline classes.")

# Group A (Offline Classes)


n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)


n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0

# Hypothesized difference (under the null hypothesis)
D = 0

# Calculate the test statistic (z-score)


z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Calculate the critical value


z_critical = stats.norm.ppf(1 - alpha/2)
print('Critical Z-Score:', z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant "
          "difference between the online and offline classes.")

# Approach 2: Using P-value


p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value:', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant "
          "difference between the online and offline classes.")

print("\nIndependent samples t-test:\n")

# Sample
sample_A = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 8])
sample_B = np.array([1, 2, 2, 3, 3, 4, 5, 6, 7, 7])

# Perform independent sample t-test


t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)

# Compute the degrees of freedom (df)


df = len(sample_A) + len(sample_B) - 2

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha/2, df)
# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value:')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between the two groups')
else:
    print('No significant difference found between the two groups')

print('With P-value:')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between the two groups')
else:
    print('Reject the null hypothesis: there is a significant difference between the two groups')

print("\nPaired sample t-test\n")

# Create the paired samples


math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Perform the paired sample t-test


t_statistic, p_value = stats.ttest_rel(math1, math2)

# Compute the degrees of freedom (df)


df = len(math2) - 1

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha/2, df)

# Print the results


print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value:')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between math1 and math2')
else:
    print('No significant difference found between math1 and math2')

print('With P-value:')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between math1 and math2')
else:
    print('Reject the null hypothesis: there is a significant difference between math1 and math2')

print("\nOne sample t-test\n")

# Define the population mean weight


population_mean = 45

# Define the sample mean weight and standard deviation


sample_mean = 75
sample_std = 25

# Define the sample size


sample_size = 25

# Calculate the t-statistic


t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Define the degrees of freedom


df = sample_size - 1

# Calculate the critical t-value


critical_t = stats.t.ppf(1 - alpha, df)

# Calculate the p-value


p_value = 1 - stats.t.cdf(t_statistic, df)

# Print the results


print("T-Statistic:", t_statistic)
print("Critical t-value:", critical_t)
print("P-Value:", p_value)

# Decision
print('With T-value:')
if t_statistic > critical_t:
    print("There is a significant difference in weight before and after the camp. The fitness camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp. "
          "The fitness camp did not have a significant effect.")

print('With P-value:')
if p_value < alpha:
    print("There is a significant difference in weight before and after the camp. The fitness camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp. "
          "The fitness camp did not have a significant effect.")

Output:
