Ankit Python

The document is a practical file for a Python lab course at Panipat Institute of Engineering and Technology, focusing on Artificial Intelligence and Machine Learning. It includes various programming assignments that cover essential Python libraries, statistical concepts, and data visualization techniques. Each program aims to implement specific tasks such as calculating statistics, applying the Central Limit Theorem, and conducting hypothesis tests.

Uploaded by

Golden Chauhan

Panipat Institute of Engineering and Technology Samalkha

Department of Computer Science and Engineering


(Artificial Intelligence and Machine Learning)
Python Lab II (PC – CS – AI&ML – 218A)
Practical File
Submitted To: Ms. Gaurisha, Asst. Professor
Submitted by: Ankit Raj, B.Tech CSE (AI&ML), 4th sem (2823302)

Affiliated to:

Kurukshetra University, Kurukshetra, India


Index
Sr. No.  Aim
1.  Write a program to implement basic Python libraries: numpy, scipy.
2.  Write a program to implement basic Python libraries: matplotlib, pandas, scikit-learn.
3.  Write a program to create a sample from a population.
4.  Write a program to evaluate the mean, median and mode of a dataset.
5.  Write a program to implement the Central Limit Theorem on a dataset.
6.  Write a program to implement measures of spread on a dataset.
7.  Write a program to differentiate between descriptive and inferential statistics.
8.  Write a program to implement pmf, pdf and cdf.
9.  Write a program to implement different visualization techniques on a sample dataset.
10. Write a program to implement different hypothesis tests on a sample dataset.

Program – 1

Aim: Write a program to implement basic Python libraries: numpy, scipy.

Code:
# Importing required libraries
import numpy as np
from scipy import linalg

# NumPy Basics
print("----- NumPy -----")

# Create a NumPy array
a = np.array([[1, 2], [3, 4]])
print("Array a:\n", a)

# Array addition
b = np.array([[5, 6], [7, 8]])
print("Array b:\n", b)

sum_array = a + b
print("Sum of a and b:\n", sum_array)

# Transpose
print("Transpose of a:\n", a.T)

# Dot product
dot_product = np.dot(a, b)
print("Dot product of a and b:\n", dot_product)

# Basic statistics
print("Mean of a:", np.mean(a))
print("Standard Deviation of a:", np.std(a))

# SciPy Basics
print("\n----- SciPy -----")
# Linear Algebra: Solving system of linear equations
# Example: 2x + 3y = 8 and 3x + 4y = 11
A = np.array([[2, 3], [3, 4]])
b = np.array([8, 11])

x = linalg.solve(A, b)
print("Solution of the system 2x + 3y = 8 and 3x + 4y = 11 is:\n x =", x[0], ", y =", x[1])

Output
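
Note on Program 1: the 2x2 system solved with `linalg.solve` can be cross-checked by hand with Cramer's rule. The sketch below uses plain Python only; the helper name `solve_2x2` is an illustrative assumption and not part of NumPy or SciPy, and SciPy itself relies on LAPACK routines rather than this formula.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule (hypothetical helper)."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("The system has no unique solution (determinant is 0).")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# System from Program 1: 2x + 3y = 8 and 3x + 4y = 11
x, y = solve_2x2(2, 3, 3, 4, 8, 11)
print("x =", x, ", y =", y)

# Substitute back to confirm that both equations hold
print("2x + 3y =", 2 * x + 3 * y)
print("3x + 4y =", 3 * x + 4 * y)
```

Substituting the solution back into both equations is a quick sanity check that also applies to the `linalg.solve` result.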
Program – 2

Aim: Write a program to implement basic Python libraries: matplotlib, pandas, scikit-learn.

Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample Data using Pandas
data = {
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Marks_Scored': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)
print("Data Table:\n", df)

# Plotting with Matplotlib
plt.scatter(df['Hours_Studied'], df['Marks_Scored'], color='Red')
plt.title('Study Hours vs Marks')
plt.xlabel('Hours Studied')
plt.ylabel('Marks Scored')
plt.grid(True)
plt.show()

# Simple Linear Regression with Scikit-learn
X = df[['Hours_Studied']]
y = df['Marks_Scored']

model = LinearRegression()
model.fit(X, y)

# Predict marks for 6 hours of study
# (passing a DataFrame with the training column name avoids a feature-name warning)
predicted = model.predict(pd.DataFrame({'Hours_Studied': [6]}))
print("\nPredicted marks for 6 hours of study:", predicted[0])

Output
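
Note on Program 2: for a single feature, the line fitted by `LinearRegression` is the ordinary least-squares line, which has a closed form. The following sketch computes it in plain Python for the same data; the function name `fit_line` is an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]
marks = [10, 20, 30, 40, 50]

slope, intercept = fit_line(hours, marks)
predicted = slope * 6 + intercept
print("slope =", slope, "intercept =", intercept)
print("Predicted marks for 6 hours of study:", predicted)
```

Because the sample data lies exactly on the line marks = 10 × hours, the fit has no residual error, so this dataset shows the API but not how regression behaves on noisy data.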
Program – 3

Aim: Write a program to create a sample from a population.

Code:
import pandas as pd

# Create a sample dataset (a dictionary)
data = {
    'Name': ['Sukriti', 'Tejas', 'Aditya', 'Keshav', 'Diya', 'Kashvi', 'Jaishree', 'Wandy', 'Savidhi',
             'Rydhym'],
    'Age': [17, 20, 22, 16, 24, 23, 19, 25, 26, 21],
    'Score': [85, 78, 90, 88, 92, 70, 95, 80, 87, 75]
}

# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)

# Show the original dataset
print("Original Dataset:\n", df)

# Define the sample size (e.g., 4)
sample_size = 4

# Create a random sample from the dataset
sample_df = df.sample(n=sample_size)

# Print the random sample
print("\nRandom Sample from the Dataset:\n", sample_df)
Output
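
Note on Program 3: `df.sample` draws a different sample on every run unless a seed is fixed (pandas accepts a `random_state` argument for this). The sketch below shows the same idea with the standard library; the helper `draw_sample` and the seed value 42 are assumptions chosen for illustration.

```python
import random

names = ['Sukriti', 'Tejas', 'Aditya', 'Keshav', 'Diya',
         'Kashvi', 'Jaishree', 'Wandy', 'Savidhi', 'Rydhym']


def draw_sample(population, k, seed):
    """Draw k items without replacement using a seeded generator (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(population, k)


first = draw_sample(names, 4, seed=42)
second = draw_sample(names, 4, seed=42)

print("Sample:", first)
print("Same seed gives the same sample:", first == second)
```

Fixing the seed makes a lab result repeatable, which helps when the printed sample has to match a submitted output.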
Program – 4

Aim: Write a program to evaluate the mean, median and mode of a dataset.

Code:
import pandas as pd
from scipy import stats

# Sample dataset: Product names and their prices
data = {
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Printer', 'Tablet', 'Webcam', 'Speaker',
                'Charger', 'Router'],
    'Price': [750, 25, 45, 200, 120, 300, 60, 85, 25, 120]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the dataset
print("Product Prices:\n")
print(df)

# Extract the 'Price' column
prices = df['Price']

# Calculate statistics
mean_price = prices.mean()
median_price = prices.median()
mode_price = stats.mode(prices, keepdims=True)[0][0]

# Display results
print("\nPrice Statistics-")
print(f"Mean Price : ₹{mean_price:.2f}")
print(f"Median Price : ₹{median_price}")
print(f"Mode Price : ₹{mode_price}")
Output
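
Note on Program 4: in this price list, 25 and 120 each occur twice, so the data has two modes, while `stats.mode` reports a single value. As a hedged cross-check, the standard-library `statistics` module can list every tied mode:

```python
import statistics

prices = [750, 25, 45, 200, 120, 300, 60, 85, 25, 120]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
modes = statistics.multimode(prices)  # every value tied for most frequent

print("Mean:", mean_price)
print("Median:", median_price)
print("Modes:", modes)
```

The mean (173) is well above the median (102.5) because the single laptop price of 750 pulls it upward, which is why the median is often preferred for skewed data.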
Program – 5

Aim: Write a program to implement the Central Limit Theorem on a dataset.

Code:

import pandas as pd
import matplotlib.pyplot as plt

# Create dataset
data = {
    'Product': ['Pen', 'Notebook', 'Bag', 'Bottle', 'Shoes', 'Cap', 'Watch', 'Book', 'Lamp', 'Mouse'],
    'Price': [10, 25, 45, 30, 120, 18, 250, 40, 35, 22]
}

df = pd.DataFrame(data)

# Plot original distribution
plt.hist(df['Price'], bins=10, color='skyblue', edgecolor='black')
plt.title("Original Price Distribution")
plt.xlabel("Price")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()

# Central Limit Theorem using pandas only
sample_means = []
sample_size = 4

for _ in range(1000):
    sample = df.sample(n=sample_size, replace=True)
    sample_mean = sample['Price'].mean()
    sample_means.append(sample_mean)

# Convert to Series
sample_means_series = pd.Series(sample_means)

# Plot sampling distribution
plt.hist(sample_means_series, bins=30, color='orange', edgecolor='black')
plt.title("Sampling Distribution of Mean (Sample Size = 4)")
plt.xlabel("Sample Mean Price")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()
Output
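
Note on Program 5: besides the shape of the histogram, the theory behind the Central Limit Theorem predicts that the sample means centre on the population mean and have standard deviation sigma / sqrt(n). The sketch below checks both numerically with the standard library. The seed and the choice of 20000 repetitions (instead of 1000) are assumptions made to reduce simulation noise.

```python
import random
import statistics

prices = [10, 25, 45, 30, 120, 18, 250, 40, 35, 22]
sample_size = 4
num_samples = 20000  # assumed value, larger than in Program 5 to reduce noise

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw samples with replacement and record each sample mean
sample_means = [
    sum(rng.choices(prices, k=sample_size)) / sample_size
    for _ in range(num_samples)
]

population_mean = statistics.mean(prices)
population_sd = statistics.pstdev(prices)
expected_se = population_sd / sample_size ** 0.5  # sigma / sqrt(n)

print("Population mean:", population_mean)
print("Mean of sample means:", statistics.mean(sample_means))
print("Expected standard error:", expected_se)
print("Observed SD of sample means:", statistics.pstdev(sample_means))
```

With a sample size of only 4 and a strongly skewed population, the histogram of sample means can still look skewed; increasing `sample_size` brings it closer to a normal shape.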
Program – 6

Aim: Write a program to implement measures of spread on a dataset.


Code:

import pandas as pd

# Sample dataset of employee ages
data = {'Employee_Ages': [25, 30, 22, 35, 28, 40, 27, 32, 31, 29]}
df = pd.DataFrame(data)

# Display the dataset
print("Dataset:\n", df)

# Calculate Range
age_range = df['Employee_Ages'].max() - df['Employee_Ages'].min()

# Calculate Variance (sample)
age_variance = df['Employee_Ages'].var()

# Calculate Standard Deviation (sample)
age_std_dev = df['Employee_Ages'].std()

# Calculate Interquartile Range (IQR)
Q1 = df['Employee_Ages'].quantile(0.25)
Q3 = df['Employee_Ages'].quantile(0.75)
age_iqr = Q3 - Q1

# Display the measures of spread
print("\nMeasures of Spread for Employee Ages:")
print(f"Range: {age_range}")
print(f"Variance: {age_variance:.2f}")
print(f"Standard Deviation: {age_std_dev:.2f}")
print(f"Interquartile Range (IQR): {age_iqr}")

Output
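
Note on Program 6: pandas `.var()` and `.std()` compute the sample versions (dividing by n - 1). The sketch below contrasts sample and population variance for the same ages using the standard library, and recomputes the quartiles with linear interpolation between data points, which for this data gives the same values as the pandas default.

```python
import statistics

ages = [25, 30, 22, 35, 28, 40, 27, 32, 31, 29]

age_range = max(ages) - min(ages)
sample_var = statistics.variance(ages)       # divides by n - 1, like pandas .var()
population_var = statistics.pvariance(ages)  # divides by n

# method='inclusive' interpolates linearly between the sorted data points
q1, q2, q3 = statistics.quantiles(ages, n=4, method='inclusive')
iqr = q3 - q1

print("Range:", age_range)
print("Sample variance:", round(sample_var, 2))
print("Population variance:", round(population_var, 2))
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The sample variance is always the larger of the two; the difference matters most for small datasets such as this one.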
Program – 7

Aim: Write a program to differentiate between descriptive and inferential statistics.

Code:
import pandas as pd
from scipy.stats import chisquare

# Sample dataset: Observed counts of different fruit preferences in a survey
data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Grapes', 'Mango'],
    'Observed_Count': [50, 30, 40, 20, 60]
}

df = pd.DataFrame(data)
print("Dataset:\n", df)

# --- Descriptive Statistics ---
print("\n--- Descriptive Statistics ---")
total_responses = df['Observed_Count'].sum()
percentages = (df['Observed_Count'] / total_responses) * 100

print(f"Total Responses: {total_responses}")
print("Percentage of each fruit preference:")
for fruit, pct in zip(df['Fruit'], percentages):
    print(f"{fruit}: {pct:.2f}%")

# --- Inferential Statistics ---
print("\n--- Inferential Statistics ---")

# Suppose we expect equal preference for all fruits (null hypothesis)
expected_counts = [total_responses / len(df)] * len(df)

# Perform Chi-Square goodness of fit test
chi_stat, p_value = chisquare(f_obs=df['Observed_Count'], f_exp=expected_counts)
print(f"Chi-Square Statistic: {chi_stat:.2f}")
print(f"P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Result: Reject the null hypothesis. Preferences are not equally distributed.")
else:
    # Failing to reject does not prove equality; it only means the evidence is insufficient
    print("Result: Fail to reject the null hypothesis. No significant evidence that preferences differ.")

Output
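
Note on Program 7: the goodness-of-fit statistic is the sum of (observed - expected)^2 / expected over all categories. The sketch below recomputes it by hand for the fruit survey. The p-value uses a closed form of the chi-square upper tail that holds only for an even number of degrees of freedom (here 4); the helper name `chi2_sf_even_dof` is an assumption for illustration.

```python
import math

observed = [50, 30, 40, 20, 60]
total = sum(observed)
expected = total / len(observed)  # equal preference under the null hypothesis

# Chi-square statistic: sum of squared deviations scaled by the expected count
chi_stat = sum((o - expected) ** 2 / expected for o in observed)
dof = len(observed) - 1


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; valid only for even dof (hypothetical helper)."""
    if dof % 2 != 0:
        raise ValueError("This closed form only holds for even degrees of freedom.")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


p_value = chi2_sf_even_dof(chi_stat, dof)
print("Chi-square statistic:", chi_stat)
print("Degrees of freedom:", dof)
print("p-value:", p_value)
```

The descriptive part of Program 7 only summarises the 200 responses, whereas this statistic supports an inference about the wider population the survey was drawn from.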
Program – 8

Aim: Write a program to implement pmf, pdf and cdf.

Code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

# --- Helper function to create linspace without numpy ---
def linspace(start, stop, num):
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]

# --- PMF: Binomial Distribution (Discrete) ---
n, p = 10, 0.5
df_pmf = pd.DataFrame({'Successes': list(range(n + 1))})
df_pmf['PMF'] = df_pmf['Successes'].apply(lambda x: binom.pmf(x, n, p))

# --- PDF & CDF: Normal Distribution (Continuous) ---
x_values = linspace(-4, 4, 1000)
df_pdf_cdf = pd.DataFrame({'x': x_values})
df_pdf_cdf['PDF'] = df_pdf_cdf['x'].apply(lambda x: norm.pdf(x, loc=0, scale=1))
df_pdf_cdf['CDF'] = df_pdf_cdf['x'].apply(lambda x: norm.cdf(x, loc=0, scale=1))

# --- Plotting ---
plt.figure(figsize=(12, 4))

# PMF plot
plt.subplot(1, 3, 1)
plt.stem(df_pmf['Successes'], df_pmf['PMF'], basefmt=" ")
plt.title('PMF - Binomial Distribution')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')

# PDF plot
plt.subplot(1, 3, 2)
plt.plot(df_pdf_cdf['x'], df_pdf_cdf['PDF'])
plt.title('PDF - Normal Distribution')
plt.xlabel('x')
plt.ylabel('Density')

# CDF plot
plt.subplot(1, 3, 3)
plt.plot(df_pdf_cdf['x'], df_pdf_cdf['CDF'])
plt.title('CDF - Normal Distribution')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')

plt.tight_layout()
plt.show()

Output
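
Note on Program 8: the relationship between the three functions can be checked numerically. In the sketch below, the binomial PMF is written out from its formula, the discrete CDF is built as a running sum of the PMF, and the standard normal PDF and CDF come from the standard-library `NormalDist` class. The helper name `binom_pmf` is an assumption for illustration.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5
pmf_values = [binom_pmf(k, n, p) for k in range(n + 1)]

# A CDF accumulates probability: the discrete CDF is a running sum of the PMF
cdf_values = []
running = 0.0
for value in pmf_values:
    running += value
    cdf_values.append(running)

standard_normal = NormalDist(mu=0, sigma=1)

print("P(X = 5):", pmf_values[5])
print("P(X <= 5):", cdf_values[5])
print("Sum of PMF:", sum(pmf_values))
print("Normal PDF at 0:", standard_normal.pdf(0))
print("Normal CDF at 0:", standard_normal.cdf(0))
```

A PMF value is itself a probability, while a PDF value is a density: only the area under the PDF (that is, a difference of CDF values) is a probability.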
Program – 9

Aim: Write a program to implement different visualization techniques on sample dataset.

Code:
import pandas as pd
import matplotlib.pyplot as plt

# Sample dataset: Monthly sales (in units) of 3 products
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Product_A': [150, 180, 220, 210, 250, 300],
    'Product_B': [80, 120, 160, 150, 190, 230],
    'Product_C': [100, 110, 130, 120, 170, 200]
}

df = pd.DataFrame(data)
print("Sales Dataset:")
print(df)

# Set up the figure and subplots with better spacing
fig, axs = plt.subplots(2, 2, figsize=(14, 10))

# 1. Line Plot - Monthly sales trend for each product
axs[0, 0].plot(df['Month'], df['Product_A'], marker='o', label='Product A')
axs[0, 0].plot(df['Month'], df['Product_B'], marker='s', label='Product B')
axs[0, 0].plot(df['Month'], df['Product_C'], marker='^', label='Product C')
axs[0, 0].set_title('Monthly Sales Trend')
axs[0, 0].set_xlabel('Month')
axs[0, 0].set_ylabel('Units Sold')
axs[0, 0].legend()
axs[0, 0].grid(True)

# 2. Bar Plot - Total sales of each product over 6 months
total_sales = df[['Product_A', 'Product_B', 'Product_C']].sum()
axs[0, 1].bar(total_sales.index, total_sales.values, color=['skyblue', 'salmon', 'lightgreen'])
axs[0, 1].set_title('Total Sales by Product')
axs[0, 1].set_xlabel('Product')
axs[0, 1].set_ylabel('Total Units Sold')

# 3. Pie Chart - Sales proportion of products in June
june_sales = df[df['Month'] == 'Jun'][['Product_A', 'Product_B', 'Product_C']].iloc[0]
axs[1, 0].pie(june_sales, labels=june_sales.index, autopct='%1.1f%%', startangle=90,
              colors=['skyblue', 'salmon', 'lightgreen'])
axs[1, 0].set_title('Sales Proportion in June')

# 4. Histogram - Distribution of Product_A sales over months
axs[1, 1].hist(df['Product_A'], bins=5, color='violet', edgecolor='black')
axs[1, 1].set_title('Product A Sales Distribution')
axs[1, 1].set_xlabel('Units Sold')
axs[1, 1].set_ylabel('Frequency')

# Adjust layout to avoid overlapping
plt.tight_layout(pad=3.0)
plt.show()

Output
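
Note on Program 9: since the chart image is not reproduced here, the numbers behind the bar and pie charts can be recomputed directly. The sketch below is a plain-Python restatement of the same aggregations, not the plotting code itself.

```python
sales = {
    'Product_A': [150, 180, 220, 210, 250, 300],
    'Product_B': [80, 120, 160, 150, 190, 230],
    'Product_C': [100, 110, 130, 120, 170, 200],
}

# Bar chart heights: total units per product over the six months
total_sales = {name: sum(units) for name, units in sales.items()}

# Pie chart slices: each product's share of June sales (last entry of each list)
june = {name: units[-1] for name, units in sales.items()}
june_total = sum(june.values())
june_share = {name: 100 * units / june_total for name, units in june.items()}

for name in sales:
    print(f"{name}: total = {total_sales[name]}, June share = {june_share[name]:.1f}%")
```

Checking a chart against its underlying totals is a cheap way to catch a mislabelled axis or a wrong column selection.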
Program – 10

Aim: Write a program to implement different hypothesis tests on a sample dataset.

Code:
import pandas as pd
from scipy import stats

# Sample dataset: Marks of students in two classes
class_a_scores = [85, 78, 90, 88, 76, 95, 89, 92]
class_b_scores = [80, 75, 85, 70, 78, 82, 77, 74]

# Convert to DataFrame for clarity
df_scores = pd.DataFrame({
    'ClassA': class_a_scores,
    'ClassB': class_b_scores
})
print("Student Scores:\n", df_scores)

# 1. One-sample t-test for ClassA
print("\n1. One-sample t-test (H0: mean of ClassA = 80)")

t_stat, p_val = stats.ttest_1samp(class_a_scores, 80)
print(f"t-statistic = {t_stat:.4f}, p-value = {p_val:.4f}")
if p_val < 0.05:
    print("Reject H0: Mean of ClassA is significantly different from 80\n")
else:
    print("Fail to reject H0: No significant difference from mean = 80\n")

# 2. Two-sample t-test between ClassA and ClassB
print("2. Two-sample t-test (H0: mean of ClassA = mean of ClassB)")

t_stat2, p_val2 = stats.ttest_ind(class_a_scores, class_b_scores)
print(f"t-statistic = {t_stat2:.4f}, p-value = {p_val2:.4f}")
if p_val2 < 0.05:
    print("Reject H0: Means of ClassA and ClassB are significantly different\n")
else:
    print("Fail to reject H0: Means might be equal\n")

# 3. Chi-Square Test for Independence
print("3. Chi-Square Test for Independence (Gender vs Online Learning Preference)")
gender = ['Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Female']
prefers_online = ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes']

# Create a contingency table using pandas
df_pref = pd.DataFrame({
    'Gender': gender,
    'Prefers_Online': prefers_online
})

contingency_table = pd.crosstab(df_pref['Gender'], df_pref['Prefers_Online'])
print("\nContingency Table:\n", contingency_table)

# Perform chi-square test
chi2, p_val3, dof, expected = stats.chi2_contingency(contingency_table)
print(f"\nchi-square = {chi2:.4f}, p-value = {p_val3:.4f}")
if p_val3 < 0.05:
    print("Reject H0: Gender and preference for online learning are related\n")
else:
    print("Fail to reject H0: No significant relation between gender and preference\n")
Output
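
Note on Program 10: the t-statistics reported by `ttest_1samp` and `ttest_ind` (which assumes equal variances by default) can be reproduced from their textbook formulas. The sketch below computes only the statistics, not the p-values, because the t distribution is not available in the standard library; the function names are assumptions made for illustration.

```python
import math
import statistics

class_a_scores = [85, 78, 90, 88, 76, 95, 89, 92]
class_b_scores = [80, 75, 85, 70, 78, 82, 77, 74]


def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (sample SD / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def pooled_two_sample_t(x, y):
    """Two-sample t-statistic with a pooled (equal-variance) estimate."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    standard_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


t_one = one_sample_t(class_a_scores, 80)
t_two = pooled_two_sample_t(class_a_scores, class_b_scores)

print(f"One-sample t-statistic: {t_one:.4f}")
print(f"Two-sample t-statistic: {t_two:.4f}")
```

For the chi-square test in Program 10, note that the table has only eight observations, so the expected counts are small and the result should be read with caution.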
