
(Approved by AICTE & Affiliated to Anna University, Chennai)

Peruvoyal, (Near Kavaraipettai), Gummidipoondi Taluk,


Thiruvallur District-601206

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

LABORATORY

NAME :

REGISTER NO :

SEMESTER :
NAME : ……………………………………………

REGISTER NUMBER : ………………………………………..

DEPARTMENT : ……………………………………………

SUBJECT CODE /TITLE : ………………..………..…………………

YEAR/SEM : ……………………………………………

DATE OF EXAMINATION : …………………………………..……….

Certified that this is the Bonafide record of practical work done by the aforesaid
student in the ........................ Laboratory during the year ................

Laboratory in charge Head of the Department

Internal Examiner External Examiner


DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

VISION :
To create a knowledge pool in the field of computer science and engineering, empowering the students to
meet the challenges of society.

MISSION :

 Prepare the students with strong fundamental concepts, analytical capabilities, and programming and
problem-solving skills.

 Build an ecosystem that provides the new cutting-edge technologies required to meet those challenges.

 Impart the skills necessary to become continuous learners in the field of Computer Science and Engineering.
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
COURSE OBJECTIVES:
• To understand the techniques and processes of data science
• To apply descriptive data analytics
• To visualize data for various applications
• To understand inferential data analytics
• To analyse and build predictive models from data
SUGGESTED EXPERIMENTS
1. Working with Numpy arrays
2. Working with Pandas data frames
3. Develop a Python Program for Basic plots using Matplotlib
4.a Develop a Python Program for Frequency distributions
4.b Develop a Python Program for Averages
4.c Develop a Python Program for Variability
5.a Develop a Python Program for Normal Curves
5.b Develop a Python Program for Correlation and scatter plots
5.c Develop a Python Program for Correlation Coefficient
6. Develop a Python program for Simple Linear Regression
7. Develop a Python Program for Z-Test
8. Develop a Python Program for T-Test
9. Develop a Python Program for ANOVA
10. Building and Validating Linear Models
11. Building and Validating Logistic Models
12. Develop a Python Program for Time Series Analysis
COURSE OUTCOMES:
Upon successful completion of this course, the students will be able to:
CO1: Explain the data analytics pipeline
CO2: Describe and visualize data
CO3: Perform statistical inferences from data
CO4: Analyze the variance in the data
CO5: Build models for predictive analytics
CO’s- PO’s & PSO’s MAPPING
CONTENTS

S.No   Name of the Experiment                                        Page No.   Date   Signature
1.     Working with Numpy arrays                                      1
2.     Working with Pandas data frames                                4
3.     Develop a Python Program for Basic plots using Matplotlib      6
4.a    Develop a Python Program for Frequency distributions          11
4.b    Develop a Python Program for Averages                         12
4.c    Develop a Python Program for Variability                      13
5.a    Develop a Python Program for Normal Curves                    15
5.b    Develop a Python Program for Correlation and scatter plots    17
5.c    Develop a Python Program for Correlation Coefficient          19
6.     Develop a Python program for Simple Linear Regression         21
7.     Develop a Python Program for Z-Test                           24
8.     Develop a Python Program for T-Test                           26
9.     Develop a Python Program for ANOVA                            28
10.    Building and Validating Linear Models                         30
11.    Building and Validating Logistic Models                       33
12.    Develop a Python Program for Time Series Analysis             36
Ex no: 1 Working with Numpy arrays
Date:

AIM:
To work with Numpy arrays.
ALGORITHM:
Step 1: Start
Step 2: Import the Numpy module
Step 3: Print the basic characteristics and operations of the array
Step 4: Stop

PROGRAM:
import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT:

Array is of type: <class 'numpy.ndarray'>


No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print(a)
print("After slicing")
print(a[1:])

OUTPUT:
[[1 2 3]
[3 4 5]
[4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]

Program to Perform Array Slicing


# array to begin with
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print('Our array is:')
print(a)

# this returns an array of the items in the second column
print('The items in the second column are:')
print(a[..., 1])
print('\n')

# now we will slice all items from the second row
print('The items in the second row are:')
print(a[1, ...])
print('\n')

# now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[..., 1:])

OUTPUT:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]

RESULT:

Thus the program for working with Numpy arrays was executed and verified successfully.

Ex no: 2 Working with Pandas data frames
Date:

AIM:

To work with Pandas data frames

ALGORITHM:

Step 1: Start
Step 2: Import the numpy and pandas modules
Step 3: Create a dataframe using the dictionary
Step 4: Print the output
Step 5: Stop

PROGRAM:

import numpy as np
import pandas as pd

data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))

# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))

# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))

# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Use the `shape` property
print(df.shape)

# Or use the `len()` function with the `index` property
print(len(df.index))

OUTPUT:
      Col1 Col2
Row1     1    2
Row2     3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2

RESULT:
Thus the program for working with Pandas data frames was executed and verified successfully.
Ex. No.:3 Develop a Python Program for Basic plots using Matplotlib
Date:

AIM:

To draw basic plots in Python program using Matplotlib

ALGORITHM:

Step 1: Start
Step 2: Import the Matplotlib module
Step 3: Create basic plots using Matplotlib
Step 4: Print the output
Step 5: Stop

PROGRAM:

# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()

OUTPUT:

PROGRAM 3B:

import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)

# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')

# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get current axes command
ax = plt.gca()

# get command over the individual boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range (bounds) of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval at which the x-axis sets its marks
plt.xticks(list(range(-3, 10)))

# set the interval at which the y-axis sets its marks
plt.yticks(list(range(-3, 20, 3)))

# the legend denotes what each colour signifies
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# the annotate command writes text ON THE GRAPH;
# xy denotes the position on the graph
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')
plt.show()

OUTPUT:

PROGRAM 3C:

import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]

# use fig whenever you want the output in a new window;
# also specify the window size you want
fig = plt.figure(figsize=(10, 10))

# creating multiple plots in a single figure
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)

sub1.plot(a, 'sb')

# sets how the subplot's x-axis values advance by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')

sub2.plot(b, 'or')

# sets how the subplot's x-axis values advance by 2
# within the specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')

# a list can be passed directly to the plot function
# instead of adding a reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')

sub4.plot(c, 'Dm')

# similarly, the ticks can be set for the y-axis:
# range(start (inclusive), end (exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')

# without writing plt.show() no plot will be visible
plt.show()

OUTPUT:

RESULT:
Thus the basic plots were drawn using Matplotlib in a Python program, executed, and verified successfully.

Ex. No.:4a Develop a Python Program for Frequency distributions
Date:

AIM:
To count the frequency of occurrence of a word in a body of text.

ALGORITHM:

Step 1: Start the Program


Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
wlist = []

# collect the first 50 tokens
for i in range(50):
    wlist.append(token[i])

# count how often each collected word occurs
wordfreq = [wlist.count(w) for w in wlist]
print(list(zip(wlist, wordfreq)))

OUTPUT:
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3),
('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2),
('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1),
('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3),
('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1),
('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
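
Note: NLTK also provides a FreqDist class that performs this counting directly. A minimal sketch (not part of the prescribed program) on the same corpus:

from nltk import FreqDist
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
tokens = word_tokenize(sample)

# FreqDist tallies the frequency of every token in the text
fdist = FreqDist(tokens)
print(fdist.most_common(10))  # the ten most frequent tokens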

RESULT:
Thus the program to count the frequency of occurrence of a word in a body of text was executed and
verified successfully.

Ex. No.:4b Develop a Python Program for Averages
Date:

AIM:
To compute weighted averages in Python, either by defining your own functions or by using NumPy.

ALGORITHM:

Step 1: Start the Program


Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employees_salary table itself:
Step 4: Calculate the weighted sum and average using the NumPy average() function
Step 5: Stop the process

PROGRAM:

# Method: using the NumPy average() function
import pandas as pd
import numpy as np

# load the employees_salary table created in Step 2
# (the file name is assumed from the algorithm above)
df = pd.read_csv('employees_salary.csv')

weighted_avg_m3 = round(np.average(df['salary_p_year'],
                                   weights=df['employees_number']), 2)
print(weighted_avg_m3)

OUTPUT:

44225.35
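
Note: the aim also allows computing the weighted average by defining your own function. A minimal sketch, assuming the same df loaded in the program above:

def weighted_average(values, weights):
    # classic definition: sum(w * x) / sum(w)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

weighted_avg_m1 = round(weighted_average(df['salary_p_year'],
                                         df['employees_number']), 2)
print(weighted_avg_m1)  # should match the NumPy result above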

RESULT:

Thus the computation of weighted averages in Python, both by defining your own function and by using
NumPy, was executed and verified successfully.

Ex. No.: 4c Develop a Python Program for Variability
Date:

AIM:
To write a python program to calculate the variance.

ALGORITHM:

Step 1: Start the Program


Step 2: Import the variance function from the statistics module
Step 3: Import the Fraction class from the fractions module for the fractional parameter values
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each samples
Step 6: Stop the process
PROGRAM:
# Python code to demonstrate the variance()
# function on varying ranges of data types

# importing the statistics module
from statistics import variance

# importing fractions as parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers
# (numbers are spread apart, but not very much)
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers
# (data points are spread apart considerably)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

# tuple of a set of floating-point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each sample
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))

OUTPUT :

Variance of Sample 1 is 15.80952380952381


Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006

RESULT:
Thus the computation for variance was executed and verified successfully.

Ex. No.:5a Develop a Python Program for Normal Curve

Date:

AIM:
To create a normal curve using python program.

ALGORITHM:
Step 1: Start the Program
Step 2: Import the scipy package and use the scipy.stats module
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

PROGRAM:

# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

OUTPUT:

RESULT:
Thus the normal curve using python program was executed and verified successfully.

Ex. No.: 5b Develop a Python Program for Correlation and scatter plots

Date:

AIM:
To write a python program for correlation with scatter plot

ALGORITHM:

Step 1: Start the Program


Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

PROGRAM:

# Scatterplot and Correlations

# Data
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
y1 = 5 * x + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.figure(figsize=(10, 8), dpi=100)
plt.scatter(x, y1, label='y1', color='blue')
plt.scatter(x, y2, label='y2', color='red')
plt.scatter(x, y3, label='y3', color='green')
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

OUTPUT:

RESULT:
Thus the Correlation and scatter plots using python program was executed and verified
successfully.

Ex. No.: 5c Develop a Python Program for Correlation coefficient

Date:

AIM:
To write a python program to compute correlation coefficient.

ALGORITHM:

Step 1: Start the Program


Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
# Python program to find the correlation coefficient
import math

# function that returns the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X
        sum_X = sum_X + X[i]

        # sum of elements of array Y
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i]
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of squares of array elements
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use the formula for calculating the correlation coefficient
    corr = (float)(n * sum_XY - sum_X * sum_Y) / \
           (float)(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                             (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# find the size of the array
n = len(X)

# function call to correlationCoefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

OUTPUT :

0.953463
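
Note: as a quick cross-check (not part of the prescribed program), NumPy's built-in corrcoef gives the same value for the data above:

import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is Pearson's r between X and Y
print(round(np.corrcoef(X, Y)[0, 1], 6))  # 0.953463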

RESULT:
Thus the computation for correlation coefficient was executed and verified successfully.

Ex. No.: 6 Develop a Python Program for Simple Linear Regression

Date:

AIM:
To write a python program for Simple Linear Regression

ALGORITHM:

Step 1: Start the Program


Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

OUTPUT :

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437

Graph:

RESULT:

Thus the computation for Regression was executed and verified successfully.

Ex. No.: 7 Develop a Python Program for z-test

Date:

AIM:
To write a python program for z-test

ALGORITHM:

Step 1: Start the Program


Step 2: Import numpy and ztest package

Step 3: Define mean_iq, sd_iq, alpha, null_mean


Step 4: Calculate data value
Step 5: Calculate z test
Step 6: Print whether to accept or Reject Null hypothesis
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data; in the value parameter
# we pass the mean under the null hypothesis, and with the alternative
# parameter we check whether the mean is larger.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# The function outputs a z-score and the p-value corresponding to it.
# We compare the p-value with alpha: if it is greater than alpha we
# fail to reject the null hypothesis, else we reject it.
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT :

Reject Null Hypothesis

RESULT:
Thus the computation for z-test was executed and verified successfully.

Ex. No.: 8 Develop a Python Program for t-test

Date:

AIM:
To write a python program for t-test.

ALGORITHM

Step 1: Start the Program


Step 2: Import numpy and t test package

Step 3: Define mean_iq, sd_iq, alpha, null_mean


Step 4: Calculate data value
Step 5: Calculate t test
Step 6: Print whether to accept or Reject Null hypothesis
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample size
N = 10

# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the standard deviation
# (calculating the variance to get the standard deviation)
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)

# Pooled standard deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)

# Calculating the t-statistic
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))

# Comparing with the critical t-value
# Degrees of freedom
dof = 2 * N - 2

# p-value after comparison with the t-statistic
pval = 1 - stats.t.cdf(tval, df=dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

# Cross-checking using the internal function from the SciPy package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))

OUTPUT:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205

RESULT:
Thus the computation for t-test was executed and verified successfully.

Ex. No.: 9 Develop a Python Program for ANOVA

Date:

AIM:
To write a python program for ANOVA.

ALGORITHM
Step 1: Start the Program
Step 2: Install and load the dplyr package
Step 3:Setup Null Hypothesis and Alternate Hypothesis
Step 4: Calculate the test statistic using the aov function
Step 5: Calculate the F-critical value
Step 6: Compare test statistics with F-Critical value
Step 7: Print the result
Step 8: Stop the process

PROGRAM (note: this listing is written in R, using dplyr and aov):

# Installing the package
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within group and between group
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")

# Step 1: Set up the null hypothesis and the alternate hypothesis
# H0: mu = mu01 = mu02 (there is no difference between the
#     average displacement for different gears)
# H1: not all means are equal

# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)

# Step 3: Calculate the F-critical value
# (for 0.05 significance, alpha = 0.05)

# Step 4: Compare the test statistic with the F-critical value
# and conclude the test: if p < alpha, reject the null hypothesis
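
Note: since the exercise title asks for a Python program, a minimal Python sketch of the same one-way ANOVA is given below. It uses SciPy's f_oneway function; the three groups are illustrative stand-ins for disp split by gear, not the actual mtcars values:

from scipy import stats

# hypothetical displacement values for three gear groups
group1 = [160.0, 258.0, 225.0, 360.0, 146.7]
group2 = [108.0, 140.8, 167.6, 121.0, 318.0]
group3 = [95.1, 120.3, 351.0, 145.0, 301.0]

# one-way ANOVA: F statistic and p-value
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print("F =", f_stat, "p =", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")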

RESULT:

Thus the computation for ANOVA was executed and verified successfully.

Ex. No.: 10 Develop a Python Program for building and validating linear
models
Date:

AIM:
To write a python program for building and validating data models.

ALGORITHM:

Step 1: Start the Program


Step 2: Import numpy, matplotlib, seaborn packages.
Step 3: Load a dataset
Step 4: Check the keys
Step 5: Print the attribute information
Step 6: Print the heat map
Step 7: Stop the process

PROGRAM:

# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

# loading the data
boston = load_boston()

# You can check the keys with the following code.
print(boston.keys())
# The output will be as follows:
# dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)
# You will find these details in the output:
# Attribute Information (in order):
#   CRIM     per capita crime rate by town
#   ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
#   INDUS    proportion of non-retail business acres per town
#   CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
#   NOX      nitric oxides concentration (parts per 10 million)
#   RM       average number of rooms per dwelling
#   AGE      proportion of owner-occupied units built prior to 1940
#   DIS      weighted distances to five Boston employment centres
#   RAD      index of accessibility to radial highways
#   TAX      full-value property-tax rate per $10,000
#   PTRATIO  pupil-teacher ratio by town
#   B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
#   LSTAT    % lower status of the population
#   MEDV     median value of owner-occupied homes in $1000's
# Missing Attribute Values: None

df = pd.DataFrame(boston.data, columns=boston.feature_names)

# print the columns present in the dataset
print(df.columns)

# print the top 5 rows in the dataset
print(df.head())

OUTPUT:
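
Note: the listing above stops at data exploration. A minimal sketch of the building-and-validating step itself, assuming the usual scikit-learn pattern on the same Boston data (not part of the original record):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# features and target from the Boston dataset loaded above
X = df               # the 13 attribute columns
y = boston.target    # MEDV, the median home value

# hold out 20% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# build the linear model on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# validate on the held-out split
y_pred = model.predict(X_test)
print("R^2 =", r2_score(y_test, y_pred))
print("MSE =", mean_squared_error(y_test, y_pred))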

RESULT:

Thus the Python Program for building and validating linear models was executed and verified
successfully.

Ex. No.: 11 Develop a Python Program for building and validating logistic
models
Date:
AIM:
To write a python program for building and validating logistic models.

ALGORITHM
Step 1: Start the Program
Step 2: Import statsmodel and pandas package
Step 3: Load a dataset

Step 4: Train the dataset.


Step 5: Predicting on new data
Step 6: Print the Confusion Matrix
Step 7: Stop the process

PROGRAM:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()

# printing the summary table
print(log_reg.summary())

OUTPUT:
Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8

Predicting on new data:

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

OUTPUT:
Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

Testing the accuracy of the model:

from sklearn.metrics import confusion_matrix, accuracy_score

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

OUTPUT:
Confusion Matrix :
 [[6 0]
 [2 2]]
Test accuracy =  0.8

RESULT:

Thus the Python Program for building and validating logistic models was executed and verified
successfully.

Ex. No.: 12 Develop a Python Program for Time Series Analysis
Date:

AIM:
To write a python program for Time Series Analysis.

ALGORITHM:

Step 1: Start the Program


Step 2: Import the numpy, matplotlib, and itertools packages
Step 3: Load a furniture dataset

Step 4: Train the dataset.

Step 5: Data Preprocessing by removing columns, missing values etc.,

Step 6: Visualize the furniture sales Time Series Data.


Step 7: Stop the process
PROGRAM:
# We are using Superstore sales data.
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
import matplotlib

warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

# We start with time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

# A good 4 years of furniture sales data.
furniture['Order Date'].min(), furniture['Order Date'].max()
# (Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))

# Data Preprocessing
# This step includes removing columns we do not need, checking missing
# values, aggregating sales by date, and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State',
        'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
        'Product Name', 'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

# Indexing with Time Series Data
furniture = furniture.set_index('Order Date')
furniture.index

# We will use the average daily sales value for each month instead,
# taking the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()

# Have a quick peek at the 2017 furniture sales data.
y['2017':]

OUTPUT:

# Visualizing Furniture Sales Time Series Data
y.plot(figsize=(15, 6))
plt.show()
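
Note: the record stops at visualization. As a hedged sketch of the forecasting step that the imports above anticipate (itertools is typically used to grid-search SARIMA orders), a minimal fit-and-forecast on the monthly series y could look like the following; the (1, 1, 1)x(1, 1, 1, 12) order is an illustrative choice, not a tuned one:

# fit a seasonal ARIMA model on the monthly series y built above
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])

# forecast the next 12 months of furniture sales
pred = results.get_forecast(steps=12)
print(pred.predicted_mean)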

RESULT:
Thus the Python program for Time Series Analysis was executed and verified successfully.

