FDS LAB

The document outlines the curriculum for a laboratory course titled 'Fundamentals of Data Science and Analytics' offered by the Department of Artificial Intelligence and Data Science at a college affiliated with Anna University. It includes course objectives, suggested experiments, and expected outcomes for students, emphasizing skills in data analysis and programming. Additionally, it provides a structure for recording practical work and includes various programming exercises related to data science using Python.

Uploaded by

sarathgaming007


(Approved by AICTE & Affiliated to Anna University, Chennai)

Peruvoyal, (Near Kavaraipettai), Gummidipoondi Taluk,

Thiruvallur District-601206

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

LABORATORY

NAME :

SEMESTER :
NAME : ……………………………………………

REGISTER NUMBER : ………………………………………..….._

DEPARTMENT : ……………………………………………

SUBJECT CODE /TITLE : ………………..………..…………………

YEAR/SEM : ……………………………………………

DATE OF EXAMINATION : …………………………………..……….

Certified that this is the Bonafide record of practical work done by the aforesaid
student in the …………………………………………… during the year …………………… .

Laboratory in charge Head of the Department

Internal Examiner External Examiner

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

VISION :
To create a knowledge pool in the field of computer science and engineering to empower the students to meet
the challenges of the society.

MISSION :

• Prepare the students with strong fundamental concepts, analytical capabilities, programming and problem
solving skills.
• Bring an ecosystem to provide new cutting-edge technologies required to meet the challenges.
• Impart necessary skills to become continuous learners in the field of Computer Science and Engineering.
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
COURSE OBJECTIVES:
• To understand the techniques and processes of data science
• To apply descriptive data analytics
• To visualize data for various applications
• To understand inferential data analytics
• To analyze and build predictive models from data
SUGGESTED EXPERIMENTS
1. Working with Numpy arrays
2. Working with Pandas data frames
3. Develop a Python Program for Basic plots using Matplotlib
4.a Develop a Python Program for Frequency distributions
4.b Develop a Python Program for Averages
4.c Develop a Python Program for Variability
5.a Develop a Python Program for Normal Curves
5.b Develop a Python Program for Correlation and scatter plots
5.c Develop a Python Program for Correlation Coefficient
6. Develop a Python program for Simple Linear Regression
7. Develop a Python Program for Z-Test
8. Develop a Python Program for T-Test
9. Develop a Python Program for ANOVA
10. Building and Validating Linear Models
11. Building and Validating Logistic Models
12. Develop a Python Program for Time Series Analysis
COURSE OUTCOMES:
Upon successful completion of this course, the students will be able to:
CO1: Explain the data analytics pipeline
CO2: Describe and visualize data
CO3: Perform statistical inferences from data
CO4: Analyze the variance in the data
CO5: Build models for predictive analytics
CO’s- PO’s & PSO’s MAPPING
CONTENTS

S.No   Name of the Experiment                                            Date   Signature
1.     Working with Numpy arrays
2.     Working with Pandas data frames
3.     Develop a Python Program for Basic plots using Matplotlib
4.a    Develop a Python Program for Frequency distributions
4.b    Develop a Python Program for Averages
4.c    Develop a Python Program for Variability
5.a    Develop a Python Program for Normal Curves
5.b    Develop a Python Program for Correlation and scatter plots
5.c    Develop a Python Program for Correlation Coefficient
6.     Develop a Python program for Simple Linear Regression
7.     Develop a Python Program for Z-Test
8.     Develop a Python Program for T-Test
9.     Develop a Python Program for ANOVA
10.    Building and Validating Linear Models
11.    Building and Validating Logistic Models
12.    Develop a Python Program for Time Series Analysis
Ex no: 1 Working with Numpy arrays
Date:

AIM:
To work with Numpy arrays.
ALGORITHM:
Step1: Start
Step2: Import Numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop

PROGRAM:
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT:

Array is of type: <class 'numpy.ndarray'>

No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing

import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

OUTPUT:
[[1 2 3]
[3 4 5]
[4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]

Program to Perform Array Slicing

# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])

OUTPUT:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]

RESULT:

Thus the working with Numpy arrays was executed and verified successfully.
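
Step 3 of the algorithm mentions array operations, while the listings above mostly inspect attributes and slice. The following is a minimal sketch, not part of the original record, of element-wise arithmetic, aggregation along an axis, and the fact that a basic slice is a view onto the same data. It reuses the 3x3 array from the slicing programs; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# Element-wise arithmetic: every element is transformed independently
doubled = a * 2
shifted = a + 10

# Aggregation along an axis: axis=0 collapses rows, axis=1 collapses columns
column_sums = a.sum(axis=0)
row_means = a.mean(axis=1)

# A basic slice is a view: writing through it changes the original array
tail = a[1:]
tail[0, 0] = 99

print(doubled)
print(column_sums)
print(row_means)
print(a[1, 0])
```

If an independent copy is wanted instead of a view, `a[1:].copy()` can be used.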

Ex no: 2 Working with Pandas data frames
Date:

AIM:

To work with Pandas data frames

ALGORITHM:

Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM:

import numpy as np
import pandas as pd

data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))

# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))

# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))

# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# Use the `shape` property
print(df.shape)
# Or use the `len()` function with the `index` property
print(len(df.index))

OUTPUT:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2

RESULT:
Thus the working with Pandas data frames was executed and verified successfully.
Ex. No.:3 Develop a Python Program for Basic plots using Matplotlib
Date:

AIM:

To draw basic plots in Python program using Matplotlib

ALGORITHM:

Step1: Start
Step2: import Matplotlib module
Step3: Create a Basic plots using Matplotlib
Step4: Print the output
Step5: Stop

PROGRAM:

# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()

OUTPUT:

PROGRAM:3B
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)

# o is for circles and r is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get current axes command
ax = plt.gca()

# get command over the individual boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range or the bounds of the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval by which the x-axis set the marks
plt.xticks(list(range(-3, 10)))

# set the intervals by which y-axis set the marks
plt.yticks(list(range(-3, 20, 3)))

# legend denotes that what color signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate command helps to write ON THE GRAPH any text;
# xy denotes the position on the graph
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()

OUTPUT:

PROGRAM:3c
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]

# use fig whenever you want the output in a new window;
# also specify the window size you want the answer to be displayed in
fig = plt.figure(figsize=(10, 10))

# creating multiple plots in a single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)

sub1.plot(a, 'sb')
# sets how the display subplot x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')

sub2.plot(b, 'or')
# sets how the display subplot x axis values advances by 2
# within the specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')

# can directly pass a list in the plot function instead of adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')

sub4.plot(c, 'Dm')
# similarly we can set the ticks for the y-axis:
# range(start(inclusive), end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')

# without writing plt.show() no plot will be visible
plt.show()

OUTPUT:

RESULT:
Thus the basic plots using Matplotlib in Python program was executed and verified successfully.

Ex. No.:4a Develop a Python Program for Frequency distributions
Date:

AIM:
To count the frequency of occurrence of a word in a body of text.

ALGORITHM:

Step 1: Start the Program

Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

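from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

# The Gutenberg corpus and the tokenizer data must already be
# downloaded with nltk.download() before this runs.
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)

# keep the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])

# count how often each token occurs among those 50 and pair it with its count
wordfreq = [wlist.count(w) for w in wlist]
print(list(zip(wlist, wordfreq)))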

OUTPUT:
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3),
('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2),
('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1),
('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3),
('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1),
('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]

RESULT:
Thus the program to count the frequency of occurrence of a word in a body of text was executed and
verified successfully.
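
The program above calls wlist.count(w) once per token, so repeated tokens are counted and listed again each time they occur. A minimal alternative sketch, assuming only a plain Python list of tokens, uses collections.Counter from the standard library, which counts in a single pass and reports each distinct token once. The token list here is a hard-coded, hypothetical stand-in for the word_tokenize output so that the block runs without NLTK data.

```python
from collections import Counter

# Hypothetical token list standing in for word_tokenize(...) output
tokens = ["Piping", "down", "the", "valleys", "wild", ",",
          "Piping", "songs", "of", "pleasant", "glee", ","]

# One pass over the tokens; keys are distinct tokens, values are counts
freq = Counter(tokens)

# Most frequent tokens first
for word, count in freq.most_common(3):
    print(word, count)
```

With the real corpus, `Counter(token[:50])` would give the same counts as the record's program, one entry per distinct token.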

Ex. No.:4b Develop a Python Program for Averages
Date:

AIM:
To compute weighted averages in Python either defining your own functions or using Numpy

ALGORITHM:

Step 1: Start the Program

Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employees_salary table itself:
Step 4: Calculate weighted sum and average using Numpy Average() Function
Step 5: Stop the process

PROGRAM:

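# Method using the NumPy average() function
import pandas as pd
from numpy import average
df = pd.read_csv('employees_salary.csv')  # table from Step 2; file name assumed
weighted_avg_m3 = round(average(df['salary_p_year'], weights=df['employees_number']), 2)
print(weighted_avg_m3)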

OUTPUT:

44225.35

RESULT:

Thus the computation of weighted averages in Python either defining your own functions or using
Numpy was executed and verified successfully.
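
The aim mentions defining your own function as well as using Numpy, but only the Numpy route is listed. A minimal sketch of the hand-written route follows, computing sum(value × weight) / sum(weight). The function name and the salary figures are hypothetical and are not taken from the employees_salary table.

```python
def weighted_average(values, weights):
    """Return sum(v * w) / sum(w) for two equally long sequences."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical salary bands and head counts (illustrative only)
salary_p_year = [30000, 45000, 60000]
employees_number = [10, 5, 5]

weighted_avg = round(weighted_average(salary_p_year, employees_number), 2)
print(weighted_avg)  # 41250.0
```

The plain mean of the three salaries would be 45000; the weighted figure is lower because the largest group earns the lowest salary.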

Ex. No.: 4c Develop a Python Program for Variability
Date:

AIM:
To write a python program to calculate the variance.

ALGORITHM:

Step 1: Start the Program

Step 2: Import statistics module from statistics import variance
Step 3: Import fractions as parameter values from fractions import Fraction as fr
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each samples
Step 6: Stop the process
PROGRAM:
# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module
from statistics import variance

# importing fractions as parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each sample
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))

OUTPUT :

Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

RESULT:
Thus the computation for variance was executed and verified successfully.
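
To show what variance() computes, the following sketch works out the sample variance of sample1 by hand: the sum of squared deviations from the mean, divided by n - 1. It also contrasts this with the population variance (divided by n), which the statistics module exposes as pvariance(). Only sample1 from the program above is reused; the other names are illustrative.

```python
from statistics import mean, pvariance, variance

sample1 = (1, 2, 5, 4, 8, 9, 12)

# Sum of squared deviations from the mean
m = mean(sample1)
ss = sum((x - m) ** 2 for x in sample1)

sample_var = ss / (len(sample1) - 1)   # sample variance: divide by n - 1
population_var = ss / len(sample1)     # population variance: divide by n

print(sample_var, variance(sample1))
print(population_var, pvariance(sample1))
```

The n - 1 divisor is the usual choice when the data are a sample from a larger population, which is why the two figures differ.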

Ex. No.:5a Develop a Python Program for Normal Curve

Date:

AIM:
To create a normal curve using python program.

ALGORITHM:
Step 1: Start the Program
Step 2: Import packages scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

PROGRAM:

# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# Visualizing the distribution
# (x and y are passed by keyword, which recent seaborn versions require)
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

OUTPUT:

RESULT:
Thus the normal curve using python program was executed and verified successfully.
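
The curve plotted above is the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). As a minimal sketch, the density can be evaluated directly from this formula with the standard library only and compared against statistics.NormalDist. The values mu = 5.3 and sigma = 1 are the loc and scale from the program; the helper name normal_pdf is hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma = 5.3, 1.0  # same loc and scale as the program above

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same grid as np.arange(1, 10, 0.01): 1.00, 1.01, ..., 9.99
xs = [1 + 0.01 * i for i in range(900)]
pdf = [normal_pdf(x, mu, sigma) for x in xs]

# The curve peaks at the mean, with height 1 / (sigma * sqrt(2 * pi))
peak_x = xs[pdf.index(max(pdf))]
reference = NormalDist(mu, sigma).pdf(mu)
print(round(peak_x, 2), round(max(pdf), 4), round(reference, 4))
```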

Ex. No.: 5b Develop a Python Program for Correlation and scatter plots

Date:

AIM:
To write a python program for correlation with scatter plot

ALGORITHM:

Step 1: Start the Program

Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

PROGRAM:

# Scatterplot and Correlations

# Data
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
y1 = 5 * x + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.figure(figsize=(10, 8), dpi=100)
plt.scatter(x, y1, label='y1', color='blue')
plt.scatter(x, y2, label='y2', color='red')
plt.scatter(x, y3, label='y3', color='green')
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

OUTPUT:

RESULT:
Thus the Correlation and scatter plots using python program was executed and verified
successfully.
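
The scatter plot shows the three relationships visually. A minimal sketch that attaches numbers to them follows, using np.corrcoef on data built the same way as in the program (a rising line, a falling line, and independent noise). The fixed seed is an assumption added so that the run is repeatable; it is not in the original program.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, added for repeatability
x = rng.standard_normal(100)
y1 = 5 * x + 9                   # exact increasing linear relation
y2 = -5 * x                      # exact decreasing linear relation
y3 = rng.standard_normal(100)    # independent noise

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r1 = np.corrcoef(x, y1)[0, 1]
r2 = np.corrcoef(x, y2)[0, 1]
r3 = np.corrcoef(x, y3)[0, 1]

print(round(r1, 3), round(r2, 3), round(r3, 3))
```

r1 and r2 come out at +1 and -1 up to rounding error because y1 and y2 are exact linear functions of x, while r3 stays near zero and varies from sample to sample.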

Ex. No.: 5c Develop a Python Program for Correlation coefficient

Date:

AIM:
To write a python program to compute correlation coefficient.

ALGORITHM:

Step 1: Start the Program

Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5:Print the result
Step 6 : Stop the process

PROGRAM:
# Python Program to find correlation coefficient.
import math

# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1

    # use formula for calculating correlation coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / float(
        math.sqrt((n * squareSum_X - sum_X * sum_X) *
                  (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

OUTPUT :

0.953463

RESULT:
Thus the computation for correlation coefficient was executed and verified successfully.
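
The program uses the raw-score formula built from running sums. As a cross-check, the sketch below computes the same coefficient from the definition r = cov(X, Y) / (s_X · s_Y), using sample covariance and sample standard deviations from the standard library. It reuses the X and Y lists from the program; the intermediate names are illustrative.

```python
from statistics import mean, stdev

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

n = len(X)
mean_x, mean_y = mean(X), mean(Y)

# Sample covariance: sum of products of deviations, divided by n - 1
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / (n - 1)

# r = cov(X, Y) / (s_X * s_Y), with sample standard deviations
r = cov_xy / (stdev(X) * stdev(Y))
print('{0:.6f}'.format(r))  # 0.953463
```

Both forms are algebraically equivalent; the deviation form tends to be less prone to rounding trouble when the values are large.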

Ex. No.: 6 Develop a Python Program for Regression

Date:

AIM:
To write a python program for Simple Linear Regression

ALGORITHM:

Step 1: Start the Program

Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

OUTPUT :

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

Graph:

RESULT:

Thus the computation for Regression was executed and verified successfully.
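
The coefficients from estimate_coef can be cross-checked against a library fit. The sketch below, which is an addition and not part of the record, fits a degree-1 polynomial with np.polyfit on the same x and y and also reports the coefficient of determination on the fitted data. For this data SS_xy = 96.5 and SS_xx = 82.5, so the slope should equal 96.5 / 82.5.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# A degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination (R^2) on the fitted data
y_pred = intercept + slope * x
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```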

Ex. No.: 7 Develop a Python Program for z-test

Date:

AIM:
To write a python program for z-test

ALGORITHM:

Step 1: Start the Program

Step 2: Import numpy and ztest package

Step 3: Define mean_iq, sd_iq, alpha, null_mean

Step 4: Calculate data value
Step 5: Calculate z test
Step 6: Print whether to accept or Reject Null hypothesis
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and
# standard deviation 15/sqrt(50), standing in for IQ scores data
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function, we passed data; in the value parameter
# we passed the mean value in the null hypothesis; in the alternative hypothesis we
# check whether the mean is larger
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# the function outputs a z-score and the p_value corresponding to it; we compare the
# p-value with alpha: if it is less than alpha we reject the null hypothesis,
# else we fail to reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT :

Reject Null Hypothesis

RESULT:
Thus the computation for z-test was executed and verified successfully.
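
To make the test statistic explicit, the sketch below computes a one-sample, one-sided z-test by hand with the standard library: z = (sample mean - null mean) / (s / sqrt(n)), followed by the upper-tail probability of the standard normal. The ten scores are hypothetical and small in number purely for readability; this is an illustration of the arithmetic and is not claimed to reproduce the statsmodels result for the random data above.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical IQ-like scores (illustrative only)
data = [108, 112, 109, 111, 113, 107, 110, 114, 106, 110]
null_mean = 100
alpha = 0.05

n = len(data)
standard_error = stdev(data) / math.sqrt(n)   # sample sd divided by sqrt(n)
z_score = (mean(data) - null_mean) / standard_error

# One-sided test: probability of a z at least this large under H0
p_value = 1 - NormalDist().cdf(z_score)

if p_value < alpha:
    decision = "Reject Null Hypothesis"
else:
    decision = "Fail to Reject Null Hypothesis"
print(round(z_score, 2), decision)
```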

Ex. No.: 8 Develop a Python Program for t-test

Date:

AIM:
To write a python program for t-test.

ALGORITHM

Step 1: Start the Program

Step 2: Import numpy and the scipy stats package

Step 3: Define the sample size N and generate two random samples x and y

Step 4: Calculate the pooled standard deviation
Step 5: Calculate the t-statistic and the p-value
Step 6: Cross-check the values using the SciPy ttest_ind function
Step 7: Print the result
Step 8: Stop the process

PROGRAM:

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample Size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)

# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2
# p-value after comparison with the T-Statistics
# (abs() keeps the two-sided p-value valid when tval is negative)
pval = 1 - stats.t.cdf(abs(tval), df=dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

## Cross Checking using the internal function from SciPy Package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))

Output:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205

RESULT:
Thus the computation for t-test was executed and verified successfully.

Ex. No.: 9 Develop a Python Program for ANOVA

Date:

AIM:
To write a python program for ANOVA.

ALGORITHM
Step 1: Start the Program
Step 2: Import numpy and dplyr package
Step 3:Setup Null Hypothesis and Alternate Hypothesis
Step 4: Calculate test statistics using aov function
Step 5: Calculate F-Critical Value
Step 6: Compare test statistics with F-Critical value
Step 7: Print the result
Step 8: Stop the process

PROGRAM:
# Installing the package
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in mean within group and between group
boxplot(mtcars$disp~factor(mtcars$gear),
xlab = "gear", ylab = "disp")
# Step 1: Setup Null Hypothesis and Alternate Hypothesis
# H0 = mu = mu01 = mu02 (There is no difference
# between average displacement for different gear)
# H1 = Not all means are equal
# Step 2: Calculate test statistics using aov function

mtcars_aov <- aov(mtcars$disp~factor(mtcars$gear))

summary(mtcars_aov)
# Step 3: Calculate F-Critical Value

# For 0.05 Significant value, critical value = alpha = 0.05
# Step 4: Compare test statistics with F-Critical value
# and conclude test p <alpha, Reject Null Hypothesis

RESULT:

Thus the computation for ANOVA was executed and verified successfully.
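
The listing above is written in R (dplyr, aov and the mtcars data set), although the experiment asks for a Python program. A minimal Python sketch of the same one-way ANOVA procedure follows, assuming SciPy is available. It uses scipy.stats.f_oneway on three small hypothetical groups (not the mtcars data) and also computes the F statistic by hand from the between-group and within-group sums of squares.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (illustrative only, not mtcars)
groups = {
    "gear3": np.array([20.0, 22.0, 19.0, 24.0, 25.0]),
    "gear4": np.array([28.0, 30.0, 27.0, 26.0, 29.0]),
    "gear5": np.array([18.0, 20.0, 17.0, 19.0, 21.0]),
}
alpha = 0.05

# H0: all group means are equal; H1: not all means are equal
f_stat, p_value = stats.f_oneway(*groups.values())

# The same F statistic by hand: between-group over within-group mean squares
all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print("F =", round(f_stat, 3), "p =", p_value)
print("Reject Null Hypothesis" if p_value < alpha else "Fail to Reject Null Hypothesis")
```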

Ex. No.: 10 Develop a Python Program for building and validating linear
models
Date:

AIM:
To write a python program for building and validating linear models.

ALGORITHM:

Step 1: Start the Program

Step 2: Import numpy, matplotlib, seaborn packages.
Step 3: Load a dataset
Step 4: Check the keys
Step 5: Print the attribute information
Step 6: Print the heat map
Step 7: Stop the process

PROGRAM:

# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Note: load_boston was removed in scikit-learn 1.2, so this listing needs an older release.
from sklearn.datasets import load_boston

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

# loading the data
boston = load_boston()

# You can check those keys with the following code.
print(boston.keys())
# The output will be as follows:
# dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)
# You will find these details in the output:
# Attribute Information (in order):
# - CRIM     per capita crime rate by town
# - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
# - INDUS    proportion of non-retail business acres per town
# - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
# - NOX      nitric oxides concentration (parts per 10 million)
# - RM       average number of rooms per dwelling
# - AGE      proportion of owner-occupied units built prior to 1940
# - DIS      weighted distances to five Boston employment centres
# - RAD      index of accessibility to radial highways
# - TAX      full-value property-tax rate per $10,000
# - PTRATIO  pupil-teacher ratio by town
# - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
# - LSTAT    % lower status of the population
# - MEDV     Median value of owner-occupied homes in $1000's
# Missing Attribute Values: None

df = pd.DataFrame(boston.data, columns=boston.feature_names)
df.head()
# print the columns present in the dataset
print(df.columns)
# print the top 5 rows in the dataset
print(df.head())

OUTPUT:

RESULT:

Thus the Python Program for building and validating linear models was executed and verified
successfully.
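
The program above loads and inspects the data but does not go on to fit or validate a model. The sketch below illustrates the two missing stages under stated assumptions: it uses synthetic data rather than the Boston data set, and plain least squares via np.linalg.lstsq as a stand-in for a library estimator. The model is fitted on a training split and validated on a held-out split with R² and RMSE. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on two features plus noise
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.5, size=n)

# Build: hold out 25% of the rows for validation
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

# Fit on the training rows only; the column of ones carries the intercept
A_train = np.column_stack([np.ones(train.size), X[train]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Validate: predict the held-out rows and score the predictions
A_test = np.column_stack([np.ones(test.size), X[test]])
y_pred = A_test @ coef
ss_res = np.sum((y[test] - y_pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
rmse = np.sqrt(np.mean((y[test] - y_pred) ** 2))

print("intercept, slopes:", np.round(coef, 2))
print("test R^2:", round(r2, 3), "test RMSE:", round(rmse, 3))
```

Scoring on rows the model never saw is what makes this a validation; R² measured on the training rows alone would be optimistic.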

Ex. No.: 11 Develop a Python Program for building and validating logistic
models
Date:
AIM:
To write a python program for building and validating logistic models.

ALGORITHM
Step 1: Start the Program
Step 2: Import statsmodel and pandas package
Step 3: Load a dataset

Step 4: Train the dataset.

Step 5: Predicting on new data
Step 6: Print the Confusion Matrix
Step 7: Stop the process

PROGRAM:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()

OUTPUT :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8

# printing the summary table
print(log_reg.summary())

Predicting on New Data :

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

OUTPUT :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

Testing the accuracy of the model :

from sklearn.metrics import (confusion_matrix, accuracy_score)

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

OUTPUT :
Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8

RESULT:

Thus the Python Program for building and validating logistic models was executed and verified
successfully.
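
The confusion matrix and accuracy reported above can be reproduced by hand from the two printed lists, which also makes it easy to read off precision and recall. The sketch below uses only the actual and predicted values shown in the output; rows of the matrix are true classes and columns are predicted classes. The variable names are illustrative.

```python
actual = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# cm[i][j] counts samples whose true class is i and predicted class is j
cm = [[0, 0], [0, 0]]
for a, p in zip(actual, predicted):
    cm[a][p] += 1

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(cm)                            # [[6, 0], [2, 2]]
print(accuracy, precision, recall)   # 0.8 1.0 0.5
```

The accuracy of 0.8 hides the fact that only half of the admitted cases were found (recall 0.5), which is worth noting when validating the model.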

Ex. No.: 12 Develop a Python Program for Time Series Analysis
Date:

AIM:
To write a python program for Time Series Analysis.

ALGORITHM:

Step 1: Start the Program

Step 2: Import numpy, matplotlib, itertools package
Step 3: Load a furniture dataset

Step 4: Train the dataset.

Step 5: Data Preprocessing by removing columns, missing values etc.,

Step 6: Visualize the furniture sales Time Series Data.

Step 7: Stop the process
PROGRAM:
# We are using Superstore sales data.
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

# We start from time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

# A good 4-year furniture sales data.
print(furniture['Order Date'].min(), furniture['Order Date'].max())
# Output: Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00')

# Data Preprocessing
# This step includes removing columns we do not need, checking missing values,
# aggregating sales by date, etc.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region',
        'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount',
        'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

# Indexing with Time Series Data
furniture = furniture.set_index('Order Date')
furniture.index

# We will use the average daily sales value for each month instead, and we are using
# the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
y['2017':]  # Have a quick peek at the 2017 furniture sales data.

OUTPUT:

# Visualizing Furniture Sales Time Series Data
y.plot(figsize=(15, 6))
plt.show()

RESULT:
Thus the Python program for Time Series Analysis was executed and verified successfully.
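
The key transformation in the program is resample('MS').mean(), which turns daily totals into one average per calendar month, stamped with the first day of that month. Since the Superstore file is not included here, the following sketch demonstrates the step on synthetic daily values; the dates and numbers are hypothetical and chosen so that the monthly means are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales for three months (hypothetical, not Superstore data):
# the value on day i is simply i, so each monthly mean is easy to check
dates = pd.date_range("2017-01-01", "2017-03-31", freq="D")
sales = pd.Series(np.arange(len(dates), dtype=float), index=dates, name="Sales")

# 'MS' groups by calendar month and labels each group with the month start
monthly_mean = sales.resample("MS").mean()
print(monthly_mean)
```

January covers the values 0 to 30, so its mean is 15.0; February (31 to 58) gives 44.5 and March (59 to 89) gives 74.0.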

8 pages
DSF LAB EXP FULL (1) (1)
No ratings yet
DSF LAB EXP FULL (1) (1)
88 pages
Fdsa Record Ai&Ds
No ratings yet
Fdsa Record Ai&Ds
26 pages
dsa-lab-manual (1)
No ratings yet
dsa-lab-manual (1)
72 pages
LAB 2 DWM
No ratings yet
LAB 2 DWM
13 pages
Batch2_FDS_printout
No ratings yet
Batch2_FDS_printout
38 pages
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
No ratings yet
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
28 pages
Dsa Record-1
No ratings yet
Dsa Record-1
153 pages
Data Science Record
No ratings yet
Data Science Record
44 pages
FDS Lab Manual R21
No ratings yet
FDS Lab Manual R21
47 pages
Python For Data Analysis
67% (3)
Python For Data Analysis
39 pages
Fds Lab Record
No ratings yet
Fds Lab Record
84 pages
dfs manual
No ratings yet
dfs manual
43 pages
Unit 5 PythonPackages(Matplotlib)
No ratings yet
Unit 5 PythonPackages(Matplotlib)
24 pages
Introduction to Python Programming: Do your first steps into programming with python
From Everand
Introduction to Python Programming: Do your first steps into programming with python
Greytower Corp
No ratings yet
Python Programming: General-Purpose Libraries; NumPy,Pandas,Matplotlib,Seaborn,Requests,os & sys: Python, #2
From Everand
Python Programming: General-Purpose Libraries; NumPy,Pandas,Matplotlib,Seaborn,Requests,os & sys: Python, #2
e3
No ratings yet
Python programming lab manual
No ratings yet
Python programming lab manual
61 pages
Keshav
No ratings yet
Keshav
21 pages
Qu 1
No ratings yet
Qu 1
42 pages
ch2 Slides
No ratings yet
ch2 Slides
62 pages
Ai, Ds & ML
No ratings yet
Ai, Ds & ML
52 pages
MlLabManualdocx 2024 09 04 22 02 58
No ratings yet
MlLabManualdocx 2024 09 04 22 02 58
19 pages
Psycho Py Manual
No ratings yet
Psycho Py Manual
162 pages
How To Reverse The Columns of A 2D Array?
No ratings yet
How To Reverse The Columns of A 2D Array?
27 pages
Numpy User
No ratings yet
Numpy User
529 pages
Python in Oil Refineries-
No ratings yet
Python in Oil Refineries-
5 pages
Explore The World of Data Analytics in Python With Digikull
No ratings yet
Explore The World of Data Analytics in Python With Digikull
9 pages
Report Print
No ratings yet
Report Print
22 pages
CS3361 Set1
No ratings yet
CS3361 Set1
5 pages
Final Print
No ratings yet
Final Print
43 pages
Enhancing Video Accessibility For Color Vision Deficiencies
No ratings yet
Enhancing Video Accessibility For Color Vision Deficiencies
8 pages
PRACTICAL LIST CLASS-XII (INFO. PRACTICALS - fINAL PDF
100% (1)
PRACTICAL LIST CLASS-XII (INFO. PRACTICALS - fINAL PDF
8 pages
VX Python For Finance EuroScipy 2012 Y Hilpisch
No ratings yet
VX Python For Finance EuroScipy 2012 Y Hilpisch
26 pages
Intro To Scientific Computing With Python
No ratings yet
Intro To Scientific Computing With Python
87 pages
Python Notes and Cheat Sheets
No ratings yet
Python Notes and Cheat Sheets
10 pages
CFD-3
No ratings yet
CFD-3
3 pages
Pharmacy Management System Harshal00000
No ratings yet
Pharmacy Management System Harshal00000
44 pages
J1(SkillDzire)
No ratings yet
J1(SkillDzire)
49 pages
Billones_SebastianLuise_Week3Exercise00Series
No ratings yet
Billones_SebastianLuise_Week3Exercise00Series
5 pages
Class XII (As Per CBSE Board) : Informatics Practices
No ratings yet
Class XII (As Per CBSE Board) : Informatics Practices
18 pages
Logistic Regression With A Neural Network Mindset: 1 - Packages
No ratings yet
Logistic Regression With A Neural Network Mindset: 1 - Packages
23 pages
Brochure Python For Data Scientist
No ratings yet
Brochure Python For Data Scientist
14 pages
Python for Data Analysis 3rd Edition by Wes McKinney ISBN 9781098103989 109810398X - Download the full set of chapters carefully compiled
100% (8)
Python for Data Analysis 3rd Edition by Wes McKinney ISBN 9781098103989 109810398X - Download the full set of chapters carefully compiled
83 pages
CV Assignment 2 RecognitionAR
No ratings yet
CV Assignment 2 RecognitionAR
5 pages
B.Tech Assignment
No ratings yet
B.Tech Assignment
51 pages
Helmet and Number Plate Detection
No ratings yet
Helmet and Number Plate Detection
7 pages