Dsf-Pyt-Lab Manual

The document discusses creating and working with NumPy arrays in Python. It covers creating arrays, basic operations on arrays like slicing, and demonstrates creating Pandas dataframes from arrays and dictionaries. It also shows basic plotting in Matplotlib, including line plots, setting labels and titles.

Uploaded by

thilakraj.a0321

Ex no: 1 WORKING WITH NUMPY ARRAYS

AIM

Working with Numpy arrays

ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the array
Step4: Stop

PROGRAM

import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]

Program to Perform Array Slicing

# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])

Output:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]
Result:
Thus the working with Numpy arrays was successfully completed
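
The slicing programs above index with the Ellipsis (...). As a minimal sketch on the same array, the lines below show that on a 2-D array the Ellipsis selects the same elements as a plain colon in that position. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# The Ellipsis (...) stands for "all remaining axes"; on a 2-D array it
# behaves like a plain colon in that position.
second_column = a[:, 1]     # same elements as a[..., 1]
second_row = a[1, :]        # same elements as a[1, ...]
from_column_1 = a[:, 1:]    # same elements as a[..., 1:]

print(second_column)        # [2 4 5]
print(second_row)           # [3 4 5]
print(from_column_1.shape)  # (3, 2)
```

The colon form is the more common spelling for 2-D arrays; the Ellipsis becomes useful when the number of leading axes is not known in advance.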

Ex no: 2 Create a dataframe using a list of elements.

Aim:
To work with Pandas data frames

ALGORITHM
Step1: Start
Step2: Import numpy and pandas modules
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM
import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'],
                 ['Row1',1,2],
                 ['Row2',3,4]])
print(pd.DataFrame(data=data[1:,1:],
                   index=data[1:,0],
                   columns=data[0,1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])
print(pd.DataFrame(my_df))
# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom":"London", "India":"New Delhi", "United States":"Washington", "Belgium":"Brussels"})
print(pd.DataFrame(my_series))
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# Use the `shape` property
print(df.shape)
# Or use the `len()` function with the `index` property
print(len(df.index))

Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2

Result:
Thus the working with Pandas data frames was successfully completed.
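
As a small companion to the program above, the sketch below builds a dataframe from a dictionary of lists with named columns and inspects its size. The column names and values are hypothetical sample data, not part of the exercise.

```python
import pandas as pd

# Hypothetical sample data: column names and values are illustrative only.
marks = {"Name": ["Asha", "Ravi", "Meena"], "Score": [82, 91, 77]}
df = pd.DataFrame(marks)

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(len(df.index))     # 3
print(list(df.columns))  # ['Name', 'Score']
```

Each dictionary key becomes a column label and each list becomes that column's values, so all lists must have the same length.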

EX. NO.:3 BASIC PLOTS USING MATPLOTLIB

AIM:

To draw basic plots in Python program using Matplotlib

ALGORITHM
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop

Program:3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Output:

Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the positions of the tick marks
# on the x-axis
plt.xticks(list(range(-3, 10)))

# set the positions of the tick marks
# on the y-axis
plt.yticks(list(range(-3, 20, 3)))

# the legend shows which color
# belongs to which line
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate writes text on the graph;
# xy gives the position of the text
# in data coordinates
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')
plt.show()
Output:

Program:3c
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]

# create a figure object to show the
# output in a new window and to set
# the size of that window
fig = plt.figure(figsize =(10, 10))

# creating multiple plots in a
# single figure
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)

sub1.plot(a, 'sb')
# sets how the display subplot
# x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')

sub2.plot(b, 'or')

# sets how the display subplot x axis

# values advances by 2 within the
# specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')

# can directly pass a list in the plot

# function instead adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')

sub4.plot(c, 'Dm')

# similarly we can set the ticks for

# the y-axis range(start(inclusive),
# end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without writing plt.show() no plot
# will be visible
plt.show()

Output:

Result:

Thus the basic plots using Matplotlib in Python program was successfully
completed.
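
The programs above use the pyplot state-based calls (plt.plot, plt.xlabel). As a hedged alternative, the sketch below draws the same line as Program 3a through explicit Figure and Axes objects and renders it to an in-memory PNG. The Agg backend and the in-memory buffer are assumptions made so the script runs without a display; they are not part of the original exercise.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 1]

# One Figure holding one Axes; all labels are set on the Axes object.
fig, ax = plt.subplots()
line, = ax.plot(x, y, label="sample line")
ax.set_xlabel("x - axis")
ax.set_ylabel("y - axis")
ax.set_title("My first graph!")
ax.legend()

# Render the figure as PNG bytes into memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
print(buffer.getbuffer().nbytes > 0)  # True
```

Working through the Axes object keeps each subplot's settings separate, which is the same idea Program 3c relies on when it calls sub1.set_title and sub2.set_xticks.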

Ex. No.:4(a) FREQUENCY DISTRIBUTIONS

AIM:
To count the frequency of occurrence of each word in a body of text, a task
often needed during text processing.

ALGORITHM
Step 1: Start the Program
Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body
of text
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
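from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
# the gutenberg corpus and the tokenizer data must be downloaded first with nltk.download()
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
wlist = []
for i in range(50):
    wlist.append(token[i])
wordfreq = [wlist.count(w) for w in wlist]
# zip() returns an iterator in Python 3, so convert it to a list before printing
print("Pairs\n" + str(list(zip(token, wordfreq))))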
Output:
Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3),
('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2),
('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1),
('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3),
('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1),
('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]

Result:
Thus the program to count the frequency of occurrence of words in a body of
text, as needed during text processing, and the Conditional Frequency
Distribution program using Python were successfully completed.
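
The program above calls wlist.count(w) once per token, so repeated words appear several times in the result. As a hedged alternative, the sketch below counts each distinct token once with collections.Counter from the standard library. The sample text is a short lowercased excerpt used for illustration, and a plain whitespace split stands in for word_tokenize so that no NLTK data is needed.

```python
from collections import Counter

# Illustrative sample text; str.split() is a stand-in for word_tokenize.
text = "piping down the valleys wild piping songs of pleasant glee"
tokens = text.split()

freq = Counter(tokens)       # maps each distinct token to its count
print(freq["piping"])        # 2
print(freq.most_common(1))   # [('piping', 2)]
```

Counter makes a single pass over the tokens, whereas the list-count approach rescans the list for every token.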

Ex. No.: 4 (c) VARIABILITY

Aim:
To write a python program to calculate the variance.

ALGORITHM

Step 1: Start the Program

Step 2: Import the variance function from the statistics module
Step 3: Import Fraction from the fractions module (as fr) to use as parameter values
Step 4: Create tuples of positive, negative, mixed, fractional and floating-point numbers
Step 5: Print the variance of each sample
Step 6: Stop the process

Program:
# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module

from statistics import variance

# importing fractions as parameter values

from fractions import Fraction as fr

# tuple of a set of positive integers

# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers

sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers

# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers

sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))

# tuple of a set of floating point values

sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each samples

print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))

Output :

Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

Result:
Thus the computation for variance was successfully completed.
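
The variance() function computes the sample variance, that is, the sum of squared deviations from the mean divided by n - 1. A minimal sketch on sample1 from the program above, with the manual calculation placed next to the library call (the name manual is illustrative):

```python
from statistics import mean, variance

sample1 = (1, 2, 5, 4, 8, 9, 12)

m = mean(sample1)
# Sample variance divides the sum of squared deviations by n - 1.
manual = sum((x - m) ** 2 for x in sample1) / (len(sample1) - 1)

print(manual)
print(variance(sample1))
```

Both lines print approximately 15.8095. For the population variance, which divides by n instead, the statistics module provides pvariance().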
Ex. No.:4(d) NORMAL CURVE

Aim:
To create a normal curve using python program.
ALGORITHM
Step 1: Start the Program
Step 2: Import the norm distribution from the scipy.stats package
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

Program:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution

data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )

#Visualizing the distribution

sb.set_style('whitegrid')
# recent seaborn versions require x and y to be passed as keywords
sb.lineplot(x = data, y = pdf , color = 'black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
Output:

Result:
Thus the normal curve using python program was successfully
completed.
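
The values returned by norm.pdf follow the normal density formula f(x) = exp(-((x - loc) / scale)^2 / 2) / (scale * sqrt(2 * pi)). A minimal sketch of that formula using only the standard library; normal_pdf is a hypothetical helper name, not a SciPy function.

```python
import math

def normal_pdf(x, loc=0.0, scale=1.0):
    """Density of a normal distribution with mean `loc` and standard deviation `scale`."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

# Height of the curve at its mean, using the same loc and scale as the program above.
peak = normal_pdf(5.3, loc=5.3, scale=1)
print(round(peak, 4))   # 0.3989
```

The curve is highest at x = loc and symmetric around it, which is why the plotted line peaks at 5.3.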

Ex. No.: 4 (e) CORRELATION AND SCATTER PLOTS

Aim:
To write a python program for correlation with scatter plot

ALGORITHM
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)
# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2 Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3 Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')
# Plot
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
Output :

RESULT:
Thus the Correlation and scatter plots using python program was successfully
completed.

Ex. No.: 4(f) CORRELATION COEFFICIENT

Aim:
To write a python program to compute correlation coefficient.

ALGORITHM
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5:Print the result
Step 6 : Stop the process

Program:
# Python Program to find correlation coefficient.
import math

# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use formula for calculating correlation
    # coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / float(
        math.sqrt((n * squareSum_X - sum_X * sum_X) *
                  (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output :
0.953463

Result:
Thus the computation for correlation coefficient was successfully completed.
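
As a cross-check of the hand-written formula, the same coefficient can be read from NumPy's correlation matrix. A minimal sketch on the same X and Y; using np.corrcoef here is a swapped-in technique and not the method of the exercise.

```python
import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# Pearson correlation between X and Y.
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))   # 0.953463
```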

Ex. No.: 4 (g) SIMPLE LINEAR REGRESSION

Aim:
To write a python program for Simple Linear Regression
ALGORITHM
Step 1: Start the Program
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process

Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color = "g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output :

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

Graph:

Result:
Thus the computation for Simple Linear Regression was successfully completed.
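
As a cross-check of estimate_coef, a degree-1 least-squares fit with np.polyfit on the same data should return the same slope and intercept. This is a sketch using a swapped-in NumPy routine, not the method of the exercise. For this data SS_xy = 96.5 and SS_xx = 82.5, so the slope is 96.5 / 82.5.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit returns coefficients from the highest degree down: slope first, then intercept.
b_1, b_0 = np.polyfit(x, y, deg=1)
print(round(float(b_0), 4), round(float(b_1), 4))   # 1.2364 1.1697
```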
EX.NO 5. USE THE STANDARD BENCHMARK DATASET FOR PERFORMING THE FOLLOWING:
A) UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE, VARIANCE, STANDARD DEVIATION, SKEWNESS AND KURTOSIS.

AIM: To explore various commands for doing univariate analytics on the UCI AND PIMA INDIANS DIABETES data set.

ALGORITHM:

STEP 1: Start the program.
STEP 2: Download the UCI AND PIMA INDIANS DIABETES data set from Kaggle.
STEP 3: Read the data from the UCI AND PIMA INDIANS DIABETES data set.
STEP 4: Find the frequency, mean, median, mode, variance, standard deviation, skewness and kurtosis of the given data set.
STEP 5: Display the output.
STEP 6: Stop the program.

PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
from matplotlib.ticker import FormatStrFormatter
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('C:/Users/kirub/Documents/Learning/Untitled Folder/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome']=df['Outcome'].astype('bool')
df.dtypes['Outcome']
df.info()
df.describe().T

# Frequency: finding the unique count
df1 = df['Outcome'].value_counts()
# displaying df1
print(df1)
# mean
df.mean()
# median
df.median()
# mode
df.mode()
# variance
df.var()
# standard deviation
df.std()
# kurtosis
df.kurtosis(axis=0,skipna=True)
df['Outcome'].kurtosis(axis=0,skipna=True)
# skewness along the index axis, skipping the NA values
df.skew(axis = 0, skipna = True)
# find skewness in each row
df.skew(axis = 1, skipna = True)

# Pregnancy variable
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(np.round(preg_proportion/sum(preg_proportion),3)*100,dtype=int)
preg = pd.DataFrame({'month':preg_month,'count_of_preg_prop':preg_proportion,'percentage_proportion':preg_proportion_perc})
preg.set_index(['month'],inplace=True)
preg.head(10)

# pass the column as x so that countplot counts its values
sns.countplot(x=df['Outcome'])
sns.distplot(df['Pregnancies'])
sns.boxplot(data=df['Pregnancies'])

OUTPUT:

RESULT: Exploring various commands for doing univariate analytics on the UCI
AND PIMA INDIANS DIABETES was successfully executed.
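
The univariate measures used above can be tried on a small series without the data file. A minimal sketch; the six values are hypothetical toy data standing in for one numeric column of the data set.

```python
import pandas as pd

# Hypothetical toy values standing in for one numeric column of the data set.
s = pd.Series([1, 2, 2, 3, 4, 6])

print(s.value_counts().to_dict())  # frequency of each distinct value
print(s.mean(), s.median())        # 3.0 2.5
print(s.mode().tolist())           # [2]
print(s.var(), s.std())            # sample variance and standard deviation (divide by n - 1)
print(s.skew(), s.kurtosis())      # skewness and excess kurtosis
```

A positive skewness, as this series gives because of the single large value 6, indicates a longer tail on the right.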

EX.NO:5. B) BIVARIATE ANALYSIS: LINEAR AND LOGISTIC REGRESSION MODELING
DATE:

AIM:
To explore the Linear and Logistic Regression model on the USA HOUSING
AND UCI AND PIMA INDIANS DIABETES data set.

ALGORITHM:
STEP 1: Start the program
STEP 2: To download a data set, such as the housing data set, from Kaggle.
STEP 3: To read data from downloaded data set.
STEP 4: To find the linear and logistic regression model using the given data set.
STEP 5: Display the output.
STEP 6: Stop the program.

PROGRAM:
BIVARIATE ANALYSIS GENERAL PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
from matplotlib.ticker import FormatStrFormatter
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('C:/Users/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome']=df['Outcome'].astype('bool')

fig,axes = plt.subplots(nrows=3,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.countplot(x='Pregnancies',data=df,ax=axes[0][0],color='green')
axes[0][0].set_title('Count',fontdict={'fontsize':8})
axes[0][0].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.countplot(x='Pregnancies',data=df,hue='Outcome',ax=axes[0][1])
axes[0][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[0][1].set_xlabel('Month of Preg.',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count',fontdict={'fontsize':7})
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()

plot10 = sns.distplot(df['Pregnancies'],ax=axes[1][0])
axes[1][0].set_title('Pregnancies Distribution',fontdict={'fontsize':8})
axes[1][0].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][0].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plt.tight_layout()

plot11 = df[df['Outcome']==False]['Pregnancies'].plot.hist(ax=axes[1][1],label='Non-Diab.')
plot11_2=df[df['Outcome']==True]['Pregnancies'].plot.hist(ax=axes[1][1],label='Diab.')
axes[1][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})
axes[1][1].set_xlabel('Pregnancy Class',fontdict={'fontsize':7})
axes[1][1].set_ylabel('Freq/Dist',fontdict={'fontsize':7})
plot11.axes.legend(loc=1)
plt.setp(axes[1][1].get_legend().get_texts(), fontsize='6') # for legend text
plt.setp(axes[1][1].get_legend().get_title(), fontsize='6') # for legend title
plt.tight_layout()

plot20 = sns.boxplot(df['Pregnancies'],ax=axes[2][0],orient='v')
axes[2][0].set_title('Pregnancies',fontdict={'fontsize':8})
axes[2][0].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][0].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.tight_layout()

plot21 = sns.boxplot(x='Outcome',y='Pregnancies',data=df,ax=axes[2][1])
axes[2][1].set_title('Diab. VS Non-Diab.',fontdict={'fontsize':8})

axes[2][1].set_xlabel('Pregnancy',fontdict={'fontsize':7})
axes[2][1].set_ylabel('Five Point Summary',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
plt.tight_layout()
plt.show()

OUTPUT:

## Blood Pressure variable

fig,axes = plt.subplots(nrows=2,ncols=2,dpi=120,figsize = (8,6))

plot00=sns.distplot(df['BloodPressure'],ax=axes[0][0],color='green')
axes[0][0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0][0].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][0].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot01=sns.distplot(df[df['Outcome']==False]['BloodPressure'],ax=axes[0][1],color='green',
label='Non Diab.')
sns.distplot(df[df.Outcome==True]['BloodPressure'],ax=axes[0][1],color='red',label='Diab')

axes[0][1].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0][1].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0][1].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
axes[0][1].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
plot01.axes.legend(loc=1)
plt.setp(axes[0][1].get_legend().get_texts(), fontsize='6')
plt.setp(axes[0][1].get_legend().get_title(), fontsize='6')
plt.tight_layout()
plot10=sns.boxplot(df['BloodPressure'],ax=axes[1][0],orient='v')
axes[1][0].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1][0].set_xlabel('BP',fontdict={'fontsize':7})
axes[1][0].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()
plot11=sns.boxplot(x='Outcome',y='BloodPressure',data=df,ax=axes[1][1])
axes[1][1].set_title(r'Numerical Summary (Outcome)',fontdict={'fontsize':8})
axes[1][1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.xticks(ticks=[0,1],labels=['Non-Diab.','Diab.'],fontsize=7)
axes[1][1].set_xlabel('Category',fontdict={'fontsize':7})
plt.tight_layout()
plt.show()

OUTPUT:

fig,axes = plt.subplots(nrows=1,ncols=2,dpi=120,figsize = (8,4))

plot0=sns.distplot(df[df['BloodPressure']!=0]['BloodPressure'],ax=axes[0],color='green')
axes[0].yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
axes[0].set_title('Distribution of BP',fontdict={'fontsize':8})
axes[0].set_xlabel('BP Class',fontdict={'fontsize':7})
axes[0].set_ylabel('Count/Dist.',fontdict={'fontsize':7})
plt.tight_layout()

plot1=sns.boxplot(df[df['BloodPressure']!=0]['BloodPressure'],ax=axes[1],orient='v')
axes[1].set_title('Numerical Summary',fontdict={'fontsize':8})
axes[1].set_xlabel('BloodPressure',fontdict={'fontsize':7})
axes[1].set_ylabel(r'Five Point Summary(BP)',fontdict={'fontsize':7})
plt.tight_layout()

OUTPUT:

LINEAR REGRESSION MODELLING ON HOUSING DATASET

# Data manipulation libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

USAhousing = pd.read_csv('USA_Housing.csv')
USAhousing.head()
USAhousing.info()
USAhousing.describe()

USAhousing.columns
sns.pairplot(USAhousing)

sns.distplot(USAhousing['Price'])

# numeric_only=True skips any non-numeric column (requires pandas 1.5 or later)
sns.heatmap(USAhousing.corr(numeric_only=True))
X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population']]
y = USAhousing['Price']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
# print the intercept
print(lm.intercept_)

coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df

predictions = lm.predict(X_test)
plt.scatter(y_test,predictions)
sns.distplot((y_test-predictions),bins=50);
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
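
The three error metrics printed above can be reproduced by hand, which makes their definitions explicit. A minimal sketch with NumPy only; the true and predicted values are hypothetical and do not come from the housing data.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred          # 1, 0, -2, -1
mae = np.mean(np.abs(errors))     # mean absolute error
mse = np.mean(errors ** 2)        # mean squared error
rmse = np.sqrt(mse)               # root mean squared error

print(float(mae), float(mse))     # 1.0 1.5
print(round(float(rmse), 4))      # 1.2247
```

RMSE is in the same units as the target (here, price), which usually makes it easier to interpret than MSE.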

OUTPUT:

LOGISTIC REGRESSION MODELLING ON PIMA DIABETES

# Data manipulation libraries

import numpy as np
import pandas as pd

###scikit Learn Modules needed for Logistic Regression

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.preprocessing import LabelEncoder,MinMaxScaler,OneHotEncoder,StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

#for plotting
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(color_codes=True)
import warnings
warnings.filterwarnings('ignore')

df=pd.read_csv('C:/Users/diabetes.csv')

df.head()

df.tail()

df.isnull().sum()

df.describe(include='all')

df.corr()

sns.heatmap(df.corr(),annot=True)
plt.show()

df.hist()
plt.show()

sns.countplot(x=df['Outcome'])

scaler=StandardScaler()
df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age']]=scaler.fit_transform(df[['Pregnancies',
'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age']])
df_new = df

# Train & Test split

x_train, x_test, y_train, y_test = train_test_split( df_new[['Pregnancies', 'Glucose',
'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age']],
df_new['Outcome'],test_size=0.20,
random_state=21)

print('Shape of Training Xs:{}'.format(x_train.shape))

print('Shape of Test Xs:{}'.format(x_test.shape))
print('Shape of Training y:{}'.format(y_train.shape))
print('Shape of Test y:{}'.format(y_test.shape))

Shape of Training Xs:(614, 8)
Shape of Test Xs:(154, 8)
Shape of Training y:(614,)
Shape of Test y:(154,)

# Build Model
model = LogisticRegression()
model.fit(x_train, y_train)
y_predicted = model.predict(x_test)

score=model.score(x_test,y_test);
print(score)

0.7337662337662337

#Confusion Matrix
# Compute confusion matrix
cnf_matrix = confusion_matrix(y_test, y_predicted)
np.set_printoptions(precision=2)
cnf_matrix

OUTPUT:
RESULT:
Exploring various commands for doing Bivariate analytics on the USA HOUSING Dataset
was successfully executed.
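
A confusion matrix such as cnf_matrix above can be turned into summary scores. A minimal sketch, assuming the 2x2 layout that sklearn.metrics.confusion_matrix returns for a binary problem (rows are true classes, columns are predicted classes). The counts below are hypothetical and are not the output of the model in this exercise.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix: rows are true classes, columns are predictions.
cnf = np.array([[90, 10],
                [20, 34]])

# Flattened row by row: true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = cnf.ravel()
accuracy = (tn + tp) / cnf.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are correct
recall = tp / (tp + fn)            # share of actual positives that are found

print(round(float(accuracy), 4))   # 0.8052
```

Accuracy computed this way is the same quantity that model.score reports for a classifier, while precision and recall show how the errors are split between the two classes.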

EX.NO:7. APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS.
DATE:

AIM:
To apply and explore various plotting functions on UCI datasets.

ALGORITHM:

STEP 1: Install seaborn package and import the package.
STEP 2: Normal curves, density or contour plots, correlation and scatter plots, and histogram plots are visualized.
STEP 3: 3D plotting is done using the plotly package.
STEP 4: Stop the program.
PROGRAM:

A. NORMAL CURVES

#seaborn package
import seaborn as sns
flights = sns.load_dataset("flights")
flights.head()
may_flights = flights.query("month == 'May'")
sns.lineplot(data=may_flights, x="year", y="passengers")
OUTPUT:
B. DENSITY AND CONTOUR PLOTS

iris = sns.load_dataset("iris")
sns.kdeplot(data=iris)

OUTPUT:

C. CORRELATION AND SCATTER PLOTS

#correlation visualized using heatmap function
df = sns.load_dataset("titanic")
# heatmap needs a numeric matrix, so plot the correlations of the numeric columns
ax = sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")

#scatter plots of categorical variable
df = sns.load_dataset("titanic")
sns.catplot(data=df, x="age", y="class")
OUTPUT:
D. HISTOGRAMS

#histogram of dataframe
df = sns.load_dataset("titanic")
sns.histplot(data=df, x="age")

OUTPUT:

E. THREE DIMENSIONAL PLOTTING

#3d plotting using plotly package
import plotly.express as px
df = sns.load_dataset("iris")
# column and species names below are those of the seaborn iris data set
fig = px.scatter_3d(df, x="petal_length", y="petal_width",
z="sepal_width", size="sepal_length",
color="species", color_discrete_map = {"setosa": "blue",
"versicolor": "violet", "virginica": "pink"})
fig.show()
OUTPUT:
