FDS Lab Record
Ex.no :1
Date :
Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages
Aim:
To study how to download, install and explore the features of the NumPy, SciPy, Jupyter,
Statsmodels and Pandas packages.
NumPy:
Arrays in NumPy can be created in several ways and with any number of dimensions (ranks), which define the shape of the
array. Arrays can also be created from sequences such as lists and tuples; the type of the
resultant array is deduced from the type of the elements in the sequence.
# Python program for creation of arrays
import numpy as np

# Creating a rank 1 array
arr = np.array([1, 2, 3])
print("Array with Rank 1: \n", arr)

# Creating a rank 2 array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array with Rank 2: \n", arr)

# Creating an array from a tuple
arr = np.array((1, 3, 2))
print("\nArray created using ", "passed tuple:\n", arr)
Output:
Array with Rank 1:
[1 2 3]
Array with Rank 2:
[[1 2 3]
[4 5 6]]
Array created using passed tuple:
[1 3 2]
Elements of a NumPy array can be accessed by their index, much like Python lists; a short sketch is shown below.
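The following illustrative sketch (not part of the original listing) indexes and slices the rank 2 array created above; the indexing syntax used is standard NumPy.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Access a single element by row and column index
print(arr[0, 2])      # 3

# Slice out the first row and the last column
print(arr[0, :])      # [1 2 3]
print(arr[:, -1])     # [3 6]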
SciPy:
SciPy is a general-purpose package for mathematics, science, and engineering and extends the base
capabilities of NumPy.
pip install scipy
# importing the linalg function from scipy
from scipy import linalg
import numpy as np

# defining a sample matrix; the original listing does not show A,
# so this 3x3 matrix is assumed to match the determinant printed below
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 8]])

# Compute the determinant of the matrix
print(linalg.det(A))
Output :
2.999999999999997
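As a further illustration of SciPy's linear-algebra features (an added sketch, not part of the original record), scipy.linalg.solve can be used to solve a system of linear equations:

from scipy import linalg
import numpy as np

# Solve the system A·x = b, here 3x + 2y = 5 and x - y = 0
A = np.array([[3, 2], [1, -1]])
b = np.array([5, 0])
x = linalg.solve(A, b)
print(x)      # [1. 1.]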
Jupyter:
o Installing Jupyter Notebook using PIP:
o python3 -m pip install --upgrade pip
o python3 -m pip install jupyter
o Command to run the Jupyter notebook:
o jupyter notebook
o # For capturing the execution time
o %%timeo
o # Find the squares of a number in the
o # range from 0 to 14
o for x in range(15):
o square = x**2
o print(square)
Output :
Statsmodels:
Statsmodels is a package for exploring data, estimating statistical models, and performing statistical
tests. It includes descriptive statistics, statistical tests, plotting functions, and result statistics.
pip install statsmodels

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# loading the csv file
df = pd.read_csv('headbrain1.csv')
print(df.head())

# fitting the model
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()

# model summary
print(model.summary())
Output:
Pandas:
Pandas is used to visualize and manipulate data tables. It provides many functions that allow
efficient manipulation for the preliminary steps of data-analysis problems.
pip install pandas

import pandas as pd
import numpy as np

# Creating an empty Series
ser = pd.Series()
print(ser)

# Creating a Series from a simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)

Output:
Series([], dtype: float64)
Output:
0 g
1 e
2 e
3 k
4 s
dtype: object
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :2
Date :
Working with NumPy arrays
Aim:
To write a Python program for working with NumPy arrays.
Algorithm:
Step1: Start the program.
Step2: Import the NumPy library and create two NumPy arrays a1 and a2.
Step3: Compute the element-wise add, subtract, multiply, divide, power and mod of the two arrays.
Step4: Compute absolute values, exponentials, powers and inverse trigonometric functions of sample arrays.
Step5: Generate random numbers and compute their sum, minimum, maximum, variance, argmin/argmax and median.
Step6: Print the results.
Step7: Stop the program.
Working with NumPy arrays
Program:
import numpy as np
a1 = np.array([1, 2, 3, 4])
a2 = np.array([5, 2, 8, 4])
a = np.add(a1, a2)
s = np.subtract(a1, a2)
m = np.multiply(a1, a2)
d = np.divide(a1, a2)
p = np.power(a1, a2)
mo = np.mod(a1, a2)
print("add", a, "\nsub", s, "\nmultiply", m, "\ndivide", d, "\npower", p, "\nmod", mo)
x = np.array([-2, -1, 0, 1, 2])
x1 = np.array([3-4j, 4-3j, 2+0j, 0+1j])
r = np.abs(x)
r1 = np.absolute(x1)
print(r, r1)
x = [1, 2, 3]
print("x=", x)
print("e^x= ", np.exp(x), "\n2^x= ", np.power(2, x), "\n3^x= ", np.power(3, x))
x = [-1, 0, 1]
print("sin(x)= ", np.arcsin(x), "\ncos(x)= ", np.arccos(x), "\ntan(x)= ", np.arctan(x))
l = np.random.random(100)
print("sum of random = ", sum(l))
r = np.random.randint(100)
print("a single random num in 100= ", r)
r1 = np.random.rand(3, 2)
print("random num in 3x2 matrix= ", r1)
m = np.min(l)
print(l)
print("minimum value= ", min(l))
print("max value= ", max(l))
print("variance= ", np.var(l))
print("index max,min", np.argmin(l), np.argmax(l))
print("median= ", np.median(l))
Output:
add [ 6 4 11 8]
sub [-4 0 -5 0]
multiply [ 5 4 24 16]
divide [0.2 1. 0.375 1. ]
power [ 1 4 6561 256]
mod [1 0 3 0]
[2 1 0 1 2] [5. 5. 2. 1.]
x= [1, 2, 3]
e^x= [ 2.71828183 7.3890561 20.08553692]
2^x= [2 4 8]
3^x= [ 3 9 27]
sin(x)= [-1.57079633 0. 1.57079633]
[0.28586698 0.45352049 0.25451039 0.97105316 0.02535396 0.09221343
0.79413109 0.42820734 0.26583637 0.75377329 0.91799221 0.02531889
0.83878874 0.7960848 0.8585927 0.85812235 0.83383714 0.2284291
0.52179336 0.90650508 0.09977865 0.30471393 0.22279719 0.86075837
0.046045 0.71424256 0.35741625 0.98231166 0.42137124 0.72331874
0.99012286 0.44864868 0.61460373 0.546737 0.80822087 0.33479619
0.93914826 0.61968599 0.8092868 0.7183906 0.91778332 0.99110823
0.38505856 0.85986195 0.8065371 0.48528166 0.67631784 0.7436611
0.70843818 0.53572227 0.34302813 0.8711374 0.6286608 0.55200494
0.71513182 0.41856791 0.49761823 0.22898292 0.6934951 0.96199147
0.49538308 0.47882394 0.64561895 0.23255628 0.34430992 0.2406554
0.90681161 0.74727446 0.73762249 0.65311322 0.35202763 0.03110263
0.29740445 0.74235981 0.05979187 0.67236839 0.10224003 0.01170318
0.29194168 0.0124052 0.10913692 0.50157127 0.44995131 0.21237588
0.39721251 0.78490074 0.40505713 0.02867668 0.19936884 0.29913253
0.80024108 0.61590016 0.98818807 0.03773516 0.18219169 0.96718149
0.36593412 0.69541724 0.63905247 0.20797664]
minimum value= 0.01170317850813174
max value= 0.9911082265003155
variance= 0.08578639769250541
index max,min 77 41
median= 0.528757819258286
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :3
Date :
Working with pandas data frames
Aim:
To write a Python program to work with Pandas data frames.
Algorithm:
Step1 : Start the program.
Step2 : Import pandas library and aliasing as pd.
Step3 : Create empty dataframe and also create dataframe from list of dictionaries.
Step4 : Print the column indices and values as a dictionary.
Step5 : Create dataframe from dictionary series.
Step6 : Print the needed indices values.
Step7 : Adding a column and print it.
Step8 : Perform the following functions:
pop function
Row selection by label and by integer location
Slice rows
Addition of rows.
Step9 : Print the values.
Step10:Stop the program.
Working with Pandas Data Frames
Program:
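The opening pages of this program listing were not captured in the record; the lines below are a reconstructed sketch (based on the algorithm steps and on the output shown later) of how the data frames df1 and df3 were presumably created. The column names and index labels are assumptions inferred from that output.

import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame()
print(df)

# Creating a DataFrame from a list of dictionaries
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(df1)

# Creating a DataFrame from a dictionary of Series
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df3 = pd.DataFrame(d)
print(df3)

# Adding new columns
df3['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df3['four'] = df3['one'] + df3['three']
print(df3)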
# Column selection
print (df3['one'])
print (df1)
print ("Deleting another column using POP function:")
df3.pop('two')
print (df3)

# Row selection by label
print (df3.loc['b'])

# Row selection by integer location
print (df3.iloc[2])

# Slice rows
print (df3[2:4])

# Addition of rows
df4 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a', 'b'])
df5 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a', 'b'])
df4 = df4.append(df5)
print (df4)

# Drop rows with label 0
df4 = df4.drop(0)
print (df4)
Output:
Empty DataFrame
Columns: []
Index: []
         a   b
first    1   2
second   5  10
         a  b1
first 1 NaN
second 5 NaN
d NaN 4 NaN
four     22.0
Name: b, dtype: float64
three    30.0
four     33.0
Name: c, dtype: float64
   three  four
c   30.0  33.0
d    NaN   NaN
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :4
Date : Read data from text files and the web, and explore various
commands for doing descriptive analysis on the Iris data set
Aim:
To write a Python program to read data from text files and the web, and to explore various
commands for doing descriptive analysis on the Iris data set.
Algorithm:
Step1 : Start the program.
Step2 : Read the Iris data set into a variable.
Step3 : Display the data using head(), columns, sample(10) and shape.
Step4 : Slice the data from rows 10 to 21 and print it.
Step5 : Select specific columns and print the head of the result.
Step6 : Use iloc() and loc() to display records from the Iris data set.
Step7 : Use value_counts(), sum(), mean(), median(), min() and max() for descriptive statistics.
Step8 : Add a total_values column, rename columns and highlight the maxima.
Step9 : Print the processed values.
Step10: Stop the program.
Read data from text files and the web, and explore various
commands for doing descriptive analysis on the Iris data set.
Program:
import pandas as pd
data = pd.read_csv('iris.csv')
print(data)
data.head()
data.sample(10)
data.columns
data.shape
print(data)
print(data[10:21])
sliced_data = data[10:21]
print(sliced_data)
specific_data = data[["Id", "Species"]]
print(specific_data.head(10))
data.iloc[5]
data.loc[data["Species"] == "Iris-setosa"]
data["Species"].value_counts()
sum_data = data["sepallength"].sum()
mean_data = data["sepallength"].mean()
median_data = data["sepallength"].median()
print("Sum:", sum_data, "\nMean:", mean_data, "\nMedian:", median_data)
min_data = data["sepallength"].min()
max_data = data["sepallength"].max()
print("Minimum:", min_data, "\nMaximum:", max_data)
cols = data.columns
print(cols)
cols = cols[1:5]
data1 = data[cols]
data["total_values"] = data1[cols].sum(axis=1)
newcols = {"Id": "id", "sepallength": "sepallength", "sepalwidth": "sepalwidth"}
data.rename(columns=newcols, inplace=True)
print(data.head())
data.style
data.head(10).style.highlight_max(color='lightgreen', axis=0)
data.head(10).style.highlight_max(color='lightgreen', axis=1)
data.head(10).style.highlight_max(color='lightgreen', axis=None)
data.isnull()
data.isnull().sum()
import seaborn as sns
iris = sns.load_dataset("iris")
# Note: newer pandas versions may require selecting only the numeric columns before calling corr()
sns.heatmap(iris.corr(), linecolor='white', linewidths=1)
sns.heatmap(iris.corr(), linecolor='white', linewidths=1, annot=True)
data.corr(method='pearson')
g = sns.pairplot(data, hue="Species")
Output:
Id sepallength sepalwidth petallength petalwidth Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
149 150 5.9 3.0 5.1 1.8 Iris-virginica
Id Species
0 1 Iris-setosa
1 2 Iris-setosa
2 3 Iris-setosa
3 4 Iris-setosa
4 5 Iris-setosa
5 6 Iris-setosa
6 7 Iris-setosa
7 8 Iris-setosa
8 9 Iris-setosa
9 10 Iris-setosa
Sum: 876.5
Mean: 5.843333333333335
Median: 5.8
Minimum: 4.3
Maximum: 7.9
Index(['Id', 'sepallength', 'sepalwidth', 'petallength', 'petalwidth','Species'],
dtype='object')
id sepallength sepalwidth petallength petalwidth Species \
total_values
0 10.2
1 9.5
2 9.4
3 9.4
4 10.2
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :5A
Date : Univariate analysis: standard deviation, skewness and
kurtosis of the Pima Indians diabetes dataset
Aim:
To write a python program for Univariate analysis.
Algorithm:
Step1 : Start the program
Step2 : Import the pandas library, aliasing it as pd, and the statistics module.
Step3 : Read the Pima Indians diabetes dataset into a data frame.
Step4 : Print the head, shape, row index, column index and dtypes of the dataset.
Step5 : Compute the frequency count, mean, median, mode, variance, standard deviation, skewness and kurtosis from the dataset.
Step6 : Print the values.
Step7 : Stop the program.
Standard Deviation, Skewness and Kurtosis of Pima Indians Diabetes Dataset.
Program:
import pandas as pd
import statistics
pima = pd.read_csv('diabetes.csv')
print(pima.head())
print(pima.shape)
print(type(pima))
pima_row_idx = pima.index
print(pima_row_idx)
pima_col_idx = pima.columns
print(pima_col_idx)
print(pima.dtypes)
mean = statistics.mean(pima["Insulin"])
mode = statistics.mode(pima["Insulin"])
median = statistics.median(pima["Insulin"])
variance = statistics.variance(pima["Outcome"])
standard_deviation = statistics.stdev(pima["Outcome"])
fre_count = pima["Outcome"].value_counts()
skew = pima.skew(axis=0, skipna=True)
kurt = pima.kurtosis(skipna=True)
print(mean, "\n", mode, "\n", median, "\n", variance, "\n", standard_deviation, "\n", fre_count, "\n", skew, "\n", kurt)
Output:
Pregnancies Glucose BloodPressure ... DiabetesPedigreeFunction Age Outcome
0 6 148 72 ... 0.627 50 1
1 1 85 66 ... 0.351 31 0
[5 rows x 9 columns]
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex(start=0, stop=768, step=1)
Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin','BMI',
'DiabetesPedigreeFunction', 'Age', 'Outcome'],
dtype='object')
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
79.79947916666667
0
30.5
0.22748261625380273
0.47695137724279896
0 500
1 268
Name: Outcome, dtype: int64
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigree Function 1.919 1
Age 1.129597
Outcome 0.635017
dtype: float64
Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigree Function 5.594 5
Age 0.643159
Outcome -1.600930
dtype: float64
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :5B
Date : Univariate analysis: frequency, mean, median, mode, variance,
standard deviation, skewness and kurtosis of the UCI diabetes dataset
Aim:
To write a python program for Univariate analysis.
Algorithm:
Step1 : Start the program.
Step2 : Import pandas as pd and the statistics module, and read the CSV file.
Step3 : Print the head(), shape and type of the UCI dataset.
Step4 : Obtain the row and column indices using index and columns, and print them.
Step5 : Compute mean, mode, median, variance, standard deviation, skew and kurt using the statistics module and pandas.
Step6 : Print the computed values.
Step7 : Stop the program.
Univariate analysis: Frequency, Mean, Median, Mode, Variance, StandardDeviation,
Skewness and Kurtosis of UCI Diabetes Dataset.
Program:
import pandas as pd
import statistics
uci = pd.read_csv('diabetic_data.csv')
print(uci.head())
print(uci.shape)
print(type(uci))
uci_row_idx = uci.index
print(uci_row_idx)
uci_col_idx = uci.columns
print(uci_col_idx)
print(uci.dtypes)
mean = statistics.mean(uci["num_lab_procedures"])
mode = statistics.mode(uci["num_lab_procedures"])
median = statistics.median(uci["num_lab_procedures"])
variance = statistics.variance(uci["num_lab_procedures"])
standard_deviation = statistics.stdev(uci["num_lab_procedures"])
fre_count = uci["num_lab_procedures"].value_counts()
skew = uci.skew(axis=0, skipna=True)
kurt = uci.kurtosis(skipna=True)
print(mean, "\n", mode, "\n", median, "\n", variance, "\n", standard_deviation, "\n", fre_count, "\n", skew, "\n", kurt)
Output:
encounter_id patient_nbr race ... change diabetesMed readmitted
0 2278392 8222157 Caucasian ... No No NO
1 149190 55629189 Caucasian ... Ch Yes >30
2 64410 86047875 AfricanAmerican ... No Yes NO
3 500364 82442376 Caucasian ... Ch Yes NO
4 16680 42519267 Caucasian ... Ch Yes NO
[5 rows x 50 columns]
(101766, 50)
<class 'pandas.core.frame.DataFrame'>
RangeIndex(start=0, stop=101766, step=1)
Index(['encounter_id', 'patient_nbr', 'race', 'gender', 'age', 'weight', 'admission_type_id',
'discharge_disposition_id', 'admission_source_id','time_in_hospital', 'payer_code',
'medical_specialty', 'num_lab_procedures', 'num_procedures', 'num_medications',
'number_outpatient', 'number_emergency', 'number_inpatient', 'diag_1','diag_2',
'diag_3', 'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'metformin',
'repaglinide', 'nateglinide', 'chlorpropamide',
'glimepiride', 'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide',
'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide',
'examide', 'citoglipton', 'insulin',
'glyburide-metformin', 'glipizide-metformin', 'glimepiride-
pioglitazone', 'metformin-rosiglitazone',
'metformin-pioglitazone', 'change', 'diabetesMed', 'readmitted'],
dtype='object')
encounter_id int64
patient_nbr int64
race object
gender object
age object
weight object
admission_type_id int64
discharge_disposition_id int64
admission_source_id int64
time_in_hospital int64
payer_code object
medical_specialty object
num_lab_procedures int64
num_procedures int64
num_medications int64
number_outpatient int64
number_emergency int64
number_inpatient int64
diag_1 object
diag_2 object
diag_3 object
number_diagnoses int64
max_glu_serum object
A1Cresult object
metformin object
repaglinide object
nateglinide object
chlorpropamide object
glimepiride object
acetohexamide object
glipizide object
glyburide object
tolbutamide object
pioglitazone object
rosiglitazone object
acarbose object
miglitol object
troglitazone object
tolazamide object
examide object
citoglipton object
insulin object
glyburide-metformin object
glipizide-metformin object
glimepiride-pioglitazone object
metformin-rosiglitazone object
metformin-pioglitazone object
change object
diabetesMed object
readmitted object
dtype: object
43.09564098028811
1
44.0
387.0805299104688
19.674362249142124
1 3208
43 2804
44 2496
45 2376
38 2213
120 1
132 1
121 1
126 1
118 1
discharge_disposition_id 6.003347
admission_source_id 1.744989
time_in_hospital 0.850251
num_lab_procedures -0.245074
num_procedures 0.857110
num_medications 3.468155
number_outpatient 147.907736
number_emergency 1191.686726
number_inpatient 20.719397
number_diagnoses -0.079056
dtype: float64
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :5C
Date :
Bivariate Analysis- Program for Linear Regression
Aim:
To write a Python program for bivariate analysis: linear regression.
Algorithm:
Step1 : Start the program.
Step2 : Import numpy as np, pandas as pd, seaborn as sns, sklearn and matplotlib.
Step3 : Read the diabetes dataset and print the head values.
Step4 : Take BMI as X and Age as Y.
Step5 : Plot Glucose vs BloodPressure as a scatter plot.
Step6 : Set the title and axis labels with font size 14.
Step7 : Enable the grid, display the plot and fit the linear regression model.
Step8 : Stop the program.
Bivariate Analysis - Program for Linear Regression
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import statistics
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv('diabetes.csv')
head = df.head()
print(head)
cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
        "BMI", "DiabetesPedigreeFunction", "Age"]
X = df.BMI
Y = df.Age
plt.scatter(df['Glucose'], df['BloodPressure'], color='blue')
plt.title('Glucose Vs BloodPressure', fontsize=14)
plt.xlabel('Glucose', fontsize=14)
plt.ylabel('BloodPressure', fontsize=14)
plt.grid(True)
plt.show()
model = LinearRegression()
# scikit-learn expects a 2-D feature array, so the single BMI column is reshaped
model = LinearRegression().fit(X.values.reshape(-1, 1), Y)
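The original listing stops after fitting the model; as an added illustration (not part of the original record), the fitted slope and intercept could be inspected and the regression line overlaid on the data as follows:

# Print the fitted slope and intercept of the Age ~ BMI model
print("slope:", model.coef_[0], "intercept:", model.intercept_)

# Overlay the fitted line on a BMI vs Age scatter plot
plt.scatter(X, Y, color='blue', s=10)
plt.plot(X, model.predict(X.values.reshape(-1, 1)), color='red')
plt.xlabel('BMI', fontsize=14)
plt.ylabel('Age', fontsize=14)
plt.title('BMI Vs Age with fitted regression line', fontsize=14)
plt.grid(True)
plt.show()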
Output:
3 1 89 66 ... 0.167 21 0
4 0 137 40 ... 2.288 33 1
[5 rows x 9 columns]
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :5D
Date :
Bivariate Analysis- Logistic Regression
Aim:
To write a Python program for bivariate analysis: logistic regression.
Algorithm:
Step1 : Start the program.
Step2 : Import the necessary header files.
Step3 : Read the dataset and print its head.
Step4 : Declare the columns of dataframe and print it.
Step5 : Perform logistic model fit and print summary.
Step6 : Print the values.
Step7 : Stop the program.
Bivariate Analysis: Logistic Regression
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
## Importing statsmodels for running logistic regression
import statsmodels.api as sm
sns.set(color_codes=True)
%matplotlib inline
df = pd.read_csv('diabetes.csv')
head = df.head()
print(head)
cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
        "BMI", "DiabetesPedigreeFunction", "Age"]
X = df[cols]
y = df.Outcome
## Defining the model and assigning Y (dependent) and X (independent) variables
logit_model = sm.Logit(y, X)
## Fitting the model and publishing the results
result = logit_model.fit()
print(result.summary())
cols2 = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "BMI"]
X = df[cols2]
logit_model = sm.Logit(y, X)
result = logit_model.fit()
print(result.summary2())
cols3 = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness"]
X = df[cols3]
logit_model = sm.Logit(y, X)
result = logit_model.fit()
print(result.summary())
cols4 = ["Pregnancies", "Glucose", "BloodPressure"]
X = df[cols4]
logit_model = sm.Logit(y, X)
result = logit_model.fit()
print(result.summary())
## Importing LogisticRegression from the sklearn linear model, since the statsmodels function cannot give us a
## classification report and confusion matrix
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
cols4 = ["Pregnancies", "Glucose", "BloodPressure"]
X = df[cols4]
y = df.Outcome
logreg.fit(X, y)
## Defining the y_pred variable for the predicted values; the model is evaluated on the training data here,
## but we can also take a separate test dataset
y_pred = logreg.predict(X)
## Calculating the precision of the model
from sklearn.metrics import classification_report
print(classification_report(y, y_pred))
from sklearn.metrics import confusion_matrix
## The confusion matrix gives the number of cases where the model accurately predicts the outcomes
## (both 1 and 0) and how many cases are false positives and false negatives
confusion_matrix = confusion_matrix(y, y_pred)
print(confusion_matrix)
Output:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
DiabetesPedigreeFunction Age Outcome
0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1
=====================================================================
=======================
coef std err z P>|z| [0.025 0.975]
Pregnancies 0.1284 0.029 4.484 0.000 0.072 0.185
Glucose 0.0129 0.003 4.757 0.000 0.008 0.018
BloodPressure -0.0303 0.005 -6.481 0.000 -0.039 -0.021
SkinThickness 0.0002 0.006 0.032 0.974 -0.012 0.012
Insulin 0.0007 0.001 0.942 0.346 -0.001 0.002
BMI -0.0048 0.011 -0.449 0.653 -0.026 0.016
BloodPressure -0.0331 0.0046 -7.2700 0.0000 -0.0421 -0.0242
SkinThickness 0.0048 0.0054 0.8871 0.3750 -0.0058 0.0154
BMI -0.0057 0.0106 -0.5365 0.5916 -0.0265 0.0151
=================================================================
Optimization terminated successfully.
Current function value: 0.612759
Iterations 5
Logit Regression Results
=====================================================================
=========
Dep. Variable: Outcome No. Observations: 768
Model: Logit Df Residuals: 764
Method: MLE Df Model: 3
=====================================================================
============
coef std err z P>|z| [0.025 0.975]
=====================================================================
============
Optimization terminated successfully.
Current function value: 0.613118
Iterations 5
Logit Regression Results
=====================================================================
=========
Dep. Variable: Outcome No. Observations: 768
Model: Logit Df Residuals: 765
Method: MLE Df Model: 2
=====================================================================
============
coef std err z P>|z| [0.025 0.975]
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :5E
Date :
Multiple Regression analysis
Aim:
To write a Python program for multiple regression analysis.
Algorithm:
Step1 : start the program.
Step2 : Import necessary header files.
Step3 : Read the dataset and print its head.
Step4 : Compute the correlation matrix and plot the heatmap, Q-Q plot and scatter matrix from the dataset.
Step5 : Print the values.
Step6 : Stop the program.
MULTIPLE REGRESSION ANALYSIS
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import pylab
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

df = pd.read_csv('diabetes.csv')
df.head()
print(df)
corr = df.corr()
print(corr)
hm = sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdBu', annot=True)
print(hm)
data = df.Age
# Q-Q plot of Age against a normal distribution (statsmodels and pylab are imported above;
# the original listing uses them without showing the imports)
sm.qqplot(data, line='s')
pylab.show()
scatter_matrix(df)
# pd.plotting.scatter_matrix(df, alpha=1, figsize=(20, 20))
plt.show()
data = df[["Pregnancies", "Glucose", "BloodPressure"]]
print(data)
# pd.plotting.scatter_matrix(data, alpha=1, figsize=(30, 20))
plt.show()
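The listing above explores correlations but does not fit an explicit regression model; as an added sketch (not part of the original record), a multiple linear regression could be fitted with statsmodels, here assuming BMI as the response and Glucose, BloodPressure and Age as the predictors:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('diabetes.csv')
# Predictors (with an intercept term added) and response for the multiple regression
X = sm.add_constant(df[["Glucose", "BloodPressure", "Age"]])
y = df["BMI"]
model = sm.OLS(y, X).fit()
print(model.summary())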
Output:
0 6 148 ... 50 1
1 1 85 ... 31 0
2 8 183 ... 32 1
3 1 89 ... 21 0
4 0 137 ... 33 1
[9 rows x 9 columns]
AxesSubplot(0.125,0.125;0.62x0.755)
0 6 148 72
1 1 85 66
2 8 183 64
3 1 89 66
4 0 137 40
763 10 101 76
764 2 122 70
765 5 121 72
766 1 126 60
767 1 93 70
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6A
Apply and explore various plotting functions on UCI data sets.
Date :
Normal curve
Aim:
To write a python program for normal curves.
Algorithm:
Step1 : Start the program.
Step2 : Import necessary header files.
Step3 : Generate x values between -20 and 20 in steps of 0.01.
Step4 : Calculate the mean and standard deviation and plot the normal curve.
Step5 : Print the values.
Step6 : stop the program.
Apply and explore various plotting functions on UCI data sets.
Normal curves
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Plot between -20 and 20 with 0.01 steps
x_axis = np.arange(-20, 20, 0.01)

# Calculating the mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
Output:
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6B
Date :
Density and contour plots
Aim:
To write a python program for density and contour plots.
Algorithm:
Step1 : Start the program.
Step2 : Import necessary header files.
Step3 : Compute f(x, y) = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x).
Step4 : Use linspace and meshgrid to build the grid, then label the contour plot and show() it.
Step5 : Print the values.
Step6 : stop the program.
Density and contour plots
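Program:
The program listing for this experiment was not captured in the record; the following is a minimal sketch based on the algorithm above, assuming the function f(x, y) = sin(x)^10 + cos(10 + y*x)*cos(x) and a filled contour plot with a colorbar.

import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Function whose density/contours are plotted
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

# Build the grid with linspace and meshgrid
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Contour lines with labels
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)

# Filled (density-style) contour plot with a colorbar
plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar()
plt.show()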
Output:
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6C
Date :
Correlation and scatter plots
Aim:
To write a python program for correlation and scatter plots.
Algorithm:
Step1 : Start the program.
Step2 : Import the necessary header files.
Step3 : Compute the correlation coefficient between BloodPressure and BMI, and also print the correlation matrix.
Step4 : Print the values.
Step5 : Stop the program.
Correlation and scatter plots
import pandas as pd
diab = pd.read_csv("diabetes.csv")
print("Diabetes DataFile headers Details")
print(diab.head())
import seaborn as sns
sns.scatterplot(x="BloodPressure", y="BMI", data=diab);
ax = sns.scatterplot(x="BloodPressure", y="BMI", data=diab)
ax.set_title("BloodPressure vs. BMI")
ax.set_xlabel("BloodPressure");
sns.lmplot(x="BloodPressure", y="BMI", data=diab);
sns.lmplot(x="BloodPressure", y="BMI", hue="BloodPressure", data=diab);
from scipy import stats
print("Correlation coefficient between BloodPressure and BMI")
print(stats.pearsonr(diab['BloodPressure'], diab['BMI']))
cormat = diab.corr()
print("correlation MATRIX")
print(round(cormat, 2))
sns.heatmap(cormat);
Output:
Diabetes DataFile headers Details
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
SkinThickness -0.08 0.06 0.21 1.00
Insulin -0.07 0.33 0.09 0.44
BMI 0.02 0.22 0.28 0.39
DiabetesPedigreeFunction -0.03 0.14 0.04 0.18
Age 0.54 0.26 0.24 -0.11
Outcome 0.22 0.47 0.07 0.07
Outcome
Pregnancies 0.22
Glucose 0.47
BloodPressure 0.07
SkinThickness 0.07
Insulin 0.13
BMI 0.29
DiabetesPedigreeFunction 0.17
Age 0.24
Outcome 1.00
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6D(1)
Date :
Histogram
Aim:
To write a python program for Histograms.
Algorithm:
Step1 : Start the program.
Step2 : Import Header files.
Step3 : Declare variable with values.
Step4 : Plot the histogram and show() it.
Step5 : Print the values.
Step6 : Stop the program.
Histograms
1)
import matplotlib.pyplot as plt
x = [1,1,2,3,3,5,7,8,9,10,
10,11,11,13,13,15,16,17,18,18,
18,19,20,21,21,23,24,24,25,25,
25,25,26,26,26,27,27,27,27,27,
29,30,30,31,33,34,34,34,35,36,
36,37,37,38,38,39,40,41,41,42,
43,44,45,45,46,47,48,48,49,50,
51,52,53,54,55,55,56,57,58,60,
61,63,64,65,66,68,70,71,72,74,
75,77,81,83,84,87,89,90,90,91
]
plt.hist(x, bins=10)
plt.show()
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6D(2)
Date :
Histogram
Aim:
To write a python program for Histograms.
Algorithm:
Step1 : Start the program.
Step2 : Import the necessary header files.
Step3 : Create 3 subplots: the 1st for a histogram, the 2nd for a histogram segmented by Outcome and the 3rd
for representing the same segmentation using a boxplot.
Step4 : Print the values.
Step5 : Stop the program.
2)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline
diab = pd.read_csv("diabetes.csv")
print("Diabetes DataFile headers Details")
print(diab.head())
sns.countplot(x=diab.Outcome)
plt.title("Count Plot for Outcome")
dia1 = diab[diab.Outcome == 1]
dia0 = diab[diab.Outcome == 0]
# Creating 3 subplots - 1st for histogram, 2nd for histogram segmented by Outcome and 3rd for
# representing the same segmentation using a boxplot
plt.figure(figsize=(20, 6))
plt.subplot(1, 3, 1)
sns.set_style("dark")
plt.title("Histogram for BloodPressure")
sns.distplot(diab.BloodPressure, kde=False)
plt.subplot(1, 3, 2)
sns.distplot(dia0.BloodPressure, kde=False, color="Blue", label="Preg for Outcome=0")
sns.distplot(dia1.BloodPressure, kde=False, color="Gold", label="Preg for Outcome=1")
plt.title("Histograms for Preg by Outcome")
plt.legend()
plt.subplot(1, 3, 3)
sns.boxplot(x=diab.Outcome, y=diab.BloodPressure)
plt.title("Boxplot for Preg by Outcome")
Output:
Diabetes DataFile headers Details
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :6E
Date :
Three dimensional plotting
Aim:
To write a python program for three dimensional plotting.
Algorithm:
Step1 : Start the program.
Step2 : Import necessary header files.
Step3 : Create data for a three-dimensional line as xline, yline and zline, as shown in the sketch below.
Step4 : Create data for three-dimensional scatter points as xdata, ydata and zdata.
Step5 : Print the values.
Step6 : Stop the program.
Three dimensional plotting
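Program:
The program listing for this experiment was not captured in the record; the following is a minimal sketch based on the algorithm above, assuming the usual mpl_toolkits 3D axes with a spiral line (xline, yline, zline) and scattered points (xdata, ydata, zdata).

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
plt.show()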
Result:
Thus the program has been executed successfully and output was verified.
Ex.no :7
Date :
Visualizing geographic data with Basemap
Aim:
To write a Python program for visualizing geographic data using the Basemap module.
Algorithm:
Step1 : Start the program.
Step2 : Include necessary header files.
Step3 : Map (longitude, latitude) to (x, y) for plotting.
Step4 : Draw a shaded-relief image; latitudes and longitudes are returned as a dictionary.
Step5 : The keys contain the plt.Line2D instances; cycle through the lines and set the desired style for
each line.
Step6 : Print the values.
Step7 : Stop the program.
VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
Program:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # keys contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')

fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)

fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)
Output:
Result:
Thus the program has been executed successfully and output was verified.