Datascience Lab

The document outlines a series of experiments related to data science using Python and various libraries such as Pandas, Matplotlib, and NumPy. It includes practical examples of working with data frames, creating plots, performing statistical tests like Z-test and T-test, and building linear and logistic models. Each experiment is numbered and provides code snippets along with expected outputs.

INDEX

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh; working with NumPy arrays

Sl.No.  LIST OF EXPERIMENTS
1.   Working with Pandas data frames
2.   Basic plots using Matplotlib
3.   Frequency distributions, averages, variability
4.   Normal curves, correlation and scatter plots, correlation coefficient
5.   Regression
6.   Z-test
7.   T-test
8.   ANOVA
9.   Building and validating linear models
10.  Building and validating logistic models
11.  Time series analysis



Experiment No: 1

WORKING WITH PANDAS DATA FRAMES

Program:
import pandas as pd
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
# load the data into a DataFrame object
df = pd.DataFrame(data)
print(df.loc[0])

Output:
calories 420
duration 50
Name: 0, dtype: int64
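
A few further DataFrame operations (a minimal sketch using the same data dictionary, not part of the original listing) illustrate column selection, derived columns, and row filtering:

import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# select a single column as a Series
print(df["calories"])

# add a derived column: calories burned per minute
df["cal_per_min"] = df["calories"] / df["duration"]

# filter rows where the duration is at least 45 minutes
print(df[df["duration"] >= 45])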





Experiment No: 2

BASIC PLOTS USING MATPLOTLIB

Program:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# "o" is for circle markers and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get the current axes
ax = plt.gca()
# take control of the individual boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval at which the x-axis places its marks
plt.xticks(list(range(-3, 10)))
# set the interval at which the y-axis places its marks
plt.yticks(list(range(-3, 20, 3)))
# the legend states which colour signifies which series
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate writes text on the graph; xy gives the position
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))
# give the graph a title
plt.title('All Features Discussed')
plt.show()

Output:





Experiment No: 3

FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY

Program:
# Python program to get the average of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
list = [2, 40, 2, 502, 177, 7, 9]
# Calculating the average using average()
print(np.average(list))

Output:
105.57142857142857

# Python program to get the variance of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
list = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculating the variance using var()
print(np.var(list))

Output:
4.0

# Python program to get the standard deviation of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
list = [290, 124, 127, 899]
# Calculating the standard deviation using std()
print(np.std(list))

Output:
318.35750344541907
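
The experiment title also covers frequency distributions, which the listings above do not show; a minimal sketch using collections.Counter on a sample list could be:

# Python program to get the frequency distribution of a list
from collections import Counter
# Taking a list of elements
values = [2, 4, 4, 4, 5, 5, 7, 9]
# Counting how often each value occurs
freq = Counter(values)
for value, count in sorted(freq.items()):
    print(value, count)

Output:
2 1
4 3
5 2
7 1
9 1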





Experiment No: 4

NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT

Program:
# Normal curves
import matplotlib.pyplot as plt
import numpy as np
mu, sigma = 0.5, 0.1
s = np.random.normal(mu, sigma, 1000)
# Create the bins and histogram (density=True replaces the removed normed=True argument)
count, bins, ignored = plt.hist(s, 20, density=True)
plt.show()

Output:

# Correlation and scatter plots
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
print(correlation)

Downloaded by Saravanan Sujatha


Output:
0.8603090020146067
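
The snippet above computes the correlation but does not draw the scatter plot named in the experiment title; a minimal sketch using the same x and y series:

import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
# scatter plot of the two series
plt.scatter(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter plot of y against x (correlation of about 0.86)')
plt.show()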
# Correlation coefficient
import math

# function that returns the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        # sum of elements of array X
        sum_X = sum_X + X[i]
        # sum of elements of array Y
        sum_Y = sum_Y + Y[i]
        # sum of X[i] * Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        # sum of squares of the array elements
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # use the formula for the correlation coefficient
    corr = (n * sum_XY - sum_X * sum_Y) / math.sqrt((n * squareSum_X - sum_X * sum_X) * (n * squareSum_Y - sum_Y * sum_Y))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
# Find the size of the arrays.
n = len(X)
# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output:
0.953463
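
As a cross-check (a short sketch, not part of the original listing), NumPy's built-in corrcoef gives the same value:

import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation coefficient of X and Y
print(np.corrcoef(X, Y)[0, 1])   # approximately 0.953463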




Experiment No: 5

REGRESSION

Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating the regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # show the plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating the coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
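
The coefficients can be cross-checked (a short sketch, not part of the original program) against NumPy's least-squares polynomial fit:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
# np.polyfit with degree 1 returns the slope and intercept of the
# least-squares line, matching b_1 and b_0 above
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)   # approximately -0.0586 and 1.4575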






Experiment No: 6

Z-TEST

Program:
# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random sample of 50 values whose mean is about 110; the spread
# is set to 15/sqrt(50), the standard error of the mean for sd 15 and n = 50,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq
# print the mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# Now we perform the test. We pass the data, pass the mean under the null
# hypothesis in the value parameter, and in the alternative hypothesis we
# check whether the mean is larger.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
# The function returns a z-score and the corresponding p-value. We compare
# the p-value with alpha: if it is greater than alpha we do not reject the
# null hypothesis, otherwise we reject it.
if(p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Output:
Reject Null Hypothesis
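
For reference, the statistic that ztest computes here is the familiar one-sample z-score; a short sketch (assuming the same data array and null_mean as above):

import numpy as np

# z = (sample mean - hypothesised mean) / (sample sd / sqrt(n))
n = len(data)
z = (np.mean(data) - null_mean) / (np.std(data, ddof=1) / np.sqrt(n))
print(z)
# with the simulated data (mean near 110 against a null of 100), z lies far
# above the one-sided critical value of about 1.645 at alpha = 0.05,
# which is why the null hypothesis is rejected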






Experiment No: 7

T-TEST

Program:
# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the standard deviation
# Calculating the variances to get the pooled standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)
# Standard deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)

# Calculating the t-statistic
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical t-value
# Degrees of freedom
dof = 2 * N - 2
# p-value after comparison with the t-statistic
pval = 1 - stats.t.cdf(tval, df = dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

# Cross-checking using the built-in function from the SciPy package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))

Output:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205






Experiment No: 8

ANOVA

Program (in R):
# Installing the package
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in the mean within groups and between groups
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")
# Step 1: Set up the null and alternate hypotheses
# H0: mu = mu01 = mu02 (there is no difference between the average
#     displacement for the different gears)
# H1: not all means are equal
# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
# Step 3: Find the F-critical value at the 0.05 significance level
# Step 4: Compare the test statistic with the F-critical value and
#         conclude: if p < alpha, reject the null hypothesis

Output:



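The other experiments in this manual use Python; an equivalent one-way ANOVA can be sketched with scipy.stats.f_oneway (the three groups below are hypothetical displacement values standing in for mtcars$disp split by gear, not data from the original experiment):

from scipy import stats

# hypothetical displacement values for cars with 3, 4 and 5 gears
gear3 = [275.8, 360.0, 318.0, 304.0, 350.0]
gear4 = [160.0, 108.0, 146.7, 140.8, 121.0]
gear5 = [120.3, 95.1, 351.0, 145.0, 301.0]

# one-way ANOVA: H0 is that all group means are equal
f_stat, p_value = stats.f_oneway(gear3, gear4, gear5)
print("F =", f_stat, "p =", p_value)
# if p_value < 0.05, reject the null hypothesis of equal means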



Experiment No: 9

BUILDING AND VALIDATING LINEAR MODELS

Program
# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Note: load_boston was removed in scikit-learn 1.2, so this listing
# requires an older scikit-learn release.
from sklearn.datasets import load_boston
sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150
# loading the data
boston = load_boston()

You can check the dataset's keys with the following code.
print(boston.keys())

The output will be as follows:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)

You will find these details in the output:

Attribute Information (in order):
- CRIM      per capita crime rate by town
- ZN        proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS     proportion of non-retail business acres per town
- CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX       nitric oxides concentration (parts per 10 million)
- RM        average number of rooms per dwelling
- AGE       proportion of owner-occupied units built prior to 1940
- DIS       weighted distances to five Boston employment centres
- RAD       index of accessibility to radial highways
- TAX       full-value property-tax rate per $10,000
- PTRATIO   pupil-teacher ratio by town
- B         1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT     % lower status of the population
- MEDV      Median value of owner-occupied homes in $1000's
Missing Attribute Values: None
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df.head()
# print the columns present in the dataset
print(df.columns)
# print the top 5 rows of the dataset
print(df.head())

First five records from data set


# plotting a heatmap for the overall data set
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')




Heat map of overall data set


So let’s plot a regression plot to see the correlation between RM and MEDV.
# add the target (median home value) to the DataFrame as the MEDV column
df['MEDV'] = boston.target
sns.lmplot(x = 'RM', y = 'MEDV', data = df)

Regression plot with RM and MEDV
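
The experiment is titled building and validating linear models, yet the listing stops at this exploratory plot. A minimal sketch of the remaining steps (assuming df with the MEDV column from the code above, and using scikit-learn's LinearRegression, which the original listing does not show):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# features and target (MEDV was added to df above)
X = df.drop(columns=['MEDV'])
y = df['MEDV']

# hold out 20% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit the linear model on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# validate on the held-out split
y_pred = model.predict(X_test)
print("R^2  =", r2_score(y_test, y_pred))
print("RMSE =", np.sqrt(mean_squared_error(y_test, y_pred)))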






Experiment No: 10

BUILDING AND VALIDATING LOGISTIC MODELS

Program

Building the Logistic Regression model:


# importing libraries
import statsmodels.api as sm
import pandas as pd
# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col = 0)
# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()

Output :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
# printing the summary table
print(log_reg.summary())

Output :
                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                   30
Model:                          Logit   Df Residuals:                       27
Method:                           MLE   Df Model:                            2
Date:                Wed, 15 Jul 2020   Pseudo R-squ.:                  0.4912
Time:                        16:09:17   Log-Likelihood:                -10.581
converged:                       True   LL-Null:                       -20.794
Covariance Type:            nonrobust   LLR p-value:                 3.668e-05
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
gmat               -0.0262      0.011     -2.383      0.017      -0.048      -0.005
gpa                 3.9422      1.964      2.007      0.045       0.092       7.792
work_experience     1.1983      0.482      2.487      0.013       0.254       2.143
===================================================================================

Predicting on New Data :

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col = 0)
# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']
# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))
# comparing the original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

Output :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]




Testing the accuracy of the model :

from sklearn.metrics import confusion_matrix, accuracy_score
# confusion matrix
cm = confusion_matrix(ytest, prediction)
print ("Confusion Matrix : \n", cm)
# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

Output :
Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8
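
Beyond overall accuracy, per-class precision and recall can also be reported (a short sketch, assuming ytest and prediction from the code above):

from sklearn.metrics import classification_report

# precision, recall and F1-score for each class (0 = not admitted, 1 = admitted)
print(classification_report(ytest, prediction))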






Experiment No: 11

TIME SERIES ANALYSIS

Program
We are using the Superstore sales data.
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

We start with time series analysis and forecasting of furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
We have a good four years of furniture sales data.
furniture['Order Date'].min(), furniture['Order Date'].max()
Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00')
Data Preprocessing
This step includes removing columns we do not need, checking for missing values, aggregating sales by date, and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']




furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64
Figure 1

Indexing with Time Series Data


furniture = furniture.set_index('Order Date')
furniture.index

Figure 2
We will use the average daily sales value for each month instead, and we use the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
Have a quick peek at the 2017 furniture sales data.
y['2017':]




Figure 3

Visualizing Furniture Sales Time Series Data


y.plot(figsize=(15, 6))
plt.show()
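
The imports at the top of this experiment (itertools and statsmodels) point towards fitting a seasonal ARIMA model to this monthly series as the next step; a minimal sketch (the order and seasonal_order values below are illustrative placeholders, not tuned parameters):

import statsmodels.api as sm
import matplotlib.pyplot as plt

# fit a seasonal ARIMA model to the monthly furniture series y
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])

# forecast one year ahead and plot it against the observed series
pred = results.get_forecast(steps=12)
ax = y.plot(label='observed', figsize=(15, 6))
pred.predicted_mean.plot(ax=ax, label='forecast')
ax.legend()
plt.show()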



