Experiment 7
Experiment name: Creating a Data Frame and Matrix-like Operations on a Data Frame.
Merging two Data Frames and Applying functions to Data Frames
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
DataFrames in Python come with the Pandas library and are defined as two-dimensional labeled data structures with columns of potentially different types.
In general, we could say that a Pandas DataFrame consists of three main components: the data, the index, and the columns.
The two core Pandas data structures are:
1. a Pandas DataFrame
2. a Pandas Series: a one-dimensional labeled array capable of holding any data type, with axis labels or index. An example of a Series object is one column from a DataFrame.
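As a quick illustrative sketch of these three components (using assumed example values that also appear later in this experiment), the snippet below prints the data, the index, and the columns of a small DataFrame:
import pandas as pd
# a small illustrative DataFrame (assumed example data)
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})
print(df.values)    # the data
print(df.index)     # the index (row labels)
print(df.columns)   # the column labels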
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
import pandas as pd
# check the installed Pandas version
print(pd.__version__)
Series
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
With the index argument, you can name your own labels:
import pandas as pd
a = [1, 7, 2]
# assign custom labels to the Series values
myvar = pd.Series(a, index=["x", "y", "z"])
print(myvar)
print(myvar["y"])
import pandas as pd
# create a Series from a key/value object; the keys become the labels
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Create a Series using only data from "day1" and "day2":
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
# only the items with the specified index labels are included
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)
print(myvar["day1"])
DataFrames
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# load the dictionary into a DataFrame
df = pd.DataFrame(data)
print(df)
Locate Row
As you can see from the result above, the DataFrame is like a table with rows
and columns.
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
# use loc to return row 0 as a Series
print(df.loc[0])
Use the named index in the loc attribute to return the specified row(s).
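The manual does not show the corresponding code; a minimal sketch, reusing the example data above with an assumed named index, would be:
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# name the rows with the index argument
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
# refer to the named index in loc to return the specified row
print(df.loc["day2"])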
Creating a DataFrame from two Series:
# illustrative sample data (values assumed for this example)
author = ["Ankit", "Riya", "Sam"]
article = [210, 211, 114]
auth_series = pd.Series(author)
article_series = pd.Series(article)
frame = {"Author": auth_series, "Article": article_series}
df1 = pd.DataFrame(frame)
print(df1)
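The experiment title also covers merging two DataFrames and applying functions to a DataFrame. Since the manual does not include code for these steps, the following is a minimal sketch using the standard pandas merge and apply calls on assumed example frames:
import pandas as pd
# two small assumed DataFrames that share the "id" column
left = pd.DataFrame({"id": [1, 2, 3], "calories": [420, 380, 390]})
right = pd.DataFrame({"id": [1, 2, 3], "duration": [50, 40, 45]})
# merge the two DataFrames on the common "id" column
merged = pd.merge(left, right, on="id")
print(merged)
# apply a function to each column: here, the range (max - min)
print(merged[["calories", "duration"]].apply(lambda col: col.max() - col.min()))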
Precaution and sources of error:
The devices, whether computers or any other networking equipment, should be handled with due care and maintained carefully.
Results
Conclusions
Through this experiment we learnt to perform various operations on a data frame.
Experiment 8
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Python offers multiple great graphing libraries that come packed with lots of different
features. No matter if you want to create interactive, live or highly customized plots, Python has an excellent library for you.
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
import pandas as pd
# read the Iris dataset and assign column names
iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
print(iris.head())
Scatter Plot
import matplotlib.pyplot as plt
# create figure and axis
fig, ax = plt.subplots()
# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
Line Chart
columns = iris.columns.drop(['class'])
# create x data
x_data = range(0, iris.shape[0])
# create figure and axis
fig, ax = plt.subplots()
# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
# set title and legend
ax.set_title('Iris Dataset')
ax.legend()
Bar Chart
fig, ax = plt.subplots()
# wine_reviews is assumed to be a DataFrame of wine reviews loaded earlier,
# e.g. from a wine reviews CSV via pd.read_csv
# count the occurrence of each score
data = wine_reviews['points'].value_counts()
# get x and y data
points = data.index
frequency = data.values
# create bar chart
ax.bar(points, frequency)
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
Results
Conclusions
Through this experiment we learnt various charts for visualization, aesthetics and plotting in layers.
Experiment 9
Experiment name: Creating Histograms and Density Charts
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Histograms
A great way to get started exploring a single variable is with the histogram. A histogram
divides the variable into bins, counts the data points in each bin, and shows the bins on the x-
axis and the counts on the y-axis. In our case, the bins will be an interval of time representing
the delay of the flights and the count will be the number of flights falling into that interval.
The binwidth is the most important parameter for a histogram and we should always try out a
few different values of binwidth to select the best one for our data.
To make a basic histogram in Python, we can use either matplotlib or seaborn. The code
below shows function calls in both libraries that create equivalent figures. For the plot calls,
we specify the binwidth by the number of bins. For this plot, I will use bins that are 5 minutes
in length, which means that the number of bins will be the range of the data (from -60 to 120
minutes) divided by the binwidth, 5 minutes (bins = int(180/5) = 36).
Density Plots
A density plot is a smoothed, continuous version of a histogram estimated from the data. The
most common form of estimation is known as kernel density estimation. In this method, a
continuous curve (the kernel) is drawn at every individual data point and all of these curves
are then added together to make a single smooth density estimation. The kernel most often
used is a Gaussian (which produces a Gaussian bell curve at each data point).
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
Histograms
import matplotlib.pyplot as plt
import seaborn as sns
# flights is assumed to be a DataFrame of flight records with an 'arr_delay'
# column (arrival delay in minutes), loaded earlier from a CSV file
# matplotlib histogram
plt.hist(flights['arr_delay'], color='blue', edgecolor='black',
         bins=int(180/5))
# seaborn histogram
# note: distplot is deprecated in newer seaborn versions (histplot/displot replace it)
sns.distplot(flights['arr_delay'], hist=True, kde=False,
             bins=int(180/5), color='blue',
             hist_kws={'edgecolor': 'black'})
# Add labels
plt.title('Histogram of Arrival Delays')
plt.xlabel('Delay (min)')
plt.ylabel('Flights')
Density Plots
# Density Plot and Histogram of all arrival delays
sns.distplot(flights['arr_delay'], hist=True, kde=True,
bins=int(180/5), color = 'darkblue',
hist_kws={'edgecolor':'black'},
kde_kws={'linewidth': 4})
Results
Conclusions
Through this experiment we learnt to make histograms and density charts.
Experiment 10
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
To predict the relationship between two variables, we’ll use a simple linear regression
model. In a simple linear regression model, we’ll predict the outcome of a variable known as
the dependent variable using only one independent variable.
Building a linear regression model
To build a linear regression model in python, we’ll follow five steps:
1. Reading and understanding the data
2. Visualizing the data
3. Performing simple linear regression
4. Residual analysis
5. Predictions on the test set
Performing Simple Linear Regression
Equation of simple linear regression: y = c + mX
In our case:
y = c + m * TV
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Import the numpy and pandas packages
import numpy as np
import pandas as pd
# Read the given CSV file, and view some sample records
advertising = pd.read_csv("Company_data.csv")
advertising
import matplotlib.pyplot as plt
import seaborn as sns
# visualize the data (the plotting call that would precede this, e.g. a scatter
# of TV against Sales, is omitted in the manual)
plt.show()
# use TV spend as the single feature and Sales as the target
X = advertising['TV']
y = advertising['Sales']
from sklearn.model_selection import train_test_split
# split the data into 70% training and 30% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7,
                                                    test_size=0.3, random_state=100)
X_train
y_train
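The manual's code stops after the train/test split. As a hedged sketch of the remaining steps (fitting y = c + m * TV and predicting on the test set), one common approach uses scikit-learn's LinearRegression:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# scikit-learn expects a 2-D feature array, so reshape the single TV column
lr = LinearRegression()
lr.fit(X_train.values.reshape(-1, 1), y_train)
# c (intercept) and m (slope) of y = c + m * TV
print(lr.intercept_, lr.coef_[0])
# predictions on the test set and a simple goodness-of-fit measure
y_pred = lr.predict(X_test.values.reshape(-1, 1))
print(r2_score(y_test, y_pred))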
Results
Conclusions
Through this experiment we learnt to build and execute a linear regression model.
Experiment 11
Experiment name: Building Multiple Linear Regression, Lasso and Ridge Regression models
Objectives: To learn how to build Multiple Linear Regression, Lasso and Ridge Regression models
Prerequisites: knowledge of Python
Key terms: Linear Regression, Lasso, Ridge Regression
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Ridge and Lasso regression are powerful techniques generally used for creating parsimonious models in the presence of a 'large' number of features. Here 'large' can typically mean either of two things:
1. Large enough to enhance the tendency of a model to overfit (as few as 10 variables might cause overfitting)
2. Large enough to cause computational challenges. With modern systems, this situation might arise in the case of millions or billions of features
Though Ridge and Lasso might appear to work towards a common goal, the inherent properties
and practical use cases differ substantially. If you’ve heard of them before, you must know that
they work by penalizing the magnitude of coefficients of features along with minimizing the error
between predicted and actual observations. These are called ‘regularization’ techniques. The key
difference is in how they assign penalty to the coefficients:
1. Ridge Regression:
   o Performs L2 regularization, i.e. adds a penalty equivalent to the square of the magnitude of coefficients
   o Minimization objective = LS Obj + α * (sum of squares of coefficients)
2. Lasso Regression:
   o Performs L1 regularization, i.e. adds a penalty equivalent to the absolute value of the magnitude of coefficients
   o Minimization objective = LS Obj + α * (sum of absolute values of coefficients)
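Written out more explicitly (with y_i the observed values, \hat{y}_i the predictions, \beta_j the feature coefficients, and \alpha the regularization strength), the two objectives above correspond to:
\min_{\beta} \sum_{i} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} \beta_j^2 \quad \text{(Ridge, L2 penalty)}
\min_{\beta} \sum_{i} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} \lvert \beta_j \rvert \quad \text{(Lasso, L1 penalty)}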
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
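Multiple Linear Regression:
The manual does not show code for the multiple linear regression step; the following is a minimal sketch assuming the same Company_data.csv advertising data used in Experiment 10, with TV, Radio and Newspaper as features (the latter two column names are assumed):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# assumed dataset and column names, following Experiment 10
advertising = pd.read_csv("Company_data.csv")
X = advertising[['TV', 'Radio', 'Newspaper']]   # multiple independent variables
y = advertising['Sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7,
                                                    test_size=0.3, random_state=100)
# fit the multiple linear regression model and inspect its coefficients
mlr = LinearRegression()
mlr.fit(X_train, y_train)
print(mlr.intercept_, mlr.coef_)
print(mlr.score(X_test, y_test))   # R^2 on the test set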
Ridge Regression:
from sklearn.linear_model import Ridge

def ridge_regression(data, predictors, alpha, models_to_plot={}):
    # Fit the model
    # (the original passes normalize=True, an argument removed in scikit-learn 1.2;
    # scale the features beforehand if normalization is needed)
    ridgereg = Ridge(alpha=alpha)
    ridgereg.fit(data[predictors], data['y'])
    y_pred = ridgereg.predict(data[predictors])
    # return the predictions (the function body is truncated in the manual)
    return y_pred
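Lasso Regression:
The manual does not include the corresponding Lasso code; a minimal sketch mirroring the ridge function above (assuming the same data layout with a 'y' target column) is:
from sklearn.linear_model import Lasso

def lasso_regression(data, predictors, alpha, models_to_plot={}):
    # Fit the model with an L1 penalty; max_iter raised to help convergence
    lassoreg = Lasso(alpha=alpha, max_iter=100000)
    lassoreg.fit(data[predictors], data['y'])
    y_pred = lassoreg.predict(data[predictors])
    return y_pred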
Results
Conclusions
Through this experiment we learnt to build Multiple Linear Regression, Lasso and Ridge Regression models.