Data Analysis W Pandas

The document provides code examples and descriptions for importing data sets, data wrangling, exploratory data analysis, model development, and model evaluation/refinement in Python. Some key steps include reading CSV data into a pandas dataframe, handling missing data, data normalization, correlation analysis, linear and polynomial regression modeling, evaluating models using metrics like R^2 and MSE, and optimizing models with techniques like k-fold cross validation and grid search.

Uploaded by

x7jn4sxdn9

Cheat Sheet: Importing Data Sets

| Package/Method | Description | Code Example |
| --- | --- | --- |
| Read CSV data set | Read CSV file to a pandas data frame | `pd.read_csv('data.csv', header=None)` or `pd.read_csv('data.csv', header=0)` |
| Print first few entries | Print first few entries of the pandas data frame | `df.head()  # default prints first 5 entries` |
| Print last few entries | Print last few entries of the pandas data frame | `df.tail()  # default prints last 5 entries` |
| Assign header names | Assign appropriate header names to the data frame | `df.columns = ['Column1', 'Column2', ...]` |
| Replace "?" with NaN | Replace "?" entries with NaN from the NumPy library | `df = df.replace("?", np.nan)` |
| Retrieve data types | Retrieve data types of the data frame columns | `df.dtypes` |
| Retrieve statistical description | Retrieve statistical description of the data set | `df.describe(include="all")` |
| Retrieve data set summary | Retrieve summary of the data set from the data frame | `df.info()` |
| Save data frame to CSV | Save processed data frame to a CSV file with a specified path | `df.to_csv('processed_data.csv')` |
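
The import steps above can be chained into one short workflow. The following is a minimal sketch, not a definitive recipe: the in-memory CSV text, the column names (`make`, `doors`, `price`), and the choice to write the result to a string instead of a file are all assumptions made so the example runs without any file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw data: no header row, and "?" marks a missing value.
raw_csv = io.StringIO(
    "alfa,2,?\n"
    "audi,4,13950\n"
    "bmw,2,16430\n"
)

# header=None because the first row is data, not column names.
df = pd.read_csv(raw_csv, header=None)

# Assign header names (illustrative names, not from the original sheet).
df.columns = ["make", "doors", "price"]

# Replace "?" with NaN, then convert the column to a numeric type.
df = df.replace("?", np.nan)
df["price"] = df["price"].astype("float64")

print(df.head())
print(df.dtypes)

# Calling to_csv without a path returns the CSV text;
# index=False leaves out the row index column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv`, as in the table, writes the same text to disk instead of returning it.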

Cheat Sheet: Data Wrangling

| Package/Method | Description | Code Example |
| --- | --- | --- |
| Replace missing data with frequency | Replace missing values with the mode (most common entry) | `df['attribute'] = df['attribute'].fillna(df['attribute'].mode()[0])` |
| Replace missing data with mean | Replace missing values with the mean of the entries | `df['attribute'] = df['attribute'].fillna(df['attribute'].mean())` |
| Fix the data types | Fix data types of columns in the dataframe | `df['numeric_col'] = df['numeric_col'].astype('float64')` |
| Data Normalization | Normalize data in a column between 0 and 1 | `df['attribute'] = (df['attribute'] - df['attribute'].min()) / (df['attribute'].max() - df['attribute'].min())` |
| Binning | Create bins for better analysis and visualization | `pd.cut(df['numeric_col'], bins=5)` |
| Change column name | Change label name of a dataframe column | `df.rename(columns={'old_name':'new_name'}, inplace=True)` |
| Indicator Variables | Create indicator variables for categorical data | `pd.get_dummies(df['categorical_col'])` |

Note: the fill results are assigned back to the column rather than using `inplace=True` on `df['attribute']`, because an in-place call on a selected column may not update the original data frame in recent pandas versions.
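
The wrangling steps above can be sketched on a tiny data frame. The column names (`fuel`, `horsepower`), the values, and the bin labels are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", None, "gas"],
    "horsepower": [100.0, np.nan, 150.0, 200.0],
})

# Categorical column: fill with the most frequent entry (mode).
df["fuel"] = df["fuel"].fillna(df["fuel"].mode()[0])

# Numeric column: fill with the mean of the observed entries.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Min-max normalization to the range [0, 1].
hp = df["horsepower"]
df["horsepower_scaled"] = (hp - hp.min()) / (hp.max() - hp.min())

# Binning into three equal-width bins with illustrative labels.
df["horsepower_bin"] = pd.cut(
    df["horsepower"], bins=3, labels=["low", "medium", "high"]
)

# Indicator (dummy) variables for the categorical column.
dummies = pd.get_dummies(df["fuel"], prefix="fuel")
df = pd.concat([df, dummies], axis=1)

print(df)
```

Filling missing values before normalizing matters here: the min-max formula would otherwise leave NaN in the scaled column for every row that was missing.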

Cheat Sheet: Exploratory Data Analysis

| Package/Method | Description | Code Example |
| --- | --- | --- |
| Complete dataframe correlation | Correlation matrix using all numeric attributes | `df.corr(numeric_only=True)  # non-numeric columns raise an error in pandas 2.x otherwise` |
| Specific Attribute correlation | Correlation matrix using specific attributes | `df[['attr1', 'attr2']].corr()` |
| Scatter Plot | Create scatter plot for dependent vs independent variables | `plt.scatter(df['independent'], df['dependent'])` |
| Regression Plot | Create regression plot using dependent and independent variables | `sns.regplot(x='independent', y='dependent', data=df)` |
| Box plot | Create box-and-whisker plot for variables | `sns.boxplot(x='category', y='numeric', data=df)` |
| Grouping by attributes | Create subset of data based on different attributes | `df_group = df.groupby('attribute')` |
| GroupBy statements | Group data and display average value of numerical attributes | `df_group = df.groupby('attr')['numeric'].mean()` |
| Pivot Tables | Create pivot tables for data representation | `pivot = df.pivot_table(index='attr1', columns='attr2', values='numeric')` |
| Pseudocolor plot | Create heatmap using pivot table data | `plt.pcolor(pivot, cmap='RdBu')` |
| Pearson Coefficient and p-value | Calculate Pearson coefficient and p-value | `from scipy import stats; pearson_coef, p_value = stats.pearsonr(df['attr1'], df['attr2'])` |
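
The non-plotting parts of this table (correlation, grouping, pivoting, and the Pearson test) can be tried on a small data frame. This is a sketch under stated assumptions: the columns `drive`, `body`, `engine_size`, and `price` and all values are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical data set: two categorical and two numeric attributes.
df = pd.DataFrame({
    "drive": ["fwd", "fwd", "rwd", "rwd", "fwd", "rwd"],
    "body": ["sedan", "wagon", "sedan", "wagon", "sedan", "sedan"],
    "engine_size": [1.6, 1.8, 3.0, 3.5, 2.0, 2.5],
    "price": [15500.0, 18500.0, 31000.0, 34000.0, 20500.0, 24000.0],
})

# Correlation matrix restricted to two numeric attributes.
corr = df[["engine_size", "price"]].corr()

# Average price per drive type.
group_means = df.groupby("drive")["price"].mean()

# Pivot table: mean price for each (drive, body) combination.
pivot = df.pivot_table(index="drive", columns="body", values="price")

# Pearson coefficient and its p-value.
pearson_coef, p_value = stats.pearsonr(df["engine_size"], df["price"])

print(corr)
print(group_means)
print(pivot)
print(pearson_coef, p_value)
```

The off-diagonal entry of `corr` and `pearson_coef` measure the same quantity; `stats.pearsonr` additionally reports a p-value, which helps judge whether the correlation is statistically significant.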

Cheat Sheet: Model Development

| Process | Description | Code Example |
| --- | --- | --- |
| Linear Regression | Create Linear Regression model object | `from sklearn.linear_model import LinearRegression; lr = LinearRegression()` |
| Train Linear Regression model | Train Linear Regression model on input and output attributes | `X = df[['attr1', 'attr2']]; Y = df['target']; lr.fit(X, Y)` |
| Generate output predictions | Predict output for a set of input attribute values | `Y_hat = lr.predict(X)` |
| Identify the coefficient and intercept | Get slope coefficient and intercept values of the model | `coeff = lr.coef_; intercept = lr.intercept_` |
| Residual Plot | Create residual plot for regression analysis | `sns.residplot(x=df['attr1'], y=df['attr2'])` |
| Distribution Plot | Plot distribution of data with respect to an attribute | `sns.kdeplot(df['attribute'])  # replaces sns.distplot(..., hist=False), deprecated since seaborn 0.11` |
| Polynomial Regression | Fit polynomial regression model using numpy | `f = np.polyfit(x, y, deg); p = np.poly1d(f); Y_hat = p(x)` |
| Multi-variate Polynomial Regression | Generate new feature matrix with polynomial combinations | `from sklearn.preprocessing import PolynomialFeatures; pr = PolynomialFeatures(degree=2); Z_pr = pr.fit_transform(Z)` |
| Pipeline | Create data pipelines to simplify processing steps | `from sklearn.pipeline import Pipeline; from sklearn.preprocessing import StandardScaler; pipe = Pipeline([('scale', StandardScaler()), ('model', LinearRegression())])` |
| R^2 value | Calculate R^2 for linear and polynomial regression | Linear: `R2_score = lr.score(X, Y)`. Polynomial: `from sklearn.metrics import r2_score; R2_score = r2_score(y, p(x))` |
| MSE value | Calculate Mean Squared Error | `from sklearn.metrics import mean_squared_error; mse = mean_squared_error(Y, Y_hat)` |
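
To see how the rows above fit together, the sketch below fits a straight line and a degree-2 polynomial to the same data and compares them with R^2 and MSE. The synthetic data (a noisy quadratic with a fixed random seed) and the pipeline step names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Simple linear regression.
lr = LinearRegression()
lr.fit(X, y)
y_hat_lin = lr.predict(X)

# Polynomial regression as a pipeline: expand features, scale, fit.
pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
y_hat_poly = pipe.predict(X)

# The same degree-2 least-squares fit using numpy only.
p = np.poly1d(np.polyfit(x, y, 2))

# In-sample R^2 and MSE for both models.
r2_lin = r2_score(y, y_hat_lin)
r2_poly = r2_score(y, y_hat_poly)
mse_lin = mean_squared_error(y, y_hat_lin)
mse_poly = mean_squared_error(y, y_hat_poly)

print("linear:     R^2 =", r2_lin, " MSE =", mse_lin)
print("polynomial: R^2 =", r2_poly, " MSE =", mse_poly)
print("slope:", lr.coef_[0], " intercept:", lr.intercept_)
```

Both metrics here are computed on the training data, so they only show how well each model fits the points it has already seen; judging generalization requires held-out data.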

Cheat Sheet: Model Evaluation and Refinement

| Process | Description | Code Example |
| --- | --- | --- |
| Splitting data for training and testing | Separate data into training and testing sets | `from sklearn.model_selection import train_test_split; x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)` |
| Cross validation score | Evaluate model performance using cross-validation | `from sklearn.model_selection import cross_val_score; scores = cross_val_score(model, X, Y, cv=5)` |
| Cross validation prediction | Predict output using cross-validated model | `from sklearn.model_selection import cross_val_predict; y_pred = cross_val_predict(model, X, Y, cv=4)` |
| Ridge Regression and Prediction | Implement Ridge Regression model | `from sklearn.linear_model import Ridge; ridge_model = Ridge(alpha=0.5); ridge_model.fit(x_train, y_train); yhat = ridge_model.predict(x_test)` |
| Grid Search | Use Grid Search to find best model parameters | `from sklearn.model_selection import GridSearchCV; param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10, 100]}; grid_search = GridSearchCV(Ridge(), param_grid, cv=5); grid_search.fit(X, Y); best_params = grid_search.best_params_` |
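
The evaluation steps above can be combined into one run. This is a minimal sketch, assuming synthetic data (three features with a known linear relationship plus noise, fixed seed); the variable names `best_model` and `test_r2` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic data (assumption): linear signal in three features plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Hold out 20% of the rows for final testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5-fold cross-validation scores (R^2 by default for regressors),
# computed on the training portion only.
scores = cross_val_score(Ridge(alpha=0.5), x_train, y_train, cv=5)

# Grid search over the regularization strength alpha.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(Ridge(), param_grid, cv=5)
grid_search.fit(x_train, y_train)

# The best estimator is refit on all training data by default.
best_model = grid_search.best_estimator_
test_r2 = best_model.score(x_test, y_test)

print("CV scores:", scores)
print("best params:", grid_search.best_params_)
print("test R^2:", test_r2)
```

Running the grid search on the training split only, then scoring once on the test split, keeps the test data out of the parameter selection, so `test_r2` remains an honest estimate of performance on unseen data.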
