Lecture Week 6-Data Scraping and Data Wrangling
Python-Week 6
• Data wrangling—also called data cleaning
or data remediation—refers to a variety of
processes designed to transform raw data
into more readily used formats. The exact
methods differ from project to project
depending on the data you’re leveraging
and the goal you’re trying to achieve.
Data Wrangling
• Most commonly-used data wrangling processes include:
• Merging multiple data sources into a single dataset for analysis
• Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them
• Deleting data that's either unnecessary or irrelevant to the project you're working on
• Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place
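A minimal sketch of these four wrangling tasks using pandas, with hypothetical data (the `surveys` and `profiles` tables, column names, and the 3-standard-deviation outlier cutoff are all illustrative assumptions, not part of the lecture):

```python
import pandas as pd

# Two hypothetical data sources to combine into one dataset.
surveys = pd.DataFrame({"id": [1, 2, 3], "score": [85.0, None, 92.0]})
profiles = pd.DataFrame({"id": [1, 2, 3], "age": [34, 29, 41],
                         "notes": ["a", "b", "c"]})

# 1. Merge multiple data sources into a single dataset.
df = surveys.merge(profiles, on="id")

# 2. Identify gaps (empty cells) and fill them, here with the column mean.
df["score"] = df["score"].fillna(df["score"].mean())

# 3. Delete data that is unnecessary or irrelevant to the project.
df = df.drop(columns=["notes"])

# 4. Flag extreme outliers (here: more than 3 standard deviations
#    from the mean) so they can be explained or removed.
z = (df["score"] - df["score"].mean()) / df["score"].std()
outliers = df[z.abs() > 3]
```

Each step could equally be done with deletion instead of filling (e.g. `dropna`); the right choice depends on the project, as the slides note.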
Data Wrangling Steps
• Each data project requires a unique
approach to ensure its final dataset is
reliable and accessible. That being
said, several processes typically
inform the approach. These are
commonly referred to as data
wrangling steps or activities:
Imputation
• Replacing a missing value with a substitute, for example the observed value from a third individual in the same experimental condition and block.
Regression imputation
• Each missing value is replaced by the value predicted from regressing the missing variable on other observed variables. So instead of just taking the mean, you take a predicted value based on other variables, which preserves the relationships among the variables in the imputation model.
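A minimal sketch of regression imputation with NumPy, on made-up data (the variables `x` and `y` and their values are illustrative assumptions). A line is fit on the complete cases, and the missing entry is replaced with its prediction rather than the column mean, so the relationship between `x` and `y` is preserved:

```python
import numpy as np

# Hypothetical data: y has one missing value; x is fully observed
# and correlated with y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, np.nan, 10.1])

observed = ~np.isnan(y)

# Regress the missing variable (y) on the other variable (x),
# using only the complete cases: fit y = a*x + b.
a, b = np.polyfit(x[observed], y[observed], 1)

# Replace the missing entry with its regression prediction.
y_imputed = y.copy()
y_imputed[~observed] = a * x[~observed] + b
```

With several predictors the same idea applies via multiple regression (e.g. `sklearn.linear_model.LinearRegression` fit on the complete cases).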