Dand Syllabus v7 Terms 1
➔ Use basic Python code to clean a dataset for analysis
➔ Run code to create visualizations from the wrangled data
➔ Analyze trends shown in the visualizations and report your conclusions
➔ Determine if this program is a good fit for your time and talents
NUMBERS AND STRINGS ➔ Learn about Python's numeric and string data types
➔ Use variables to store data
➔ Use built-in functions and methods
DATA STRUCTURES AND ➔ Use collection data types: lists, sets, and dictionaries
LOOPS ➔ Write `for` and `while` loops to express repetition
➔ Practice refactoring and problem solving
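As a rough illustration of the collection types and loops named in this lesson, here is a minimal sketch. The word list and variable names are hypothetical and are not taken from the course materials.

```python
# Hypothetical sample data, not taken from the course materials.
words = ["data", "python", "data", "sql", "python", "data"]

# A set keeps only the distinct values from the list.
unique_words = set(words)

# A dictionary maps each word to the number of times it appears.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# A while loop repeats until its condition becomes false:
# here it halves n until the value drops to 1.
n = 40
halvings = 0
while n > 1:
    n //= 2
    halvings += 1

print(sorted(unique_words))
print(counts)
print(halvings)
```

The `for` loop suits iterating over a known collection, while the `while` loop suits repetition whose end depends on a condition that changes as the loop runs.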
FILES AND MODULES ➔ Use modules from the Python standard library and from
third-party libraries
➔ Read data from files on disk
➔ Use online resources to help solve problems
DATA ANALYSIS PROCESS ➔ Learn about the key steps of the data analysis process
➔ Investigate multiple datasets using Python and Pandas
PANDAS AND NUMPY: ➔ Perform the entire data analysis process on a dataset
CASE STUDY 1 ➔ Learn to use NumPy and Pandas to wrangle, explore, analyze,
and visualize data.
PANDAS AND NUMPY: ➔ Perform the entire data analysis process on a dataset
CASE STUDY 2 ➔ Learn more about NumPy and Pandas to wrangle, explore,
analyze, and visualize data.
Basic SQL ➔ Write common SQL commands including SELECT, FROM, and
WHERE, as well as corresponding logical operators
SQL Joins ➔ Write JOINs in SQL to combine data from multiple sources
and answer more complex business questions
SQL Aggregations ➔ Write common aggregations in SQL including COUNT, SUM, MIN,
and MAX
➔ Write CASE and DATE functions, as well as work with NULLs
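The aggregations and NULL handling listed above might look like the following sketch. It assumes an in-memory SQLite database driven from Python's standard `sqlite3` module, which may differ from the database used in the course. The `orders` table and its values are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "east", 30.0), (3, "west", 5.0), (4, "west", None)],
)

# COUNT(*) counts every row; COUNT(amount) skips rows where amount is NULL.
# SUM, MIN, and MAX also ignore NULL values.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(amount), SUM(amount), MIN(amount), MAX(amount) "
    "FROM orders"
).fetchone()

# CASE assigns a label to each row, with an explicit branch for NULLs.
labels = conn.execute(
    "SELECT id, CASE WHEN amount IS NULL THEN 'missing' "
    "WHEN amount >= 10 THEN 'large' ELSE 'small' END "
    "FROM orders ORDER BY id"
).fetchall()

print(totals)
print(labels)
conn.close()
```

Comparing `COUNT(*)` with `COUNT(amount)` is a quick way to see how many rows have a NULL in a column.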
Advanced SQL Queries ➔ Edit a database using CREATE TABLE, INSERT INTO, UPDATE, and
other statements
➔ Use window functions and subqueries to add steps to a query
➔ Use documentation to learn new functions and complete
complex tasks
SAMPLING ➔ Apply the concepts of probability and normalization to sample
DISTRIBUTIONS data sets
HYPOTHESIS TESTING ➔ Use critical values to make decisions on whether or not a
treatment has changed the value of a population parameter
T-TESTS ➔ Test the effect of a treatment or compare the difference in
means for two groups when we have small sample sizes
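One way to compare the means of two small groups is Welch's two-sample t statistic, sketched below with only the standard library. The choice of Welch's variant, the helper name `welch_t`, and the sample values are assumptions for illustration; the lesson itself may use a different form of the test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    # Standard error of the difference in means, using each
    # group's own sample variance (no equal-variance assumption).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical measurements for a control group and a treated group.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treated = [5.6, 5.4, 5.5, 5.7, 5.3]

t_stat = welch_t(treated, control)
print(t_stat)
```

The resulting statistic would then be compared against a critical value from the t distribution to decide whether the difference in means is significant.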
REGRESSION ➔ Build a linear regression model to understand the relationship
between independent and dependent variables
➔ Use linear regression results to make a prediction
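Fitting a line and using it for a prediction can be sketched with ordinary least squares on a single independent variable. The helper `fit_line` and the data are hypothetical, and the course may rely on a statistics library instead of hand-written formulas.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data constructed so that score = 50 + 2 * hours exactly.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the dependent variable for a new input.
predicted = intercept + slope * 6
print(slope, intercept, predicted)
```

Real data will not fall exactly on a line, so the fitted slope and intercept describe the best-fitting trend, not an exact rule.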
INTRO TO DATA ➔ Identify each step of the data wrangling process (gathering,
WRANGLING assessing, and cleaning)
➔ Wrangle a CSV file downloaded from Kaggle using fundamental
gathering, assessing, and cleaning code
GATHERING DATA ➔ Gather data from multiple sources, including gathering files,
programmatically downloading files, web-scraping data, and
accessing data from APIs
➔ Import data of various file formats into pandas, including flat
files (e.g. TSV), HTML files, TXT files, and JSON files
➔ Store gathered data in a PostgreSQL database
CLEANING DATA ➔ Identify each step of the data cleaning process (defining, coding,
and testing)
➔ Clean data using Python and pandas
➔ Test cleaning code visually and programmatically using Python
WHAT IS EDA? ➔ Define and identify the importance of exploratory data analysis
(EDA)
EXPLORE ONE VARIABLE ➔ Quantify and visualize individual variables within a dataset
➔ Create histograms and boxplots
➔ Transform variables
➔ Examine and identify tradeoffs in visualizations
EXPLORE TWO VARIABLES ➔ Properly apply relevant techniques for exploring the relationship
between any two variables in a data set
➔ Create scatter plots
➔ Calculate correlations
➔ Investigate conditional means
EXPLORE MANY ➔ Reshape data frames and use aesthetics like color and shape to
VARIABLES uncover information
DIAMONDS AND PRICE ➔ Use predictive modeling to determine a good price for a
PREDICTIONS diamond
DESIGN PRINCIPLES ➔ Select the most effective chart or graph based on the data
being displayed
➔ Use color, shape, size, and other elements effectively
TELLING STORIES WITH ➔ Create Tableau dashboards and stories to effectively
TABLEAU communicate data