Data Scientist Nanodegree Syllabus

Before You Start

Prerequisites: The Data Scientist Nanodegree program is an advanced program designed to prepare you
for data scientist jobs. As such, you should be comfortable with a variety of topics before starting
the program. To complete this program successfully, we strongly recommend that you fulfill the following
prerequisites. If you do not yet have them, Udacity has courses and programs that can prepare you
for this Nanodegree program.

● Programming:
○ Python Programming: Writing functions, logic, control flow, and basic applications, as well as using
common data analysis libraries like NumPy and pandas
○ SQL Programming: Querying databases using joins, aggregations, and subqueries
○ Comfort with the Terminal, version control in Git, and GitHub
● Probability and Statistics
○ Descriptive Statistics: Calculating measures of center and spread, estimating distributions
○ Inferential Statistics: Sampling distributions, hypothesis testing
○ Probability: Probability theory, conditional probability
● Mathematics
○ Calculus: Maximizing and minimizing functions
○ Linear Algebra: Matrix manipulation and multiplication
● Data wrangling
○ Accessing database, CSV, and JSON data
○ Data cleaning and transformations using pandas and scikit-learn
● Data visualization with matplotlib
○ Exploratory data analysis and visualization
○ Explanatory data visualizations and dashboards
● Machine Learning
○ Feature Engineering
○ Supervised Learning: Regression, classification, decision trees, random forests
○ Unsupervised Learning: PCA, clustering

The following programs can prepare you for this Nanodegree program. There are also several free
courses you can use to prepare.
● Programming for Data Science with Python
● Data Analyst Nanodegree Program
● Intro to Machine Learning Nanodegree Program

Educational Objectives: The ultimate goal of the Data Scientist Nanodegree program is for you to learn the
skills you need to perform well as a data scientist. As a graduate of this program, you will be able to:
● Use Python and SQL to access and analyze data from several different data sources.

Updated 8/7/19

● Use principles of statistics and probability to design and execute A/B tests and recommendation
engines to assist businesses in making data-driven decisions.
● Deploy a data science solution to a basic Flask app.
● Manipulate and analyze distributed datasets using Apache Spark.
● Communicate results effectively to stakeholders.

Estimated Length of Program: 4 months
Program Structure: Self-paced
Textbooks required: None
Textbooks optional: The Elements of Statistical Learning, Machine Learning: A Probabilistic Perspective, Python
Machine Learning
Instructional Tools Available: Video lectures, mentor-led student community, forums, project reviews

Syllabus

Project 1: Write a Data Science Blog Post

In this project, you will choose a dataset, identify three questions, and analyze the data to find answers to
these questions. You will create a GitHub repository with your project, and write a blog post to communicate
your findings to the appropriate audience. This project will help you reinforce and extend your knowledge of
machine learning, data visualization, and communication.

Supporting Lessons: Solving Problems with Data Science

Supporting Lessons Learning Outcomes

THE DATA SCIENCE PROCESS
➔ Apply the CRISP-DM process to business applications
➔ Wrangle, explore, and analyze a dataset
➔ Apply machine learning for prediction
➔ Apply statistics for descriptive and inferential understanding
➔ Draw conclusions that motivate others to act on your results

COMMUNICATING WITH STAKEHOLDERS
➔ Implement best practices in sharing your code and written summaries
➔ Learn what makes a great data science blog
➔ Learn how to share your ideas with the data science community

Project 2: Build Pipelines to Classify Messages with Figure Eight

Figure Eight (formerly CrowdFlower) crowdsourced the tagging and translation of messages to apply artificial
intelligence to disaster response relief. In this project, you’ll build a data pipeline to prepare the message
data from major natural disasters around the world. You’ll build a machine learning pipeline to categorize
emergency text messages based on the need communicated by the sender.

Supporting Lessons: Software Engineering for Data Scientists

Supporting Lessons Learning Outcomes

SOFTWARE ENGINEERING PRACTICES
➔ Write clean, modular, and well-documented code
➔ Refactor code for efficiency
➔ Create unit tests to test programs
➔ Write useful programs in multiple scripts
➔ Track actions and results of processes with logging
➔ Conduct and receive code reviews

OBJECT-ORIENTED PROGRAMMING
➔ Understand when to use object-oriented programming
➔ Build and use classes
➔ Understand magic methods
➔ Write programs that include multiple classes and follow good code structure
➔ Learn how large, modular Python packages, such as pandas and scikit-learn, use object-oriented programming
➔ Portfolio Exercise: Build your own Python package
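As a rough illustration of classes and magic methods, here is a minimal sketch. The `Gaussian` class is a hypothetical example, not course material: `__add__` lets two distributions be combined with `+` (for independent normal variables, the means add and the variances add), and `__repr__` gives a readable printout.

```python
import math


class Gaussian:
    """A normal distribution described by its mean and standard deviation."""

    def __init__(self, mu: float = 0.0, sigma: float = 1.0) -> None:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.mean = mu
        self.stdev = sigma

    def pdf(self, x: float) -> float:
        """Probability density at x."""
        coeff = 1.0 / (self.stdev * math.sqrt(2 * math.pi))
        return coeff * math.exp(-0.5 * ((x - self.mean) / self.stdev) ** 2)

    def __add__(self, other: "Gaussian") -> "Gaussian":
        # For independent normal variables, means add and variances add.
        return Gaussian(
            self.mean + other.mean,
            math.sqrt(self.stdev ** 2 + other.stdev ** 2),
        )

    def __repr__(self) -> str:
        return f"Gaussian(mean={self.mean}, stdev={self.stdev})"


total = Gaussian(25, 3) + Gaussian(30, 4)
print(total)  # Gaussian(mean=55, stdev=5.0)
```

Python calls `__add__` when it evaluates `a + b`, and `print` falls back to `__repr__` because no `__str__` is defined.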

WEB DEVELOPMENT
➔ Learn about the components of a web app
➔ Build a web application that uses Flask, Plotly, and the Bootstrap framework
➔ Portfolio Exercise: Build a data dashboard using a dataset of your choice and deploy it to a web application

Supporting Lessons: Data Engineering for Data Scientists

Supporting Lessons Learning Outcomes

ETL PIPELINES
➔ Understand what ETL pipelines are
➔ Access and combine data from CSV, JSON, logs, APIs, and databases
➔ Standardize encodings and columns
➔ Normalize data and create dummy variables
➔ Handle outliers, missing values, and duplicated data
➔ Engineer new features by running calculations
➔ Build a SQLite database to store cleaned data
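Several of the ETL outcomes above can be strung together in a few lines of pandas plus Python's built-in `sqlite3` module. This is a minimal sketch under assumed inputs: the inline CSV and JSON strings, their column names, the mean-imputation choice, and the `messages` table name are all invented for illustration.

```python
import io
import sqlite3

import pandas as pd

# Extract: two small hypothetical sources with inconsistent column names.
csv_text = "id,Country,Score\n1,US,10\n2,FR,\n2,FR,\n3,US,40\n"
json_text = '[{"ID": 4, "country": "DE", "score": 25}]'

csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

# Transform: standardize column names so the sources can be combined.
csv_df.columns = csv_df.columns.str.lower()
json_df.columns = json_df.columns.str.lower()
df = pd.concat([csv_df, json_df], ignore_index=True)

# Drop duplicated rows, then fill missing scores (mean imputation is one option).
df = df.drop_duplicates()
df["score"] = df["score"].fillna(df["score"].mean())

# Create dummy variables for the categorical column.
df = pd.get_dummies(df, columns=["country"], dtype=int)

# Load: store the cleaned data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("messages", conn, index=False)
print(pd.read_sql("SELECT * FROM messages", conn))
```

In a real pipeline, `":memory:"` would typically be replaced by a database file path so the cleaned table persists for the modeling step.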

NATURAL LANGUAGE PROCESSING
➔ Prepare text data for analysis with tokenization, lemmatization, and stop-word removal
➔ Use scikit-learn to transform and vectorize text data
➔ Build features with bag of words and tf-idf
➔ Extract features with tools such as named entity recognition and part-of-speech tagging
➔ Build an NLP model to perform sentiment analysis

MACHINE LEARNING PIPELINES
➔ Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process
➔ Chain data transformations and an estimator with scikit-learn's Pipeline
➔ Use feature unions to perform steps in parallel and create more complex workflows
➔ Grid search over a pipeline to optimize parameters for the entire workflow
➔ Complete a case study to build a full machine learning pipeline that prepares data and creates a model for a dataset
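The pipeline outcomes above (chaining, feature unions, grid search over the whole workflow) fit together roughly as in this sketch. The synthetic dataset, the particular transformers, and the parameter grid are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# FeatureUnion runs PCA and univariate feature selection side by side
# on the same input and concatenates their outputs column-wise.
features = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("kbest", SelectKBest(k=2)),
])

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameter names follow the "<step>__<param>" convention, so one grid
# can tune preprocessing steps and the final estimator together.
param_grid = {
    "features__pca__n_components": [2, 3],
    "clf__C": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Because every transformation lives inside the pipeline, each cross-validation fold refits the scaler and PCA on its own training split, which avoids leaking information from the validation fold.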

Project 3: Design a Recommendation Engine with IBM

IBM has an online data science community where members can post tutorials, notebooks, articles, and
datasets. In this project, you will build a recommendation engine, based on user behavior and social
network data from IBM Watson Studio's data platform, to surface the content most likely to be relevant to a user.

Supporting Lessons: Experiment Design

Supporting Lessons Learning Outcomes

EXPERIMENT DESIGN
➔ Understand how to set up an experiment, and the ideas associated with experiments vs. observational studies
➔ Define control and test conditions
➔ Choose control and test groups

STATISTICAL CONCERNS OF EXPERIMENTATION
➔ Applications of statistics in the real world
➔ Establishing key metrics
➔ SMART experiments: Specific, Measurable, Actionable, Realistic, Timely

A/B TESTING
➔ How it works and its limitations
➔ Sources of Bias: Novelty and Recency Effects
➔ Multiple Comparison Techniques (FDR, Bonferroni, Tukey)
➔ Portfolio Exercise: Use a technical screener from Starbucks to analyze the results of an experiment and write up your findings
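Two of the multiple-comparison techniques listed above can be sketched in plain Python. Bonferroni rejects a hypothesis only when p ≤ α/m, which controls the family-wise error rate; the Benjamini-Hochberg procedure controls the false discovery rate (FDR) and is usually less conservative. Tukey's method, which applies to pairwise comparisons after ANOVA, is not shown, and the p-values below are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.025, 0.035, 0.20]
print(bonferroni(p))          # [True, True, False, False, False]
print(benjamini_hochberg(p))  # [True, True, True, True, False]
```

With five tests, Bonferroni's per-test threshold drops to 0.01, while Benjamini-Hochberg compares the k-th smallest p-value against k × 0.01, so it keeps two more discoveries here.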

Supporting Lessons: Recommendations

Supporting Lessons Learning Outcomes

INTRODUCTION TO RECOMMENDATION ENGINES
➔ Distinguish between common techniques for creating recommendation engines, including knowledge-based, content-based, and collaborative filtering-based methods
➔ Implement each of these techniques in Python
➔ List business goals associated with recommendation engines, and recognize which of these goals are most easily met with existing recommendation techniques
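Of the techniques named above, user-based collaborative filtering is the easiest to show compactly. This is a minimal sketch, assuming a small hypothetical binary user-item interaction matrix (1 means the user viewed the item). It scores unseen items by the cosine-similarity-weighted interactions of other users. Knowledge-based and content-based methods are not shown.

```python
import numpy as np

# Hypothetical user-item interaction matrix: rows are users, columns are items.
interactions = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])


def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1, norms)  # avoid dividing by zero
    return unit @ unit.T


def recommend(user, m, top_n=2):
    """Rank unseen items by similarity-weighted interactions of other users."""
    sims = cosine_similarity_matrix(m)[user]
    sims[user] = 0.0  # ignore the user's similarity to themselves
    scores = sims @ m
    scores[m[user] > 0] = -np.inf  # never recommend items already seen
    ranked = np.argsort(-scores, kind="stable")
    # Keep only items that similar users actually interacted with.
    return [int(i) for i in ranked[:top_n] if scores[i] > 0]


print(recommend(0, interactions))  # [2]
```

User 0 is most similar to users 1 and 3; only user 1 has an item user 0 has not seen (item 2), so that is the single recommendation.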

MATRIX FACTORIZATION FOR RECOMMENDATIONS
➔ Understand the pitfalls of traditional methods, and of measuring the influence of recommendation engines with traditional regression and classification techniques
➔ Create recommendation engines using matrix factorization and FunkSVD
➔ Interpret the results of matrix factorization to better understand latent features of customer data
➔ Identify common pitfalls of recommendation engines, such as the cold start problem and the difficulty of assessing their effectiveness with the usual techniques, along with potential solutions
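FunkSVD can be sketched as stochastic gradient descent over the observed ratings only, learning a user-factor matrix and an item-factor matrix whose product fills in the missing entries. This is a minimal sketch, not the course's implementation: the ratings matrix, the learning rate, the iteration count, and the `funk_svd` function name are all assumptions.

```python
import numpy as np


def funk_svd(ratings, latent_features=2, learning_rate=0.01, iters=500, seed=0):
    """Factor a ratings matrix with missing entries (np.nan) as U @ V.T.

    Only observed ratings contribute to the squared-error loss, which is
    what lets FunkSVD work on a sparse matrix where classical SVD cannot.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    item_mat = rng.random((n_items, latent_features))
    observed = np.argwhere(~np.isnan(ratings))

    for _ in range(iters):
        for u, i in observed:
            error = ratings[u, i] - user_mat[u] @ item_mat[i]
            # Gradient step on both factor vectors, using the pre-update values.
            user_update = learning_rate * 2 * error * item_mat[i]
            item_mat[i] += learning_rate * 2 * error * user_mat[u]
            user_mat[u] += user_update
    return user_mat, item_mat


# Hypothetical 4 users x 4 items ratings matrix; np.nan marks unrated items.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
U, V = funk_svd(ratings)
pred = U @ V.T  # predictions for every cell, including the missing ones
mask = ~np.isnan(ratings)
rmse = np.sqrt(np.mean((ratings[mask] - pred[mask]) ** 2))
print(round(float(rmse), 3))
print(np.round(pred, 1))
```

The RMSE here is measured on the same ratings used for training, so it says little about real recommendation quality; a held-out split is the usual check, and users with no ratings at all (the cold start problem) get no meaningful factors from this procedure.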

Project 4: Data Science Capstone Project

In this capstone project, you will leverage what you’ve learned throughout the program to build a data
science project of your choosing. You will define the problem you want to solve, identify and explore the
data, then perform your analyses and develop a set of conclusions. You will present the analysis and your
conclusions in a blog post and GitHub repository. This project will serve as a demonstration of your ability as
a data scientist, and will be an important component of your job-ready portfolio.

Supporting Lessons: Data Science Projects

Supporting Lessons Learning Outcomes

ELECTIVE 1: DOG BREED CLASSIFICATION
➔ Use convolutional neural networks to classify different dogs according to their breeds
➔ Deploy your model to allow others to upload images of their dogs and send them back the corresponding breeds
➔ Complete one of the most popular projects in Udacity history, and show the world how you can use your deep learning skills to entertain an audience!

ELECTIVE 2: STARBUCKS
➔ Use purchasing habits to arrive at discount measures to obtain and retain customers
➔ Identify groups of individuals that are most likely to be responsive to rebates

ELECTIVE 3: ARVATO FINANCIAL SERVICES
➔ Work through a real-world dataset and challenge provided by Arvato Financial Services, a Bertelsmann company
➔ Top performers have a chance at an interview with Arvato or another Bertelsmann company!

ELECTIVE 4: SPARK FOR BIG DATA
➔ Take a course on Apache Spark and complete a project using a massive, distributed dataset to predict customer churn
➔ Learn to deploy your Spark cluster on either AWS or IBM Cloud

ELECTIVE 5: YOUR CHOICE ➔ Use your skills to tackle any other project of your choice
