Data Scientist Nanodegree Syllabus: Before You Start
Updated 8/7/19
● Use principles of statistics and probability to design and execute A/B tests and recommendation
engines to assist businesses in making data-automated decisions.
● Deploy a data science solution to a basic Flask app.
● Manipulate and analyze distributed datasets using Apache Spark.
● Communicate results effectively to stakeholders.
Estimated Length of Program: 4 months
Program Structure: Self-paced
Textbooks required: None
Textbooks optional: Elements of Statistical Learning, Machine Learning: A Probabilistic Perspective, Python
Machine Learning
Instructional Tools Available: Video lectures, mentor-led student community, forums, project reviews
Syllabus
COMMUNICATING WITH STAKEHOLDERS
➔ Implement best practices in sharing your code and written summaries
➔ Learn what makes a great data science blog
➔ Learn how to share your ideas with the data science community
NATURAL LANGUAGE PROCESSING
➔ Prepare text data for analysis with tokenization, lemmatization, and stop word removal
➔ Use scikit-learn to transform and vectorize text data
➔ Build features with bag of words and tf-idf
➔ Extract features with tools such as named entity recognition and part-of-speech tagging
➔ Build an NLP model to perform sentiment analysis
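As a rough illustration of the bag-of-words and tf-idf items above, the sketch below runs scikit-learn's CountVectorizer and TfidfTransformer on a made-up three-sentence corpus. The corpus and variable names are assumptions for illustration, not course material. Lemmatization and named entity recognition are left out because scikit-learn does not provide them.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus, invented for illustration only.
docs = [
    "The cats are sitting on the mat",
    "A dog sat on the log",
    "Cats and dogs are friends",
]

# Bag of words: lowercase, tokenize, and drop English stop words.
count_vectorizer = CountVectorizer(stop_words="english")
bag_of_words = count_vectorizer.fit_transform(docs)

# Reweight the raw counts by inverse document frequency.
tfidf = TfidfTransformer().fit_transform(bag_of_words)

# Terms in column order (the vectorizer sorts its vocabulary alphabetically).
vocabulary = sorted(count_vectorizer.vocabulary_)
print(vocabulary)
print(tfidf.toarray().round(2))
```

Without lemmatization, "dog" and "dogs" stay separate columns, which is one reason the preparation steps listed above matter.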
MACHINE LEARNING PIPELINES
➔ Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process
➔ Chain data transformations and an estimator with scikit-learn’s Pipeline
➔ Use feature unions to perform steps in parallel and create more complex workflows
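One way the Pipeline and feature union items above might fit together is sketched here. The texts, labels, step names, and the choice of LogisticRegression are all assumptions made for the example; the course may use different data and estimators.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy data: 1 = positive review, 0 = negative review.
texts = [
    "loved it, great product",
    "great value and great quality",
    "really loved the quality",
    "terrible, waste of money",
    "awful quality, hated it",
    "hated the product, terrible value",
]
labels = [1, 1, 1, 0, 0, 0]

# FeatureUnion fits both transformers on the same input and
# concatenates their outputs column-wise.
features = FeatureUnion([
    ("word_tfidf", TfidfVectorizer()),
    ("char_counts", CountVectorizer(analyzer="char", ngram_range=(2, 3))),
])

# Pipeline chains the transformations with a final estimator,
# so fit and predict each take a single call on raw text.
model = Pipeline([
    ("features", features),
    ("classifier", LogisticRegression()),
])

model.fit(texts, labels)
predictions = model.predict(["great product, loved it"])
print(predictions)
```

Because the whole chain is one estimator, it can be passed to cross-validation or grid search without leaking fitted vocabulary from the training split into the validation split.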
EXPERIMENT DESIGN
➔ Understand how to set up an experiment and how experiments differ from observational studies
➔ Define control and test conditions
➔ Choose control and test groups
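The items above can be sketched with the standard library alone: random assignment of users to control and test groups, followed by a two-proportion z-test on conversion counts. The function names, user ids, and conversion counts are hypothetical, and the z-test is only one common way to analyze such an experiment.

```python
import math
import random


def assign_groups(user_ids, seed=0):
    """Randomly split users into control and test groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the normal approximation:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1000 users per group, made-up conversion counts.
control_group, test_group = assign_groups(range(2000))
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Random assignment is what separates this from an observational study: it makes the two groups comparable on average, so a difference in conversion rate can be attributed to the test condition.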
MATRIX FACTORIZATION FOR RECOMMENDATIONS
➔ Understand the pitfalls of traditional methods and of measuring the influence of recommendation engines with traditional regression and classification techniques
➔ Create recommendation engines using matrix factorization and FunkSVD
➔ Interpret the results of matrix factorization to better understand latent features of customer data
➔ Determine common pitfalls of recommendation engines like the cold start problem and difficulties associated with usual tactics for assessing the
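A minimal sketch of FunkSVD-style matrix factorization follows, assuming a small user-by-item ratings matrix with NaN marking missing ratings. The ratings, hyperparameters, and function name are invented for illustration; the idea is to learn user and item latent factors by stochastic gradient descent on the observed ratings only.

```python
import numpy as np


def funk_svd(ratings, latent_features=2, learning_rate=0.01, iters=500, seed=0):
    """Factor `ratings` (NaN = missing) into user and item latent-feature matrices."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    item_mat = rng.random((latent_features, n_items))
    observed = np.argwhere(~np.isnan(ratings))  # (user, item) index pairs

    for _ in range(iters):
        for u, i in observed:
            # Error between the actual rating and the current prediction.
            error = ratings[u, i] - user_mat[u, :] @ item_mat[:, i]
            # Gradient step on squared error; keep the old user row so
            # both updates use values from before this step.
            user_row = user_mat[u, :].copy()
            user_mat[u, :] += learning_rate * 2 * error * item_mat[:, i]
            item_mat[:, i] += learning_rate * 2 * error * user_row
    return user_mat, item_mat


# Hypothetical ratings: rows are users, columns are items, NaN is unrated.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

user_mat, item_mat = funk_svd(ratings)
predicted = user_mat @ item_mat  # includes estimates for the missing cells

mask = ~np.isnan(ratings)
mse = float(np.mean((ratings[mask] - predicted[mask]) ** 2))
print(predicted.round(2))
print(f"MSE on observed ratings: {mse:.4f}")
```

A user with no ratings at all never appears in `observed`, so that user's row stays at its random initial values. This is the cold start problem in miniature.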
ELECTIVE 1: DOG BREED CLASSIFICATION
➔ Use convolutional neural networks to classify different dogs according to their breeds
➔ Deploy your model to allow others to upload images of their dogs and send them back the corresponding breeds
➔ Complete one of the most popular projects in Udacity history, and show the world how you can use your deep learning skills to entertain an audience!
ELECTIVE 2: STARBUCKS
➔ Use purchasing habits to design discount offers that attract and retain customers.
➔ Identify groups of individuals that are most likely to be responsive to rebates.
ELECTIVE 3: ARVATO FINANCIAL SERVICES
➔ Work through a real-world dataset and challenge provided by Arvato Financial Services, a Bertelsmann company
➔ Top performers have a chance at an interview with Arvato or another Bertelsmann company!
ELECTIVE 4: SPARK FOR BIG DATA
➔ Take a course on Apache Spark and complete a project using a massive, distributed dataset to predict customer churn
➔ Learn to deploy your Spark cluster on either AWS or IBM Cloud
ELECTIVE 5: YOUR CHOICE
➔ Use your skills to tackle any other project of your choice