CU Data Science
DATA SCIENCE
Online Certificate Program
OVERVIEW
R is quickly becoming one of the most popular and effective programming languages
for data science. In this program, you’ll apply data science tools to the collection of
data and the translation of data into information, constructing models that can be
used to address the questions that you’re investigating. You’ll have the opportunity to
apply data analytics as a four-part process: gathering data, looking for patterns in that
data, finding insights in any patterns you discover, and using those insights to make
decisions. This process does not make decisions for you, but it will help you to better
understand the effects of the decisions you might make. Through an examination of
real-world data sets and different modeling techniques, as well as an in-depth look
at how the programming language R can be used to help you find patterns and derive
insights, you will gain valuable experience working in each stage of the data analytics
process, helping you and your organization to make better decisions – and gain a
sound scientific understanding of why you’re making the choices you’re making.
COURSES
• Understanding Data Analytics
• Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
• Finding Patterns in Data Using Cluster and Hotspot Analysis
• Regression Analysis and Discrete Choice Models
• Supervised Learning Techniques
• Neural Networks and Machine Learning
• Making Data-Driven Recommendations Using Optimization
• Making Predictions Using Simulation
Visit ecornell.cornell.edu
7 COURSES
KEY TAKEAWAYS
• Explore the data analytics process and examine the tools available to improve decision making
• Use unsupervised learning techniques to help identify patterns in data and create visualizations to better spot those patterns
• Categorize data using supervised learning algorithms
• Predict the value of continuous variables with linear regression
• Use neural networks to make predictions about new data
• Make forecasts from data collected over time and measure their accuracy
• Create linear programs and simulations to optimize system performance and dynamics
COURSE
DESCRIPTIONS
100% ONLINE
By some estimates, 90% of the data that has ever existed has been created in the
last two years. This is a staggering figure and has given rise to new challenges and
opportunities in almost every industry: What kind of data do you need to collect to
compete, and how can you make sense of it once you have collected it? As technology
evolves and the volume of data increases, how can you make the best use of all this
information? How can you use the data to help drive your decision making? How can
you make data work for you? How can you ensure your data accurately reflects the
population in which you’re interested?
In this course, you will determine the types of engineering and business questions you
can answer, the kinds of problems you can solve, and the decisions you can make, all
through using data analytics. You will explore best practices for collecting information
so that you can make informed predictions, develop insights, and better inform
organizational decision making. You will also explore best practices for sampling and
examine how different types of sampling are suited to different situations. Finally, you
will see real-world examples that demonstrate how these tools work, practice sampling
techniques in case-study scenarios, and have a chance to apply some of the concepts
to your own work.
Visualization is one of the simplest and most effective ways to find patterns in data.
It can help answer questions such as: What is the general range and shape of the data
set? Are there any clusters of observations? Which variables correlate with each other?
Are there any obvious outliers?
As your data set grows in terms of the number of data points and variables, however,
it becomes increasingly difficult to visualize all this information at once. At most, you
can plot data points in three dimensions and add further distinctions of size,
color, shape, and so on. Yet this can easily become too busy and difficult to read. How,
then, do we find patterns in really big data sets?
In this course, you will explore several powerful and commonly utilized techniques for
distilling patterns from data. You will implement each of these techniques using the
free and open-source statistical programming language R with real-world data sets.
The focus will be on making these methods accessible for you in your own work.
When you have large groups of objects, it is often helpful to split them into meaningful
groups or clusters. One example of this would be to identify different types of
customers so that a company can more efficiently route their calls to a helpline. As
a second example, suppose an automobile manufacturer wanted to segment its
market to target its ads more carefully. One approach might be to take a database of
recent car sales, including the social demographics associated with each customer,
and segment the population purchasing each type of automobile into meaningful
groups.
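The segmentation idea above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the course itself works in R, the source does not name a specific clustering algorithm (k-means is used here as one common choice), and the customer records (age, annual income in thousands) are invented for the example.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            for p in points]

def kmeans(points, k, iterations=20, seed=0):
    # Plain k-means: start from k randomly chosen points, then alternate
    # between assigning points to centers and recomputing the centers.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical customers: (age, annual income in thousands).
customers = [(22, 30), (25, 35), (27, 32), (58, 90), (61, 95), (64, 88)]
centers, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

With real sales data, the variables would normally be rescaled first so that no single one (such as income) dominates the distance calculation.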
Specialized approaches exist if your data contains information that relates to time
and geography. You can use this additional information to identify geographical and
temporal hotspots. Hotspots are regions of high activity or a high value of a particular
variable. These results can help you focus your attention on a particular region where
a problem is occurring more than usual, such as the incidence of asthma in a large
city. In both cluster and hotspot analysis, the results can help you discover new and
interesting features, problems, and red flags regarding the data being analyzed.
In this course, you will explore several powerful and commonly utilized techniques for
performing both cluster and hotspot analysis. You will implement these techniques
using the free and open-source statistical programming language R with real-world
data sets. The focus will be on making these methods accessible and applicable to your
work.
A story can play an important role in understanding data. It can help distill complex
information into something manageable — something we can think about easily,
relate to, and use to make decisions. For many problems that we encounter globally,
however, a story that describes what has already happened does not offer enough precision
for the job we want to perform. Often, we would like to use available data to make numerically
accurate predictions about what might happen in the future. This task requires the
construction of mathematical models that are well suited to our real-world problems.
In this course, you will explore several types of statistical models used with data to
make predictions. These models bring with them a whole batch of important concerns,
such as estimation and validation, that make the entire process into both an art and a
science. You will implement each of these techniques using the free and open-source
statistical programming language R with real-world data sets. The focus will be on
making these methods accessible for you in your own work.
Supervised learning is a general term for any machine learning technique that
attempts to discover the relationship between a data set and some associated labels
for prediction. In regression, the labels are continuous numbers. This course will
focus on classification, where the labels are taken from a finite set of numbers or
characters. The prototypical and perhaps most well-known example of classification
is image recognition. The goal is to take an image (represented by its pixel values) and
determine what objects are in the image. Is it a dog? A grapefruit? A stop sign?
In this course, you will explore several powerful and commonly utilized techniques for
supervised learning. You will implement each of these techniques using the free and
open-source statistical programming language R with real-world data sets. The focus
will be on making these methods accessible for you in your own work.
In this course, you will explore the mechanics of neural networks and the intricacies
involved in fitting them to data for prediction. Using packages in the free and
open-source statistical programming language R with real-world data sets, you will
implement these techniques. The focus will be on making these methods accessible
for you in your own work.
Statistics is about using data to estimate certain values and evaluate certain
hypotheses; this makes perfect sense for passively studying how the world works (i.e.,
the scientific method). More often than not, however, we find ourselves wanting to use
this statistical information to make decisions regarding the systems involved. Suppose
we estimate that the demand for jet fuel next month will be greater than normal. How
does this information affect the decision of an oil refinery to purchase crude oil from
their various sources? How does an airline company decide how many flight crews to
employ based on the current flight schedule? How does past sales information across
the U.S. influence a company’s decision over where to place its warehouses?
In this course, you will explore the mathematics of linear programs, how to solve them,
and how to evaluate your model. You will implement these techniques using packages
in the free and open-source statistical programming language R to solve real-world
logistical business problems. The focus will be on making these methods accessible
for you in your own work.
Simulation is about quantifying the outcome of specific “what if” questions. What if
the average demand for tickets on a 150-seat aircraft is actually 200? What if people
who have purchased a ticket don’t show up? What if we offered a different number of
economy and first-class tickets? Perhaps most importantly, what effect do these “what
if” scenarios have on total revenue?
As you might guess, many “what if” questions in the real world are fundamentally
uncertain; there is no deterministic formula for predicting exactly how many people
will not show up for a given flight. You can, however, use historical data to estimate
no-show probabilities. Once you conclude that uncertainty plays an important part in your
problem, it may be that you will have to turn to a probabilistic simulation. Running
many replications of the simulation will then help you statistically analyze the system’s
behavior and assess the effects of different design choices.
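As a rough illustration of the replication idea described above, the following sketch simulates ticket sales for a 150-seat aircraft. It is a minimal example under stated assumptions, not the course's own material: the course works in R, and the no-show probability, fare, compensation cost, and ticket counts below are all hypothetical values chosen for the example.

```python
import random

def simulate_flight(tickets_sold, seats=150, no_show_prob=0.08,
                    fare=300.0, bump_cost=600.0, rng=random):
    # One replication: each ticket holder independently shows up with
    # probability 1 - no_show_prob (all parameter values are hypothetical).
    shows = sum(1 for _ in range(tickets_sold) if rng.random() >= no_show_prob)
    bumped = max(0, shows - seats)  # passengers who arrive but have no seat
    return tickets_sold * fare - bumped * bump_cost

def mean_revenue(tickets_sold, replications=2000, seed=0):
    # Average net revenue over many replications of the same scenario.
    rng = random.Random(seed)
    total = sum(simulate_flight(tickets_sold, rng=rng) for _ in range(replications))
    return total / replications

# Compare several "what if" choices for the number of tickets sold.
for sold in (150, 155, 160, 165, 170):
    print(sold, round(mean_revenue(sold), 2))
```

Comparing the averages across ticket counts is one simple way to assess a design choice; a fuller analysis would also look at the spread of outcomes across replications, not just the mean.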
In this course, you will explore the intricacies of designing and analyzing probabilistic
simulations. You will also run simulations using packages in the free and open-source
statistical programming language R to solve real-world logistical business problems.
The focus will be on making these methods accessible for you in your own work.