CU Data Science

The Online Certificate Program in Data Science focuses on utilizing the R programming language to apply data analytics tools for decision-making. It consists of eight courses covering topics such as data collection, pattern recognition, machine learning, and optimization techniques. Participants will earn a Data Science Certificate from Cornell College of Engineering and gain practical experience through real-world data sets.


DATA SCIENCE
Online Certificate Program

OVERVIEW
R is quickly becoming one of the most popular and effective programming languages
for data science. In this program, you’ll apply data science tools to the collection of
data and the translation of data into information, constructing models that can be
used to address the questions that you’re investigating. You’ll have the opportunity to
apply data analytics as a four-part process: gathering data, looking for patterns in that
data, finding insights in any patterns you discover, and using those insights to make
decisions. This process does not make decisions for you, but it will help you to better
understand the effects of the decisions you might make. Through an examination of
real-world data sets and different modeling techniques, as well as an in-depth look
at how the programming language R can be used to help you find patterns and derive
insights, you will gain valuable experience working in each stage of the data analytics
process, helping you and your organization to make better decisions – and gain a
sound scientific understanding of why you’re making the choices you’re making.

COURSES: 8
COURSE LENGTH: 2 weeks
FORMAT: 100% online

COURSES
• Understanding Data Analytics
• Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
• Finding Patterns in Data Using Cluster and Hotspot Analysis
• Regression Analysis and Discrete Choice Models
• Supervised Learning Techniques
• Neural Networks and Machine Learning
• Making Data-Driven Recommendations Using Optimization
• Making Predictions Using Simulation

Visit ecornell.cornell.edu
INSIDE THE PROGRAM

KEY TAKEAWAYS
• Explore the data analytics process and examine the tools available to improve
decision making
• Use unsupervised learning techniques to help identify patterns in data and
create visualizations to better spot those patterns
• Categorize data using supervised learning algorithms
• Predict the value of continuous variables with linear regression
• Use neural networks to make predictions about new data
• Make forecasts from data collected over time and measure their accuracy
• Create linear programs and simulations to optimize system performance and
dynamics

WHO SHOULD ENROLL


• Current and aspiring data scientists
• Analysts
• Engineers
• Researchers
• Technical managers

WHAT YOU’LL EARN


• Data Science Certificate from Cornell College of
Engineering
• 160 Professional Development Hours (16 CEUs)

COURSE DESCRIPTIONS

UNDERSTANDING DATA ANALYTICS

By some estimates, 90% of the data that has ever existed has been created in the
last two years. This is a staggering figure and has given rise to new challenges and
opportunities in almost every industry: What kind of data do you need to collect to
compete, and how can you make sense of it once you have collected it? As technology
evolves and the volume of data increases, how can you make the best use of all this
information? How can you use the data to help drive your decision making? How can
you make data work for you? How can you ensure your data accurately reflects the
population in which you’re interested?

In this course, you will determine the types of engineering and business questions you
can answer, the kinds of problems you can solve, and the decisions you can make, all
through using data analytics. You will explore best practices for collecting information
so that you can make informed predictions, develop insights, and better inform
organizational decision making. You will also explore best practices for sampling and
examine how different types of sampling are suited to different situations. Finally, you
will see real-world examples that demonstrate how these practices work, apply some of
the concepts to your own work, and practice sampling techniques in case-study scenarios.

FINDING PATTERNS IN DATA USING ASSOCIATION RULES, PCA, AND FACTOR ANALYSIS

Visualization is one of the simplest and most effective ways to find patterns in data.
It can answer questions such as: What is the general range and shape of the data set?
Are there any clusters of observations? Which variables correlate with each other? Are
there any obvious outliers?

As your data set grows in terms of the number of data points and variables, however,
it becomes increasingly difficult to visualize all this information at once. At most, you
can plot data points on three-dimensional axes and add further distinctions of size,
color, shape, and so on. Yet this can easily become too busy and difficult to read. How,
then, do we find patterns in really big data sets?

In this course, you will explore several powerful and commonly utilized techniques for
distilling patterns from data. You will implement each of these techniques using the
free and open-source statistical programming language R with real-world data sets.
The focus will be on making these methods accessible for you in your own work.

FINDING PATTERNS IN DATA USING CLUSTER AND HOTSPOT ANALYSIS

When you have a large collection of objects, it is often helpful to split them into
meaningful groups or clusters. One example of this would be to identify different types
of customers so that a company can more efficiently route their calls to a helpline. As
a second example, suppose an automobile manufacturer wanted to segment its
market to target its ads more carefully. One approach might be to take a database of
recent car sales, including the demographics associated with each customer,
and segment the population purchasing each type of automobile into meaningful
groups.

Specialized approaches exist if your data contains information that relates to time
and geography. You can use this additional information to identify geographical and
temporal hotspots. Hotspots are regions of high activity or a high value of a particular
variable. These results can help you focus your attention on a particular region where
a problem is occurring more than usual, such as the incidence of asthma in a large
city. In both cluster and hotspot analysis, the results can help you discover new and
interesting features, problems, and red flags regarding the data being analyzed.

In this course, you will explore several powerful and commonly utilized techniques for
performing both cluster and hotspot analysis. You will implement these techniques
using the free and open-source statistical programming language R with real-world
data sets. The focus will be on making these methods accessible and applicable to your
work.

REGRESSION ANALYSIS AND DISCRETE CHOICE MODELS

A story can play an important role in understanding data. It can help distill complex
information into something manageable — something we can think about easily,
relate to, and use to make decisions. For many problems that we encounter globally,
however, a story that describes what already happened is not precise enough for the
job we want to perform. Often, we would like to use available data to make numerically
accurate predictions about what might happen in the future. This task requires the
construction of mathematical models that are well suited to our real-world problems.

In this course, you will explore several types of statistical models used with data to
make predictions. These models bring with them a whole batch of important concerns,
such as estimation and validation, that make the entire process into both an art and a
science. You will implement each of these techniques using the free and open-source
statistical programming language R with real-world data sets. The focus will be on
making these methods accessible for you in your own work.

SUPERVISED LEARNING TECHNIQUES

Supervised learning is a general term for any machine learning technique that
attempts to discover the relationship between a data set and some associated labels
for prediction. In regression, the labels are continuous numbers. This course will
focus on classification, where the labels are taken from a finite set of numbers or
characters. The prototypical and perhaps best-known example of classification
is image recognition. The goal is to take an image (represented by its pixel values) and
determine what objects are in the image. Is it a dog? A grapefruit? A stop sign?

There are many practical classification tasks, such as determining whether an
individual’s financial history makes them high risk for a loan, whether there is a defect
in a material based on some sensor readings, or whether a new email is spam or not.
These problems share the same basic form and can be solved with many different
types of mathematical, statistical, and probabilistic models developed by the machine
learning community.

In this course, you will explore several powerful and commonly utilized techniques for
supervised learning. You will implement each of these techniques using the free and
open-source statistical programming language R with real-world data sets. The focus
will be on making these methods accessible for you in your own work.

NEURAL NETWORKS AND MACHINE LEARNING

Neural networks, a nonlinear supervised learning modeling tool, have become
hugely popular within the last two decades because they have been successfully
applied to a wide range of problems, including automatic language processing,
image classification, object detection, speech recognition, and pattern recognition.
They are mathematical models loosely based on an analogy to the interconnected
neurons in the brain. They take in a vector or matrix of input data
and output either a classification value or an approximation of a function value.
The beauty is that the relationships between the inputs and outputs can be highly
nonlinear and complex.

In this course, you will explore the mechanics of neural networks and the intricacies
involved in fitting them to data for prediction. Using packages in the free and
open-source statistical programming language R with real-world data sets, you will
implement these techniques. The focus will be on making these methods accessible
for you in your own work.

MAKING DATA-DRIVEN RECOMMENDATIONS USING OPTIMIZATION

Statistics is about using data to estimate certain values and evaluate certain
hypotheses; this makes perfect sense for passively studying how the world works (i.e.,
the scientific method). More often than not, however, we find ourselves wanting to use
this statistical information to make decisions regarding the systems involved. Suppose
we estimate that the demand for jet fuel next month will be greater than normal. How
does this information affect the decision of an oil refinery to purchase crude oil from
its various sources? How does an airline company decide how many flight crews to
employ based on the current flight schedule? How does past sales information across
the U.S. influence a company’s decision over where to place its warehouses?

The quantification and mathematical solution of these types of decision-making
problems are known broadly as optimization. The general features of an optimization
problem are a set of quantifiable decisions that have a quantifiable effect that should
be minimized or maximized (think cost or revenue) and a set of constraints on the
possible values of those decisions. There are many different optimization branches,
but the most prominent, due to its widespread applicability and computational
efficiency, is linear programming, where the objective function and constraints are all
linear.
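
As a rough illustration of the structure just described, a linear program can be written in the following general form. The symbols are assumptions introduced here for illustration, not notation taken from the program: x is the vector of quantifiable decisions, c holds the per-unit cost (or revenue) of each decision, and the matrix A and vector b encode the constraints.

```latex
\begin{aligned}
\text{minimize (or maximize)} \quad & c^{\top} x \\
\text{subject to} \quad & A x \le b, \\
& x \ge 0
\end{aligned}
```

In the jet fuel example, x might hold the quantity of crude oil purchased from each source and c the price per unit from each source, while the rows of A x ≤ b could express limits such as supplier capacity. A requirement to meet a minimum demand fits the same form once both sides of the inequality are negated.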

In this course, you will explore the mathematics of linear programs, how to solve them,
and how to evaluate your model. You will implement these techniques using packages
in the free and open-source statistical programming language R to solve real-world
logistical business problems. The focus will be on making these methods accessible
for you in your own work.

MAKING PREDICTIONS USING SIMULATION

Simulation is about quantifying the outcome of specific “what if” questions. What if
the average demand for tickets on a 150-seat aircraft is actually 200? What if people
who have purchased a ticket don’t show up? What if we offered a different number of
economy and first-class tickets? Perhaps most importantly, what effect do these “what
if” scenarios have on total revenue?

As you might guess, many “what if” questions in the real world are fundamentally
uncertain; there is no deterministic formula for predicting exactly how many people
will not show up for a given flight. You can, however, use historical data to estimate
no-show probabilities. Once you conclude that uncertainty plays an important part in your
problem, it may be that you will have to turn to a probabilistic simulation. Running
many replications of the simulation will then help you statistically analyze the system’s
behavior and assess the effects of different design choices.
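
The idea of running many replications can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: it is written in Python rather than R, and the no-show probability, fare, and compensation cost are hypothetical numbers, as are the function names `simulate_flight` and `run_replications`. It assumes each ticket holder shows up independently and that every ticket sold earns the full fare.

```python
import random


def simulate_flight(tickets_sold, seats, no_show_prob, rng):
    """Simulate one flight; return (passengers who show up, passengers bumped)."""
    # Assumption: each ticket holder independently fails to show up
    # with probability no_show_prob.
    shows = sum(1 for _ in range(tickets_sold) if rng.random() >= no_show_prob)
    bumped = max(0, shows - seats)  # passengers beyond the seat capacity
    return shows, bumped


def run_replications(tickets_sold, seats, no_show_prob, fare, bump_cost,
                     replications=5_000, seed=42):
    """Average revenue and share of flights with bumped passengers."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    total_revenue = 0.0
    flights_with_bumps = 0
    for _ in range(replications):
        _, bumped = simulate_flight(tickets_sold, seats, no_show_prob, rng)
        # Assumption: no refunds for no-shows, and each bumped passenger
        # costs the airline a fixed compensation amount.
        total_revenue += tickets_sold * fare - bumped * bump_cost
        if bumped > 0:
            flights_with_bumps += 1
    return {
        "mean_revenue": total_revenue / replications,
        "bump_rate": flights_with_bumps / replications,
    }


if __name__ == "__main__":
    # Compare three hypothetical ticket-sale policies for a 150-seat aircraft.
    for tickets in (150, 160, 170):
        result = run_replications(tickets, seats=150, no_show_prob=0.08,
                                  fare=300.0, bump_cost=800.0,
                                  replications=1_000)
        print(tickets, result)
```

Comparing the averages across the three policies is one simple way to assess the effect of a design choice, here the number of tickets sold, on total revenue.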

In this course, you will explore the intricacies of designing and analyzing probabilistic
simulations. You will also run simulations using packages in the free and open-source
statistical programming language R to solve real-world logistical business problems.
The focus will be on making these methods accessible for you in your own work.
