
openSAP

Introduction to Statistics for Data Science


Week 4 Unit 1

00:00:06 Welcome to week four of the openSAP course Statistics for Data Science.
00:00:12 Here we'll be covering the topic of probability and Bayes' theorem.
00:00:18 In unit one, we'll be introducing the topic of probability, and we'll be trying
to keep this simple so that you can learn a few key concepts,
00:00:29 and we'll be trying to avoid too much specialist notation and complication.
00:00:34 There are many times in modern life where you would like to predict the likely outcome
00:00:40 of an event or a series of events. Let's look at a few examples.
00:00:46 What's the probability of randomly drawing a face card from a deck of cards?
00:00:51 What is the probability of throwing a seven using two dice? What's the probability of rain
today?
00:00:58 Should I take an umbrella with me? How likely is it
00:01:01 that Liverpool will beat Manchester United based on recent form?
00:01:07 A few more complex examples: What's the probability of having an adverse reaction
00:01:14 to a drug? Is it worth the UK National Health Service investing
00:01:20 in this particular new drug? And much more complex:
00:01:25 What is the likelihood, for example, that climate change is manmade?
00:01:30 What's the likelihood that this cluster of customers will need
00:01:33 to make a house insurance claim? So here are some commonly used symbols.
00:01:40 We'll just introduce you to a few of these. P(A). So the probability of event A.
00:01:47 The probability of throwing a six with a die is P(6).
00:01:52 The "and" symbol: P(A ∩ B), the probability of event A and B.
00:01:58 The probability of two sixes: P(6 ∩ 6). The "or" symbol, P(A ∪ B), is the other way round:
00:02:04 The probability of event A or event B. The probability of choosing one card
00:02:10 from a standard deck and getting either a Queen of Hearts or a King of Hearts.
00:02:16 The given symbol: A given B with that up and down line.
00:02:20 That is, the probability of A given prior event B. The probability of being diagnosed, for
example, with cancer
00:02:27 given that the patient is a smoker. We'll look at examples later on.
00:02:33 But how is probability represented? Probability is typically represented
00:02:37 between zero and one or as a percentage.
00:02:40 So zero means that the event is impossible. One means that the event will certainly happen.
00:02:49 Any number in between represents the probability but not certainty of the event.
00:02:55 For example, a probability of 0.7 for rain today means that the weather forecasters
00:03:02 have calculated a probability of 70% that it will rain today.
00:03:08 Note that this is not through some clairvoyance or magic but by calculating all the possible
outcomes
00:03:17 and the associated likelihood for each one of those. If we calculate all the possible outcomes
for an event,
00:03:24 this is the so-called sample space. For example, there are six possible events,
00:03:31 or six outcomes, when throwing one die. So the sample space in curly brackets
00:03:37 is one, two, three, four, five, six. From this, we can answer several questions.
00:03:43 What is the probability of throwing a six? What is the probability of throwing an even number?

00:03:51 But how do you calculate the probabilities? How do we go about calculating the probabilities
00:04:00 of an event in particular? So we already have one important piece of information,
the sample space. Now we need to count all the possible outcomes
00:04:11 for the event in question. And we use the following formula.
00:04:17 Probability of E equals the total number of outcomes in E divided by the total number of
outcomes
00:04:24 in the sample space. For example, what's the probability
00:04:28 of throwing a six? The probability of throwing a six is one divided by six
00:04:34 which comes to approximately 0.17, or 17%. What's the probability of throwing an even number?
00:04:41 Well, there are three even numbers, two, four, and six in the sample space
of six possible events. Therefore, the calculation for the probability
00:04:53 of throwing an even number is three divided by six or 0.5 or 50%.
00:05:00 Please note, this calculates probabilities when each outcome is equally likely.
00:05:07 We'll look later on at real-world situations where this is not the case.
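As a minimal illustration of the formula above (not part of the original course slides; the set representation and function name are editorial choices), the single-die probabilities can be computed by counting outcomes:

```python
# Minimal sketch: P(E) = outcomes in E / outcomes in the sample space,
# valid only when every outcome is equally likely.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all outcomes of one die

def probability(event, space):
    return Fraction(len(event & space), len(space))

print(probability({6}, sample_space))        # 1/6, roughly 0.17
print(probability({2, 4, 6}, sample_space))  # 1/2, i.e. 0.5
```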
00:05:13 We ask a random group of people which is their favorite style of music
00:05:19 from rock, disco, country, hip hop, reggae, and classical. If we choose a random person
00:05:26 what is the likelihood that they like classical music? Well, the probability of liking classical
music
00:05:33 equals the frequency for classical music divided by the total frequencies in the table
00:05:39 that you can see on the screen. That is the probability of that event equals 30
00:05:43 divided by 100, which comes to 0.3 or 30%.
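For the survey case, where the probability comes from observed frequencies rather than equally likely outcomes, a similar sketch might look as follows (only the classical-music count of 30 out of 100 responses is given in the unit; the other genre counts would come from the table on the slide):

```python
# Minimal sketch: probability as relative frequency.
classical_count = 30      # respondents naming classical music
total_responses = 100     # total respondents in the survey

p_classical = classical_count / total_responses
print(p_classical)        # 0.3, i.e. 30%
```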
00:05:49 In summary: You can calculate the likelihood of an event or series of events using probability.

00:05:56 You need to understand all the possible outcomes in the so-called sample space
00:06:01 and then the particular event or group of events that you are analyzing.
00:06:07 In the next unit, we'll be looking at conditional probability.
00:06:11 That is the likelihood of an event taking place conditioned on another event having taken
place.

Week 4 Unit 2

00:00:06 Welcome to unit two of this week's course on probability and Bayes' theorem.
00:00:11 In this unit, we'll be looking at conditional probability, that is, the likelihood of an event based
on knowledge
00:00:19 of another event having taken place already. It often happens that the probability of an event
00:00:28 is conditional on a previous event. If you have a bag of snooker balls, you can calculate
00:00:35 the probability that you will pull out a red one or a green one.
00:00:40 However, what is the probability that you will pull out a second red or a second green?
00:00:47 Well, the latter is zero as there is only one green. However, there is the possibility
00:00:54 of taking another random red, though the probability has changed
00:00:59 because you have already taken one red out. How do you calculate that conditional
probability?
00:01:08 Let's look at the chance of pulling out a random red ball. There are 15 red balls out of a total of
22 balls.
00:01:16 Therefore, the probability of taking out a red ball is equal to 15 divided by 22,
00:01:26 which comes to 0.68, 68%. However, now you've reduced the number of red balls to 14.
00:01:35 Therefore, the probability of the next random ball being red is slightly less.
00:01:42 The probability of this event now equals 14 divided by 21, which comes to approximately 0.67, 67%.
00:01:53 Note that if you do not put the balls back after taking them out,
00:01:57 then the calculated probabilities are dependent. Just a reminder, the upright symbol, the pipe
symbol
00:02:06 is used to indicate conditional probability. For example, the probability of B given A
00:02:13 is written as B, then the upright pipe, then A – P(B|A) – and means the conditional probability of B given A.
00:02:21 Let's look at an example. The probability of B and A equals the probability of A
00:02:27 multiplied by the probability of B given A. So, the probability of getting two red balls,
00:02:34 that is, A and B, is as follows. The probability of B and A equals 15 divided by 22
00:02:43 multiplied by 14 divided by 21. That comes to 210 divided by 462,
00:02:53 which equates to approximately 0.45, 45%. Pause the video here and try calculating
00:03:02 the conditional probability of getting a red, a pink, and a black.
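A minimal sketch of the two-red calculation from this slide (not part of the original course materials, and it does not give away the red-pink-black exercise above):

```python
# Minimal sketch: two reds in a row without replacement.
# 15 of the 22 balls are red; removing one changes the sample space.
from fractions import Fraction

p_first_red = Fraction(15, 22)               # P(A)
p_second_red_given_first = Fraction(14, 21)  # P(B | A)

p_two_reds = p_first_red * p_second_red_given_first
print(p_two_reds, float(p_two_reds))         # 5/11, roughly 0.45
```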
00:03:08 What is the probability of pulling an ace from a pack of cards having already picked out an
ace?
00:03:17 There are 52 cards in a standard pack of cards and four are aces.
00:03:23 Therefore, the simple probability of picking out the first ace is four divided by 52.
00:03:31 The probability of picking one ace from the pack is four divided by 52, which is one divided by
13 or 7.7%.
00:03:42 However, once this ace has been picked out, there are only three of them left and 51 cards in
total.
00:03:50 Therefore, the simple probability of drawing that second ace is three divided by 51,
00:03:56 which is 5.9%. These combine to calculate the combined probability
00:04:02 of pulling two aces. This can be done by multiplying together
00:04:06 the above simple probabilities. The calculation is on the slide.
00:04:12 So, the probability of picking two aces equals the probability of picking the first ace
00:04:18 multiplied by the probability of picking the second ace, given that the first ace was picked.
00:04:25 This comes to 12 divided by 2,652, which is one divided by 221, or approximately 0.45%.
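The same pattern applies to the two-ace calculation, again as an illustrative sketch rather than anything from the original materials:

```python
# Minimal sketch: two aces in a row without replacement.
from fractions import Fraction

p_first_ace = Fraction(4, 52)               # P(A)
p_second_ace_given_first = Fraction(3, 51)  # P(B | A)

p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces, float(p_two_aces))        # 1/221, roughly 0.0045 (0.45%)
```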

00:04:34 Let's rearrange the formula using simple algebra to find another use for conditional probability.

00:04:42 We started with calculating the probability of events A and B, which we calculated by
multiplying
00:04:49 the simple probability of A by the probability of B given A. You may want to pause the video for
a few seconds
00:04:58 to remind yourself about that formula. However, what if we want to calculate the likelihood
00:05:05 of B given A? We need to shift the formula around.
00:05:10 Let's swap the sides around to show what we are calculating and then move the simple
probability of A
00:05:19 from multiplying on one side to dividing on the other side. You can see this formula on the
slide.
00:05:26 We'd recommend again pausing the video here to look at the formula in a little bit more detail.
00:05:33 Let's look at an example. 80% of your team like tiramisu,
00:05:37 and 20% like tiramisu and sticky toffee pudding. What's the probability of team members
00:05:43 liking sticky toffee pudding given that they like tiramisu? Using the formula on the screen,
00:05:52 you can see that the answer comes to 0.2 divided by 0.8, which is 25%. In summary, using probability,
00:06:00 you can calculate the likelihood of an event or series of events.
00:06:06 You need to understand all possible outcomes in the sample space and the particular event
or group of events you are analyzing. Events which are conditional upon each other
00:06:20 can be combined to calculate probabilities. By rearranging the formula, you can calculate
00:06:28 the probability of more than one dependent event, which is the probability of A and B,
00:06:35 as well as the probability of one event being contingent on a second event, the probability of B
given A.
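A minimal sketch of the rearranged formula, using the tiramisu figures from this unit (not from the original materials; the variable names are editorial):

```python
# Minimal sketch: P(B | A) = P(A and B) / P(A).
p_tiramisu = 0.8              # P(A): 80% like tiramisu
p_tiramisu_and_stp = 0.2      # P(A and B): 20% like both desserts

p_stp_given_tiramisu = p_tiramisu_and_stp / p_tiramisu
print(p_stp_given_tiramisu)   # 0.25, i.e. 25%
```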
00:06:43 In the next unit, we'll be introducing you to Bayes' theorem.

Week 4 Unit 3

00:00:06 Welcome to unit three of week four. In this unit, we will be giving you
00:00:09 an introduction to Bayes' theorem. In the 18th century,
00:00:15 Thomas Bayes came up with an approach to how statistical probabilities should change in
response to new evidence.
00:00:25 This was developed later on by Pierre-Simon Laplace and Sir Harold Jeffreys.
00:00:32 Bayes' Theorem provides us with the ability to update our beliefs based on prior knowledge of
factors
00:00:41 that might be related to the event. But what does that mean?
00:00:46 Well, very often we want to calculate the probability based on some prior information.
00:00:52 We started looking into this when we looked at conditional probability.
00:00:57 Here we are going a little bit deeper into how to calculate probabilities based on that prior
information.
00:01:06 But what does this look like in practice? Remember first that the pipe symbol means "given
that".
00:01:13 P(A|B) is the probability of an outcome A given the evidence of B. How would we express this
in an example
00:01:20 such as the probability of getting lung cancer, given that the person is a smoker.
00:01:27 P(Lung Cancer | Smoker). So where can we use Bayes' theorem?
00:01:34 Bayes' theorem has many, many applications, but here are just a few examples.
00:01:40 What is the likelihood that a CEO of a stock market listed company gets fired given that their
company's share
00:01:48 price underperforms their competitors' by more than 10% over a year?
00:01:55 Or you have developed an app to review travel expense claims. An audit has found that 3% of them contain errors;
00:02:03 your app flags errors in 94% of the claims the audit found to be in error, and in 8% of those with no errors.
00:02:11 What is the likelihood of there actually being an error, given that the expense has been marked
00:02:17 as approved, or as in error, by your program? What's the likelihood of people
working outside
00:02:24 without a hat getting skin cancer in the UK? The key word here is "given".
00:02:30 Each calculation is dependent on another probability. Let's see how this is done.
00:02:38 Let's look at an example involving medical tests. Medical tests are rarely perfect
00:02:44 and can produce inaccurate results. We can test the accuracy
00:02:49 of the medical test using Bayes' theorem. We know that a certain percentage
00:02:54 of the population will, for example, get lung cancer, but we want to know how that is affected
by prior conditions
00:03:02 such as drinking, smoking, or lack of exercise. Bayes allows us to calculate the likelihood,
00:03:10 or the probability, of someone getting cancer given that they smoke.
00:03:17 What does this all mean in practice? Well, remember the pipe symbol which means given.
00:03:22 P is for probability. P(A|B) is the probability of event A, such as cancer,
00:03:27 given B, if you know the person is a smoker for example. Hint: This is what we want to calculate.
00:03:33 To get it, we take P(B|A), the probability of B given A, multiply that by the simple probability of A, and divide all of that by the simple probability of B.
00:03:41 Therefore, the formula calculates the probability of event A given the prior information we have
about event B.
00:03:51 There are three parts to this calculation: The probability of the evidence conditional
00:03:57 on the hypothesis: P of B given A. The prior probability of the hypothesis: P of A.

00:04:06 The prior probability of the evidence: P of B. The example here is with some invented values.
00:04:15 Let's assume that the prior probabilities are as such: 4% of the population have some form of
cancer.
00:04:22 20% of the population smoke. 60% of the people with cancer smoke; that is,
P(Smoker|Cancer).
00:04:33 The probability of the evidence conditional on the hypothesis – here, the probability of being a smoker
00:04:42 given that you have cancer – is P(B|A), which is 0.6. The prior probability of the hypothesis –
00:04:48 that is, having cancer – the simple probability of A, is 0.04. The prior probability of the evidence,
00:04:54 the prior probability of being a smoker: P(B) equals 0.2. We know that 4% of the population have
cancer,
00:05:02 but in this case, what is the probability of getting cancer given that you are a smoker?
00:05:09 We'll use S for smoker and C for cancer: If we put the numbers into the formula,
00:05:14 you can see on the screen how the calculation works. You may want to pause the video here
for a few seconds
00:05:21 to go through this more slowly. Therefore, the likelihood with this formula
00:05:28 of getting cancer, given that you are a smoker, is 12% based on these made-up figures.
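As a minimal sketch (not part of the original course materials; the helper function is editorial), the calculation with these invented figures can be written as:

```python
# Minimal sketch of Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

p_smoker_given_cancer = 0.6   # P(B | A): 60% of people with cancer smoke
p_cancer = 0.04               # P(A): 4% of the population have cancer
p_smoker = 0.2                # P(B): 20% of the population smoke

print(bayes(p_smoker_given_cancer, p_cancer, p_smoker))   # 0.12, i.e. 12%
```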
00:05:34 In summary: We've learnt the following: You can use Bayes' theorem to allow probabilities
00:05:41 to be re-evaluated based on new evidence. Bayes' theorem allows you to evaluate the
likelihood
00:05:50 of an event based on the occurrence of a prior event. The formula for Bayes' theorem
00:05:57 is as you can see on the screen. To calculate the probability of event A given the occurrence
00:06:03 of a previous event B, we need to take the probability of event B given A, multiply that by the
simple probability
00:06:12 of A and divide that whole result by the simple probability of B.
00:06:19 The probability of A occurring is calculated based on the relevant event B, which has occurred.

00:06:27 In the next unit, we will look at an example of Bayes' theorem in practice.

Week 4 Unit 4

00:00:06 Welcome to this final unit of week four, where we continue to look at Bayes' theorem
00:00:12 and a worked example. Hopefully, by now, you have a working understanding
00:00:18 of Bayes' Theorem and how it helps you find the probability
00:00:23 of an outcome based on prior, relevant information. In this unit, we'll help you deepen your
understanding
00:00:31 by providing one more example, so that you can work through it.
00:00:36 Let's remind ourselves of the formula. What is the likelihood of event A
00:00:40 given the evidence we already have of B? We calculate this by taking the probability of event B

00:00:47 given A, multiplying that by the simple probability of A, and dividing all of that
00:00:53 by the simple probability of event B. Let's get to the example.
00:01:01 You look out of the window and see clouds everywhere. Of course, seeing the clouds may
make you think that
00:01:09 there is the likelihood of rain, but that's not actually guaranteed.
00:01:15 On plenty of occasions, you've noticed that the early clouds have cleared up to leave a
beautiful day.
00:01:23 However, you have a feeling that when the day starts cloudy, you are more likely to get rain
later on,
00:01:30 if it hasn't stopped raining. Would it make sense to take a coat
00:01:35 and umbrella with you or not? How do you calculate the likelihood
00:01:39 that this will turn into rain later? That is the probability of R, that is rain later,
00:01:46 given C, early morning clouds. You know from old weather data
00:01:52 that rainy days start off cloudy 70% of the time. That is the probability of C given R equals 0.7.

00:02:00 Be careful about this. It's the conditional probability of the day having started
00:02:05 cloudy earlier, given rain later in the day. Yet, 50% of days start off cloudy.
00:02:12 You're not in California. That is the simple probability of C equals 0.5.
00:02:19 This month, June, normally only has a few rainy days, three days in the month, 10%.
00:02:26 That is the simple probability of R is 0.1. What's the likelihood of rain?
00:02:32 Do you need a coat? You don't want to carry unnecessary weight around
00:02:36 if the day is likely to be a nice one, so let's work out the probability.
00:02:43 We've reminded you of the formula at the top of the slide, so let's slot the data into the
example.
00:02:50 It's very common to replace the event names with letters, and we did this with R and C in the
last slide.
00:02:58 However, to keep things clear for us, let's use the full description
00:03:03 when we're running this calculation. What's the likelihood of rain given that,
00:03:08 when you woke up, it was cloudy? Pause the video
00:03:13 and spend a short amount of time seeing how this formula has been built to calculate the
likelihood
00:03:21 of rain given early morning cloud in June. So what do you already know that will
00:03:27 help us calculate the probability? We know the probability of cloud given rain equals 0.7.
00:03:35 We also know the simple probability of rain is 0.1. And finally, we know the simple probability
00:03:43 of cloud in the morning is 0.5. Therefore, using Bayes' theorem,
00:03:49 the probability of getting rain if it's cloudy in June is 0.7 multiplied by 0.1 all divided by 0.5,

00:03:59 which comes to 0.14 or 14%. So perhaps, in June, we can take the risk of not taking
00:04:07 rain gear, even if it's cloudy in the morning. In summary: Calculating probabilities is very
useful,
00:04:16 but it's important to be able to factor in information that will relate to and affect the probability.
00:04:24 Bayes' theorem allows you to evaluate the likelihood of an event based on the occurrence of a
prior event.
00:04:32 This was illustrated with a very simple example. However, Bayes' theorem can be used in
much more interesting
00:04:41 and complex ways, such as calculating the likelihood of false positives on a health screening
test.
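For completeness, here is a minimal sketch of the rain-given-clouds calculation worked through above (not from the original materials; the helper function is editorial):

```python
# Minimal sketch of Bayes' theorem applied to the June rain example.
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

p_cloud_given_rain = 0.7   # P(C | R): rainy days start cloudy 70% of the time
p_rain = 0.1               # P(R): about three rainy days in June
p_cloud = 0.5              # P(C): half of all days start off cloudy

print(bayes(p_cloud_given_rain, p_rain, p_cloud))   # 0.14, i.e. 14%
```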
00:04:49 Thank you for your participation in week four's activities. We hope you enjoyed the work on
probability and Bayes.
00:04:58 In week five, you'll be learning about probability distributions and hypothesis testing.


© 2019 SAP SE or an SAP affiliate company. All rights reserved.


