Edexcel As and A Level Mathematics Statistics and Mechanics Year 1 - AS

A level textbook

Edexcel AS and A level Mathematics: Statistics and Mechanics Year 1/AS
11-19 Progression. Endorsed for Edexcel.

Contents

Overarching themes
Extra online content

STATISTICS
1 Data collection
  1.1 Populations and samples
  1.2 Sampling
  1.3 Non-random sampling
  1.4 Types of data
  1.5 The large data set
  Mixed exercise 1
2 Measures of location and spread
  2.1 Measures of central tendency
  2.2 Other measures of location
  2.3 Measures of spread
  2.4 Variance and standard deviation
  2.5 Coding
  Mixed exercise 2
3 Representations of data
  3.1 Outliers
  3.2 Box plots
  3.3 Cumulative frequency
  3.4 Histograms
  3.5 Comparing data
  Mixed exercise 3
4 Correlation
  4.1 Correlation
  4.2 Linear regression
  Mixed exercise 4
5 Probability
  5.1 Calculating probabilities
  5.2 Venn diagrams
  5.3 Mutually exclusive and independent events
  5.4 Tree diagrams
  Mixed exercise 5
6 Statistical distributions
  6.1 Probability distributions
  6.2 The binomial distribution
  6.3 Cumulative probabilities
  Mixed exercise 6
7 Hypothesis testing
  7.1 Hypothesis testing
  7.2 Finding critical values
  7.3 One-tailed tests
  7.4 Two-tailed tests
  Mixed exercise 7
Review exercise 1

MECHANICS
8 Modelling in mechanics
  8.1 Constructing a model
  8.2 Modelling assumptions
  8.3 Quantities and units
  8.4 Working with vectors
  Mixed exercise 8
9 Constant acceleration
  9.1 Displacement-time graphs
  9.2 Velocity-time graphs
  9.3 Constant acceleration formulae 1
  9.4 Constant acceleration formulae 2
  9.5 Vertical motion under gravity
  Mixed exercise 9
10 Forces and motion
  10.1 Force diagrams
  10.2 Forces as vectors
  10.3 Forces and acceleration
  10.4 Motion in 2 dimensions
  10.5 Connected particles
  10.6 Pulleys
  Mixed exercise 10
11 Variable acceleration
  11.1 Functions of time
  11.2 Using differentiation
  11.3 Maxima and minima problems
  11.4 Using integration
  11.5 Constant acceleration formulae
  Mixed exercise 11
Review exercise 2
Exam-style practice: Paper 2
Appendix
Answers
Index

Overarching themes

The following three overarching themes have been fully integrated throughout the Pearson Edexcel AS and A level Mathematics series, so they can be applied alongside your learning and practice.

1. Mathematical argument, language and proof
• Rigorous and consistent approach throughout
• Notation boxes explain key mathematical language and symbols
• Dedicated sections on mathematical proof explain key principles and strategies
• Opportunities to critique arguments and justify methods

2. Mathematical problem solving
• Hundreds of problem-solving questions, fully integrated into the main exercises
• Problem-solving boxes provide tips and strategies
• Structured and unstructured questions to build confidence
• Challenge boxes provide extra stretch
[Figure: The Mathematical Problem-solving cycle: specify the problem, collect information, process and represent information, interpret results]

3. Mathematical modelling
• Dedicated modelling sections in relevant topics provide plenty of practice where you need it
• Examples and exercises include qualitative questions that allow you to interpret answers in the context of the model
• Dedicated chapter in Statistics & Mechanics Year 1/AS explains the principles of modelling in mechanics

Finding your way around the book

Access an online digital edition using the code at the front of the book.
• Each chapter starts with a list of objectives.
• The real world applications of the maths you are about to learn are highlighted at the start of the chapter with links to relevant questions in the chapter.
• The Prior knowledge check helps make sure you are ready to start the chapter.
• Exercise questions are carefully graded so they increase in difficulty and gradually bring you up to exam standard.
• Exercises are packed with exam-style questions to ensure you are ready for the exams.
• Problem-solving boxes provide hints, tips and strategies, and Watch out boxes highlight areas where students often lose marks in their exams.
• Exam-style questions are flagged with (E).
• Problem-solving questions are flagged with (P).
• Each chapter ends with a Mixed exercise and a Summary of key points.
• Challenge boxes give you a chance to tackle some more difficult questions.
• Each section begins with explanation and key learning points.
• Step-by-step worked examples focus on the key types of questions you'll need to tackle.
• Every few chapters a Review exercise helps you consolidate your learning with lots of exam-style questions.
• A full AS level practice paper at the back of the book helps you prepare for the real thing.

Extra online content

Whenever you see an Online box, it means that there is extra online content available to support you.

SolutionBank
SolutionBank provides a full worked solution for every question in the book.
Online: Full worked solutions are available in SolutionBank.
Download all the solutions as a PDF or quickly find the solution you need online.

Access all the extra online content for free at: www.pearsonschools.co.uk/smimaths
You can also access the extra online content by scanning the QR code.

Use of technology
Explore topics in more detail, visualise problems and consolidate your understanding.
Use pre-made GeoGebra activities or Casio resources for a graphic calculator.

GeoGebra-powered interactives
Interact with the maths you are learning using GeoGebra's easy-to-use tools.
Online: … graphically using technology.

Graphic calculator interactives
Explore the maths you are learning and gain confidence in using a graphic calculator.

Calculator tutorials
Our helpful video tutorials will guide you through how to use your calculator in the exams. They cover both Casio's scientific and colour graphic calculators.
Online: Work out each coefficient quickly using the nCr and power functions on your calculator.
Step-by-step guide with audio instructions on exactly which buttons to press and what should appear on your calculator's screen.

Published by Pearson Education Limited, 80 Strand, London WC2R 0RL.
www.pearsonschoolsandfecolleges.co.uk
Copies of official specifications for all Pearson qualifications may be found on the website: qualifications.pearson.com
Text © Pearson Education Limited 2017
Edited by Tech-Set Ltd, Gateshead
Typeset by Tech-Set Ltd, Gateshead
Original illustrations © Pearson Education Limited 2017
Cover illustration Marcus@ka-artists
The rights of Greg Attwood, Ian Bettison, Alan Clegg, Gill Dyer, Jane Dyer, Keith Gallick, Susan Hooker, Michael Jennings, Jean Littlewood, Bronwen Moran, James Nicholson, Su Nicholson, Laurence Pateman, Keith Pledger and Harry Smith to be identified as authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
First published 2017
20 19 18 17
10 9 8 7 6 5 4 3 2 1

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
ISBN 978 1 292 23253 9

Copyright notice
All rights reserved. No part of this publication may be reproduced in any form or by any means (including photocopying or storing it in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright owner, except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, Barnard's Inn, 86 Fetter Lane, London EC4A 1EN (www.cla.co.uk). Applications for the copyright owner's written permission should be addressed to the publisher.

Printed in the UK by Bell and Bain Ltd, Glasgow

Acknowledgements
The authors and publisher would like to thank the following individuals and organisations for permission to reproduce photographs:
(Key: b-bottom; c-centre; l-left; r-right; t-top)
Fotolia.com: Arousa 156, 197cr; ayeren 20, 113; mdesigner125 1, 113tl; Okea 40, 113. Getty Images: Billie Weiss / Boston Red Sox 69, 113c; bortonia 98, 113; rckoyre 130, 197c. Shutterstock.com: Anette Holmberg 58, 113c; Carlos E. Santa Maria 118, 197t; Fer Gregory 181, 197tr; Joggie Botma 123; John Evans 83, 113tr.
All other images © Pearson Education

Pearson has robust editorial processes, including answer and fact checks, to ensure the accuracy of the content in this publication, and every effort is made to ensure this publication is free of errors. We are, however, only human, and occasionally errors do occur. Pearson is not liable for any misunderstandings that arise as a result of errors in this publication, but it is our priority to ensure that the content is accurate. If you spot an error, please do contact us at [email protected] so we can make sure it is corrected.

Contains public sector information licensed under the Open Government Licence v2.0.
A note from the publisher

In order to ensure that this resource offers high-quality support for the associated Pearson qualification, it has been through a review process by the awarding body. This process confirms that this resource fully covers the teaching and learning content of the specification or part of a specification at which it is aimed. It also confirms that it demonstrates an appropriate balance between the development of subject skills, knowledge and understanding, in addition to preparation for assessment.

Endorsement does not cover any guidance on assessment activities or processes (e.g. practice questions or advice on how to answer assessment questions) included in the resource, nor does it prescribe any particular approach to the teaching or delivery of a related course.

While the publishers have made every attempt to ensure that advice on the qualification and its assessment is accurate, the official specification and associated assessment guidance materials are the only authoritative source of information and should always be referred to for definitive guidance.

Pearson examiners have not contributed to any sections in this resource relevant to examination papers for which they have responsibility.

Examiners will not use endorsed resources as a source of material for any assessment set by Pearson. Endorsement of a resource does not mean that the resource is required to achieve this Pearson qualification, nor does it mean that it is the only suitable material available to support the qualification, and any resource lists produced by the awarding body shall include this and other appropriate resources.

1 Data collection

After completing this chapter you should be able to:
• Understand 'population', 'sample' and 'census', and comment on the advantages and disadvantages of each
• Understand the advantages and disadvantages of simple random sampling, systematic sampling, stratified sampling, quota sampling and opportunity sampling
• Define qualitative, quantitative, discrete and continuous data, and understand grouped data
• Understand the large data set and how to collect data from it, identify types of data and calculate simple statistics

Meteorologists collect and analyse weather data to help them predict weather patterns. Selecting weather data from specific dates and places is an example of sampling. (See Section 1.5)

Prior knowledge check

1 Find the mean, median, mode and range of these data sets:
  a 1, 3, 4, 4, 6, 7, 8, 9, 11
  b 20, 18, 17, 20, 14, 23, 19, 16
  (GCSE Mathematics)

2 Here is a question from a questionnaire surveying TV viewing habits.
  How much TV do you watch?
  [ ] 0-1 hours   [ ] 1-2 hours   [ ] 3-4 hours
  Give two criticisms of the question and write an improved question.
  (GCSE Mathematics)

3 Rebecca records the shoe size, x, of the female students in her year. The results are given in the table.

  Shoe size, x | Number of students, f
  35 | 3
  36 | 17
  37 | 29
  38 | 34
  39 | 12

  Find:
  a the number of female students who take shoe size 37
  b the shoe size taken by the smallest number of female students
  c the shoe size taken by the greatest number of female students
  d the total number of female students in the year.
  (GCSE Mathematics)

1.1 Populations and samples

• In statistics, a population is the whole set of items that are of interest.
For example, the population could be the items manufactured by a factory or all the people in a town. Information can be obtained from a population. This is known as raw data.
• A census observes or measures every member of a population.
• A sample is a selection of observations taken from a subset of the population which is used to find out information about the population as a whole.

There are a number of advantages and disadvantages of both a census and a sample.
Census
  Advantages:
  • It should give a completely accurate result
  Disadvantages:
  • Time consuming and expensive
  • Cannot be used when the testing process destroys the item
  • Hard to process large quantity of data

Sample
  Advantages:
  • Less time consuming and expensive than a census
  • Fewer people have to respond
  • Less data to process than in a census
  Disadvantages:
  • The data may not be as accurate
  • The sample may not be large enough to give information about small sub-groups of the population

The size of the sample can affect the validity of any conclusions drawn.
• The size of the sample depends on the required accuracy and available resources. Generally, the larger the sample, the more accurate it is, but you will need greater resources. If the population is very varied, you need a larger sample than if the population were uniform.
• Different samples can lead to different conclusions due to the natural variation in a population.

• Individual units of a population are known as sampling units.
• Often sampling units of a population are individually named or numbered to form a list called a sampling frame.

Example 1
A supermarket wants to test a delivery of avocados for ripeness by cutting them in half.
a Suggest a reason why the supermarket should not test all the avocados in the delivery.
The supermarket tests a sample of 5 avocados and finds that 4 of them are ripe. They estimate that 80% of the avocados in the delivery are ripe.
b Suggest one way that the supermarket could improve their estimate.

a Testing all the avocados would mean that there were none left to sell.
b They could take a larger sample, for example 10 avocados. This would give a better estimate of the overall proportion of ripe avocados.

Exercise 1A

1 A school uses a census to investigate the dietary requirements of its students.
  a Explain what is meant by a census.
  b Give one advantage and one disadvantage to the school of using a census.

2 A factory makes safety harnesses for climbers and has an order to supply 3000 harnesses. The buyer wishes to know that the load at which the harness breaks exceeds a certain figure.
  a Suggest a reason why a census would not be used for this purpose.
  The factory tests four harnesses and the load for breaking is recorded:
  320 kg   260 kg   240 kg   180 kg
  b The factory claims that the harnesses are safe for loads up to 250 kg. Use the sample data to comment on this claim.
  c Suggest one way in which the company can improve their prediction.

3 A city council wants to know what people think about its recycling centre. The council decides to carry out a sample survey to learn the opinion of residents.
  a Write down one reason why the council should not take a census.
  b Suggest a suitable sampling frame.
  c Identify the sampling units.

4 A manufacturer of microswitches is testing the reliability of its switches. It uses a special machine to switch them on and off until they break.
  a Give one reason why the manufacturer should use a sample rather than a census.
  The company tests a sample of 10 switches, and obtains the following results:
  23 150   25 071   19 480   22 921   7455
  b The company claims that its switches can be operated an average of 20 000 times without breaking. Use the sample data above to comment on this claim.
  c Suggest one way the company could improve its prediction.

5 A manager of a garage wants to know what their mechanics think about a new pension scheme designed for them. The manager decides to ask all the mechanics in the garage.
  a Describe the population the manager will use.
  b Write down the main advantage in asking all of their mechanics.

1.2 Sampling

In random sampling, every member of the population has an equal chance of being selected. The sample should therefore be representative of the population. Random sampling also helps to remove bias from a sample.
There are three methods of random sampling:
• Simple random sampling
• Systematic sampling
• Stratified sampling

• A simple random sample of size n is one where every sample of size n has an equal chance of being selected.

To carry out a simple random sample, you need a sampling frame, usually a list of people or things. Each person or thing is allocated a unique number and a selection of these numbers is chosen at random.

There are two methods of choosing the numbers: generating random numbers (using a calculator, computer or random number table) and lottery sampling. In lottery sampling, the members of the sampling frame could be written on tickets and placed into a 'hat'. The required number of tickets would then be drawn out.

Example 2
The 100 members of a yacht club are listed alphabetically in the club's membership book. The committee wants to select a sample of 12 members to fill in a questionnaire.
a Explain how the committee could use a calculator or random number generator to take a simple random sample of the members.
b Explain how the committee could use a lottery sample to take a simple random sample of the members.

a Allocate a number from 1 to 100 to each member of the yacht club. Use your calculator or a random number generator to generate 12 random numbers between 1 and 100. Go back to the original population and select the people corresponding to these numbers.
b Write all the names of the members on (identical) cards and place them into a hat. Draw out 12 names to make up the sample of members.

• In systematic sampling, the required elements are chosen at regular intervals from an ordered list.
For example, if a sample of size 20 was required from a population of 100, you would take every fifth person since 100 ÷ 20 = 5.
The first person to be chosen should be chosen at random. So, for example, if the first person chosen is number 2 in the list, the remaining sample would be persons 7, 12, 17 etc.
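The two procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simple_random_sample` and `systematic_sample` are hypothetical, the sampling frame is taken to be a plain list of member numbers, and the `seed` argument is there only so that a run can be repeated.

```python
import random


def simple_random_sample(frame, k, seed=None):
    """Pick k distinct sampling units so that every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, k)


def systematic_sample(frame, k, seed=None):
    """Pick units at regular intervals from an ordered list, starting at a random point."""
    step = len(frame) // k          # e.g. 100 // 20 = 5, so every fifth unit
    rng = random.Random(seed)
    start = rng.randrange(step)     # random position within the first interval
    return [frame[start + i * step] for i in range(k)]


# Assumed sampling frame: 100 members numbered 1 to 100
members = list(range(1, 101))
print(simple_random_sample(members, 12, seed=1))   # 12 distinct member numbers
print(systematic_sample(members, 20, seed=1))      # 20 numbers, each 5 apart
```

In the systematic version only the starting position is random; every later choice is fixed by the interval, which is why the order of the list matters for this method.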
• In stratified sampling, the population is divided into mutually exclusive strata (males and females, for example) and a random sample is taken from each.

The proportion of each stratum sampled should be the same. A simple formula can be used to calculate the number of people we should sample from each stratum:

The number sampled in a stratum = (number in stratum ÷ number in population) × overall sample size

Example 3
A factory manager wants to find out what his workers think about the factory canteen facilities. The manager decides to give a questionnaire to a sample of 80 workers. It is thought that different age groups will have different opinions.
There are 75 workers between ages 18 and 32.
There are 140 workers between 33 and 47.
There are 85 workers between 48 and 62.
a Write down the name of the method of sampling the manager should use.
b Explain how he could use this method to select a sample of workers' opinions.

a Stratified sampling
b There are: 75 + 140 + 85 = 300 workers altogether.
  18-32: (75 ÷ 300) × 80 = 20 workers.
  33-47: (140 ÷ 300) × 80 = 37 1/3 ≈ 37 workers.
  48-62: (85 ÷ 300) × 80 = 22 2/3 ≈ 23 workers.
  Number the workers in each age group. Use a random number table (or generator) to produce the required quantity of random numbers. Give the questionnaire to the workers corresponding to these numbers.

Each method of random sampling has advantages and disadvantages.
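As a check on the arithmetic in the factory example, the stratum formula can be written as a short Python function. This is a minimal sketch: the name `stratified_allocation` and the dictionary layout are assumptions made for illustration, and each result is simply rounded to the nearest whole worker.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, rounded to the nearest whole unit."""
    population = sum(strata_sizes.values())
    return {
        name: round(size / population * sample_size)
        for name, size in strata_sizes.items()
    }


# Age groups and sizes taken from the factory example
workers = {"18-32": 75, "33-47": 140, "48-62": 85}
allocation = stratified_allocation(workers, 80)
print(allocation)                 # {'18-32': 20, '33-47': 37, '48-62': 23}
print(sum(allocation.values()))   # 80
```

Here the rounded values happen to add up to the required 80. In general, rounding each stratum separately can leave the total one or two away from the target, so the total is worth checking, and Python's `round` sends exact halves to the nearest even number.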
Simple random sampling
  Advantages:
  • Free of bias
  • Easy and cheap to implement for small populations and small samples
  • Each sampling unit has a known and equal chance of selection
  Disadvantages:
  • Not suitable when the population size or the sample size is large
  • A sampling frame is needed

Systematic sampling
  Advantages:
  • Simple and quick to use
  • Suitable for large samples and large populations
  Disadvantages:
  • A sampling frame is needed
  • It can introduce bias if the sampling frame is not random

Stratified sampling
  Advantages:
  • Sample accurately reflects the population structure
  • Guarantees proportional representation of groups within a population
  Disadvantages:
  • Population must be clearly classified into distinct strata
  • Selection within each stratum suffers from the same disadvantages as simple random sampling

Exercise 1B

Note: When describing advantages or disadvantages of a particular sampling method, always refer to the context of the question.

1 a The head teacher of an infant school wishes to take a stratified sample of 20% of the pupils at the school. The school has the following numbers of pupils.

  Year 1 | Year 2 | Year 3
  40 | 60 | 80

  Work out how many pupils in each age group will be in the sample.
  b Describe one benefit to the head teacher of using a stratified sample.

2 A survey is carried out on 100 members of the adult population of a city suburb. The population of the suburb is 2000. An alphabetical list of the inhabitants of the suburb is available.
  a Explain one limitation of using a systematic sample in this situation.
  b Describe a sampling method that would be free of bias for this survey.

3 A gym wants to take a sample of its members. Each member has a 5-digit membership number, and the gym selects every member with a membership number ending 000.
  a Is this a systematic sample? Give a reason for your answer.
  b Suggest one way of improving the reliability of this sample.
4 A head of sixth form wants to get the opinion of year 12 and year 13 students about the facilities available in the common room. The table shows the numbers of students in each year.

         | Year 12 | Year 13
  Male   | [illegible] | [illegible]
  Female | 85 | 78

  a Suggest a suitable sampling method that might be used to take a sample of 40 students.
  b How many students from each gender in each of the two years should the head of sixth form ask?

5 A factory manager wants to get information about the ways their workers travel to work. There are 480 workers in the factory, and each has a clocking-in number. The numbers go from 1 to 480. Explain how the manager could take a systematic sample of size 30 from these workers.

6 The director of a sports club wants to take a sample of members. The members each have a unique membership number. There are 121 members who play cricket, 145 members who play hockey and 104 members who play squash. No members play more than one sport.
  a Explain how the director could take a simple random sample of 30 members and state one disadvantage of this sampling method.
  The director decides to take a stratified sample of 30 members.
  b State one advantage of this method of sampling.
  c Work out the number of members who play each sport that the director should select for the sample.

1.3 Non-random sampling

There are two types of non-random sampling that you need to know:
• Quota sampling
• Opportunity sampling

• In quota sampling, an interviewer or researcher selects a sample that reflects the characteristics of the whole population.

The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic.

As an interviewer, you would meet people, assess their group and then, after interview, allocate them into the appropriate quota.
This continues until all quotas have been filled. If a person refuses to be interviewed or the quota into which they fit is full, then you simply ignore them and move on to the next person.

• Opportunity sampling consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for.

Note: Opportunity sampling is sometimes called convenience sampling.

This could be the first 20 people you meet outside a supermarket on a Monday morning who are carrying shopping bags, for example.

There are advantages and disadvantages of each type of sampling.

Quota sampling
  Advantages:
  • Allows a small sample to still be representative of the population
  • No sampling frame required
  • Quick, easy and inexpensive
  • Allows for easy comparison between different groups within a population
  Disadvantages:
  • Non-random sampling can introduce bias
  • Population must be divided into groups, which can be costly or inaccurate
  • Increasing scope of study increases number of groups, which adds time and expense
  • Non-responses are not recorded as such

Opportunity sampling
  Advantages:
  • Easy to carry out
  • Inexpensive
  Disadvantages:
  • Unlikely to provide a representative sample
  • Highly dependent on individual researcher

Exercise 1C

1 Interviewers in a shopping centre collect information on the spending habits from a total of 40 shoppers.
  a Explain how they could collect the information using:
    i quota sampling
    ii opportunity sampling
  b Which method is likely to lead to a more representative sample?

2 Describe the similarities and differences between quota sampling and stratified random sampling.

3 An interviewer asks the first 50 people he sees outside a fish and chip shop on a Friday evening about their eating habits.
  a What type of sampling method did he use?
  b Explain why the sampling method may not be representative.
  c Suggest two improvements he could make to his data collection technique.
4 A researcher is collecting data on the radio-listening habits of people in a local town. She asks the first 5 people she sees on Monday morning entering a supermarket. The number of hours per week each person listens is given below:
  4   7   6   8   2
  a Use the sample data to work out a prediction for the average number of hours listened per week for the town as a whole.
  b Describe the sampling method used and comment on the reliability of the data.
  c Suggest two improvements to the method used.

5 In a research study on the masses of wild deer in a particular habitat, scientists catch the first 5 male deer they find and the first 5 female deer they find.
  a What type of sampling method are they using?
  b Give one advantage of this method.
  The masses of the sampled deer are listed below.

  Male (kg)   | 75 | 80 | 90 | 85 | 82
  Female (kg) | 67 | 72 | 75 | 68 | 65

  c Use the sample data to compare the masses of male and female wild deer.
  d Suggest two improvements the scientists could make to the sampling method.

6 The heights, in metres, of 20 ostriches are listed below:
  1.8, 1.9, 2.3, 1.7, 2.1, 2.0, 2.5, 2.7, 2.5, 2.6, 2.3, 2.2, 2.4, 2.3, 2.2, 2.5, 1.9, 2.0, 2.2, 2.5
  a Take an opportunity sample of size five from the data.
  b Starting from the second data value, take a systematic sample of size five from the data.
  c Calculate the mean height for each sample.
  d State, with reasons, which sampling method is likely to be more reliable.
  Note: An example of an opportunity sample from this data would be to select the first five heights from the list.

1.4 Types of data

• Variables or data associated with numerical observations are called quantitative variables or quantitative data.
For example, you can give a number to shoe size so shoe size is a quantitative variable.
• Variables or data associated with non-numerical observations are called qualitative variables or qualitative data.
For example, you can't give a number to hair colour (blonde, red, brunette).
Hair colour is a qualitative variable.
• A variable that can take any value in a given range is a continuous variable.
For example, time can take any value, e.g. 2 seconds, 2.1 seconds, 2.01 seconds etc.
• A variable that can take only specific values in a given range is a discrete variable.
For example, the number of girls in a family is a discrete variable as you can't have 2.65 girls in a family.

Large amounts of data can be displayed in a frequency table or as grouped data.
• When data is presented in a grouped frequency table, the specific data values are not shown. The groups are more commonly known as classes.
• Class boundaries tell you the maximum and minimum values that belong in each class.
• The midpoint is the average of the class boundaries.
• The class width is the difference between the upper and lower class boundaries.

Example 4
The lengths, x mm, to the nearest mm, of the forewings of a random sample of male adult butterflies are measured and shown in the table.

  Length of forewing (mm) | Number of butterflies, f
  30-31 | 2
  32-33 | 25
  34-36 | 30
  37-39 | 13

a State whether length is
  i quantitative or qualitative
  ii discrete or continuous.
b Write down the class boundaries, midpoint and class width for the class 34-36.

a i Quantitative
  ii Continuous
b Class boundaries 33.5 mm, 36.5 mm
  Midpoint = ½(33.5 + 36.5) = 35 mm
  Class width = 36.5 - 33.5 = 3 mm

Note: Be careful when finding class boundaries for continuous data. The data values have been rounded to the nearest mm, so the upper class boundary for the 30-31 mm class is 31.5 mm.

Exercise 1D

1 State whether each of the following variables is qualitative or quantitative.
  a Height of a tree
  b Colour of car
  c Time waiting in a queue
  d Shoe size
  e Names of pupils in a class

2 State whether each of the following quantitative variables is continuous or discrete.
  a Shoe size
  b Length of leaf
  c Number of people on a bus
  d Weight of sugar
  e Time required to run 100 m
  f Lifetime in hours of torch batteries

3 Explain why:
  a 'Type of tree' is a qualitative variable
  b 'The number of pupils in a class' is a discrete quantitative variable
  c 'The weight of a collie dog' is a continuous quantitative variable.

4 The distribution of the masses of two-month-old lambs is shown in the grouped frequency table.

  Mass, m (kg) | Frequency
  12 ≤ m < 13 | 8
  13 ≤ m < 14 | 28
  14 ≤ m < 15 | 32
  15 ≤ m < 16 | 22

  Note: The class boundaries are given using inequalities, so the values given in the table are the actual class boundaries.
  a Write down the class boundaries for the third group.
  b Work out the midpoint of the second group.
  c Work out the class width of the first group.

1.5 The large data set

You will need to answer questions based on real data in your exam. Some of these questions will be based on weather data from the large data set provided by Edexcel.

The data set consists of weather data samples provided by the Met Office for five UK weather stations and three overseas weather stations over two set periods of time: May to October 1987 and May to October 2015. The weather stations are labelled on the maps below.

[Maps showing the weather stations. Legible labels: Leuchars, Leeming, Heathrow, Hurn, Jacksonville]

The large data set contains data for a number of different variables at each weather station:
• Daily mean temperature in °C - this is the average of the hourly temperature readings during a 24-hour period.
• Daily total rainfall including solid precipitation such as snow and hail, which is melted before being included in any measurements - amounts less than 0.05 mm are recorded as 'tr' or 'trace'.
• Daily total sunshine recorded to the nearest tenth of an hour.
• Daily mean wind direction and windspeed in knots, averaged over 24 hours from midnight to midnight. Mean wind directions are given as bearings and as cardinal (compass) directions.
The data for mean windspeed is also categorised according to the Beaufort scale.

Beaufort scale | Descriptive term | Average speed at 10 metres above ground
0 | Calm | Less than 1 knot
1-3 | Light | 1 to 10 knots
4 | Moderate | 11 to 16 knots
5 | Fresh | 17 to 21 knots

Note: A knot (kn) is a 'nautical mile per hour'. 1 kn = 1.15 mph.

• Daily maximum gust in knots. This is the highest instantaneous windspeed recorded. The direction from which the maximum gust was blowing is also recorded.
• Daily maximum relative humidity, given as a percentage of air saturation with water vapour. Relative humidities above 95% give rise to misty and foggy conditions.
• Daily mean cloud cover measured in 'oktas' or eighths of the sky covered by cloud.
• Daily mean visibility measured in decametres (Dm). This is the greatest horizontal distance at which an object can be seen in daylight.
• Daily mean pressure measured in hectopascals (hPa).

Note: For the overseas locations, the only data recorded are:
• Daily mean temperature
• Daily total rainfall
• Daily mean pressure
• Daily mean windspeed

Any missing data values are indicated in the large data set as n/a or 'not available'.

Data from Hurn for the first days of June 1987 is shown in the table below.

You are expected to be able to take a sample from the large data set, identify different types of data and calculate statistics from the data.

• If you need to do calculations on the large data set in your exam, the relevant extract from the data set will be provided.

Look at the extract from the large data set.

a Describe the type of data represented by daily total rainfall.
HURN © Crown Copyright Met Office 1987

Date | Daily mean temperature (°C) | Daily total rainfall (mm) | Daily total sunshine (hrs) | Daily mean windspeed (kn) | Daily mean windspeed (Beaufort) | Daily maximum gust (kn)
01/6/1987 | 15.1 | 0.6 | 4.5 | 7 | Light | 19
02/6/1987 | 12.5 | 4.7 | 0 | 7 | Light | 29
03/6/1987 | 13.8 | [illegible] | 5.6 | [illegible] | Moderate | 25
04/6/1987 | 15.5 | 5.3 | 7.8 | 7 | Light | 17
05/6/1987 | 13.1 | 19.0 | 0.5 | 10 | Light | 33
06/6/1987 | 13.8 | 0 | 8.9 | 19 | Fresh | 46
07/6/1987 | 13.2 | tr | 3.8 | [illegible] | Moderate | 27
08/6/1987 | 12.9 | 1 | 1.7 | 9 | Light | 19
09/6/1987 | 11.2 | tr | 5.4 | 6 | Light | 19
10/6/1987 | [illegible] | [illegible] | 0.7 | 4 | Light | n/a
11/6/1987 | 12.6 | 0 | 12.5 | 6 | Light | 18
12/6/1987 | 10.4 | 0 | 11.9 | 5 | Light | n/a
13/6/1987 | 9.6 | 0 | 8.6 | 5 | Light | 15
14/6/1987 | 10.2 | 0 | 13.1 | 5 | Light | 18
15/6/1987 | [illegible] | 3.7 | 7.1 | 4 | Light | 25
16/6/1987 | 10.4 | 5.6 | 8.3 | 6 | Light | 25
17/6/1987 | 12.8 | 0.1 | [illegible] | 10 | Light | 27
18/6/1987 | 13.0 | 7.4 | 3.2 | 9 | Light | [illegible]
19/6/1987 | 14.0 | tr | 0.4 | 12 | Moderate | 33
20/6/1987 | 12.6 | 0 | [illegible] | 6 | Light | [illegible]

Alison is investigating daily maximum gust. She wants to select a sample of size 5 from the first 20 days in Hurn in June 1987. She uses the first two digits of the date as a sampling frame and generates five random numbers between 1 and 20.

b State the type of sample selected by Alison.
c Explain why Alison's process might not generate a sample of size 5.

a Continuous quantitative data.
b Simple random sample
c Some of the data values are not available (n/a).

Note: Although you won't need to recall specific data values from the large data set in your exam, you will need to know the limitations of the data set and the approximate range of values for each variable.

Using the extract from the large data set above, calculate:
a the mean daily mean temperature for the first five days of June in Hurn in 1987
b the median daily total rainfall for the week of 14th June to 20th June inclusive.

The median daily total rainfall for the same week in Perth was 19.0 mm. Karl states that more southerly countries experience higher rainfall during June.

c State with a reason whether your answer to part b supports this statement.
a 15.1 + 12.5 + 13.8 + 15.5 + 13.1 = 70.0
  70.0 ÷ 5 = 14.0 °C (1 d.p.)
b The values are: 0, 3.7, 5.6, 0.1, 7.4, tr, 0
  In ascending order: 0, 0, tr, 0.1, 3.7, 5.6, 7.4
  The median is the middle value so 0.1 mm.
c Perth is in Australia, which is south of the UK, and its median rainfall for the week (19.0 mm) is higher than the 0.1 mm found for Hurn. However, this is a very small sample from a single location in each country so does not provide enough evidence to support Karl's statement.

Don't just look at the numerical values. You also need to consider whether the sample is large enough, and whether there are other geographical factors which could affect rainfall in these two locations.

1 From the eight weather stations featured in the large data set, write down:
  a the station which is furthest north
  b the station which is furthest south
  c an inland station
  d a coastal station
  e an overseas station.

2 Explain, with reasons, whether daily maximum relative humidity is a discrete or continuous variable.

Questions 3 and 4 in this exercise use the following extracts from the large data set.

LEEMING © Crown Copyright Met Office 2015

Date | Daily mean temperature (°C) | Daily total rainfall (mm) | Daily total sunshine (hrs) | Daily mean windspeed (kn)
01/06/2015 | 8.9 | [illegible] | [illegible] | 15
02/06/2015 | 10.7 | tr | 8.9 | 7
03/06/2015 | 12.0 | 0 | 10.0 | 8
04/06/2015 | 11.7 | 0 | 12.8 | 7
05/06/2015 | 15.0 | 0 | 8.9 | 9
06/06/2015 | 11.6 | tr | 5.4 | 7
07/06/2015 | 12.6 | 0 | 13.9 | 10
08/06/2015 | 9.4 | 0 | 9.7 | 7
09/06/2015 | 9.7 | 0 | [illegible] | 5
10/06/2015 | [illegible] | 0 | 14.6 | 4

HEATHROW © Crown Copyright Met Office 2015

Date | Daily mean temperature (°C) | Daily total rainfall (mm) | Daily total sunshine (hrs) | Daily mean windspeed (kn)
01/06/2015 | 12.1 | 0.6 | 4.1 | [illegible]
02/06/2015 | 15.4 | tr | 1.6 | 18
03/06/2015 | 15.8 | 0 | [illegible] | 9
04/06/2015 | 16.1 | 0.8 | 14.4 | 6
05/06/2015 | 19.6 | tr | 5.3 | 9
06/06/2015 | 14.5 | 0 | 12.3 | 12
07/06/2015 | 14.0 | [illegible] | 13.1 | 5
08/06/2015 | 14.0 | tr | 6.4 | 7
09/06/2015 | [illegible] | 0 | [illegible] | 10
10/06/2015 | 14.3 | 0 | 7.2 | 10

3 a Work out the mean of the daily total sunshine for the first 10 days of June 2015 in:
    i Leeming
    ii Heathrow.
  b Work out the range of the daily total sunshine for the first 10 days of June 2015 in:
    i Leeming
    ii Heathrow.
  c Supraj says that the further north you are, the fewer the number of hours of sunshine. State, with reasons, whether your answers to parts a and b support this conclusion.

Note: State in your answer whether Leeming is north or south of Heathrow.

4 Calculate the mean daily total rainfall in Heathrow for the first 10 days of June 2015. Explain clearly how you dealt with the data for 2/6/2015, 5/6/2015 and 8/6/2015.

5 Dominic is interested in seeing how the average monthly temperature changed over the summer months of 2015 in Jacksonville. He decides to take a sample of two days every month and average the temperatures before comparing them.
  a Give one reason why taking two days a month might be:
    i a good sample size
    ii a poor sample size.
  b He chooses the first day of each month and the last day of each month. Give a reason why this method of choosing days might not be representative.
  c Suggest a better way that he can choose his sample of days.

6 The table shows the mean daily temperatures at each of the eight weather stations for August 2015:

Station | Camborne | Heathrow | Hurn | Leeming | Leuchars | Beijing | Jacksonville | Perth
Mean daily mean temp (°C) | 15.4 | 18.1 | 16.2 | 15.6 | 14.7 | 26.6 | 26.4 | 13.6

© Crown Copyright Met Office

  a Give a geographical reason why the temperature in August might be lower in Perth than in Jacksonville.
  b Comment on whether this data supports the conclusion that coastal locations experience lower average temperatures than inland locations.

7 Brian calculates the mean cloud coverage in Leeming in September 1987. He obtains the answer 9.3 oktas. Explain how you know that Brian's answer is incorrect.

8 The large data set provides data for 184 consecutive days in 1987. Marie is investigating daily mean windspeeds in Camborne in 1987.
  a Describe how Marie could take a systematic sample of 30 days from the data for Camborne in 1987. (3 marks)
  b Explain why Marie's sample would not necessarily give her 30 data points for her investigation. (1 mark)

You will need access to the large data set and spreadsheet software to answer these questions.

1 a Find the mean daily mean pressure in Beijing in October 1987.
  b Find the median daily rainfall in Jacksonville in July 2015.
  c i Draw a grouped frequency table for the daily mean temperature in Heathrow in July and August 2015. Use intervals 10 ≤ t < 15, etc.
    ii Draw a histogram to display this data.
    iii Draw a frequency polygon for this data.

Note: You can use a spreadsheet to work out the frequency for each class.

2 a i Take a simple random sample of size 10 from the data for daily mean windspeed in Leeming in 1987.
    ii Work out the mean of the daily mean windspeeds using your sample.
  b i Take a sample of the last 10 values from the data for daily mean windspeed in Leuchars in 1987.
    ii Work out the mean of the daily mean windspeeds using your sample.
  c State, with reasons, which of your samples is likely to be more representative.
  d Suggest two improvements to the sampling methods suggested in part a.
  e Use an appropriate sampling method and sample size to estimate the mean windspeeds in Leeming and Leuchars in 1987. State with a reason whether your calculations support the statement 'Coastal locations are likely to have higher average windspeeds than inland locations.'

Mixed exercise 1

1 The table shows the daily mean temperature recorded on the first 15 days in May 1987 at Heathrow.

Day of month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
Daily mean temp (°C) | 4.6 | 8.8 | 7.2 | 7.3 | 10.1 | 11.9 | 12.2 | 12.1 | 15.2 | 11.1 | 10.6 | 12.7 | 8.9 | 10.0 | 9.5

© Crown Copyright Met Office

  a Use an opportunity sample of the first 5 dates in the table to estimate the mean daily mean temperature at Heathrow for the first 15 days of May 1987.
  b Describe how you could use the random number function
on your calculator to select a simple random sample of 5 dates from this data.

Note: Make sure you describe your sampling frame.

  c Use a simple random sample of 5 dates to estimate the mean daily mean temperature at Heathrow for the first 15 days of May 1987.
  d Use all 15 dates to calculate the mean daily mean temperature at Heathrow for the first 15 days of May 1987. Comment on the reliability of your two samples.

2 a Give one advantage and one disadvantage of using:
    i a census
    ii a sample survey.
  b It is decided to take a sample of 100 from a population consisting of 500 elements. Explain how you would obtain a simple random sample from this population.

3 a Explain briefly what is meant by:
    i a population
    ii a sampling frame.
  b A market research organisation wants to take a sample of:
    i owners of diesel motor cars in the UK
    ii persons living in Oxford who suffered injuries to the back during July 1996.
  Suggest a suitable sampling frame in each case.

4 Write down one advantage and one disadvantage of using:
  a stratified sampling
  b simple random sampling.

5 The managing director of a factory wants to know what the workers think about the factory canteen facilities. 100 people work in the offices and 200 work on the shop floor. The factory manager decides to ask the people who work in the offices.
  a Suggest a reason why this is likely to produce a biased sample.
  b Explain briefly how the factory manager could select a sample of 30 workers using:
    i systematic sampling
    ii stratified sampling
    iii quota sampling.

6 There are 64 girls and 56 boys in a school. Explain briefly how you could take a random sample of 15 pupils using:
  a simple random sampling
  b stratified sampling.

7 As part of her statistics project, Deepa decided to estimate the amount of time A-level students at her school spent on private study each week. She took a random sample of students from those studying arts subjects, science subjects and a mixture of arts and science subjects.
Each student kept a record of the time they spent on private study during the third week of term.
  a Write down the name of the sampling method used by Deepa.
  b Give a reason for using this method and give one advantage this method has over simple random sampling.

8 A conservationist is collecting data on African springboks. She catches the first five springboks she finds and records their masses.
  a State the sampling method used.
  b Give one advantage of this type of sampling method.
  The data is given below:
  [illegible] kg   [illegible] kg   82 kg   [illegible] kg   78 kg
  c State, with a reason, whether this data is discrete or continuous.
  d Calculate the mean mass.
  A second conservationist collects data by selecting one springbok in each of five locations. The data collected is given below:
  79 kg   86 kg   [illegible] kg   68 kg   75 kg
  e Calculate the mean mass for this sample.
  f State, with a reason, which mean mass is likely to be a more reliable estimate of the mean mass of African springboks.
  g Give one improvement the second conservationist could make to the sampling method.

9 Data on the daily total rainfall in Beijing during 2015 is gathered from the large data set. The daily total rainfall (in mm) on the first of each month is listed below:

  May 1st | 9.0
  June 1st | 0.0
  July 1st | 1.0
  August 1st | 32.0
  September 1st | 4.1
  October 1st | 3.0

  a State, with a reason, whether or not this sample is random. (1 mark)
  b Suggest two alternative sampling methods and give one advantage and one disadvantage of each in this context. (2 marks)
  c State, with a reason, whether the data is discrete or continuous. (1 mark)
  d Calculate the mean of the six data values given above. (1 mark)
  e Comment on the reliability of this value as an estimate for the mean daily total rainfall in Beijing during 2015. (1 mark)

You will need access to the large data set and spreadsheet software to answer these questions.
  a Take a systematic sample of size 18 for the daily maximum relative humidity in Camborne during 1987.
  b Give one advantage of using a systematic sample in this context.
  c Use your sample to find an estimate for the mean daily maximum relative humidity in Camborne during 1987.
  d Comment on the reliability of this estimate. Suggest one way in which the reliability can be improved.

Summary of key points

1 • In statistics, a population is the whole set of items that are of interest.
  • A census observes or measures every member of a population.

2 • A sample is a selection of observations taken from a subset of the population which is used to find out information about the population as a whole.
  • Individual units of a population are known as sampling units.
  • Often sampling units of a population are individually named or numbered to form a list called a sampling frame.

3 • A simple random sample of size n is one where every sample of size n has an equal chance of being selected.
  • In systematic sampling, the required elements are chosen at regular intervals from an ordered list.
  • In stratified sampling, the population is divided into mutually exclusive strata (males and females, for example) and a random sample is taken from each.
  • In quota sampling, an interviewer or researcher selects a sample that reflects the characteristics of the whole population.
  • Opportunity sampling consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for.

4 • Variables or data associated with numerical observations are called quantitative variables or quantitative data.
  • Variables or data associated with non-numerical observations are called qualitative variables or qualitative data.

5 • A variable that can take any value in a given range is a continuous variable.
  • A variable that can take only specific values in a given range is a discrete variable.

6 • When data is presented in a grouped frequency table, the specific data values are not shown. The groups are more commonly known as classes.
  • Class boundaries tell you the maximum and minimum values that belong in each class.
  • The midpoint is the average of the class boundaries.
  • The class width is the difference between the upper and lower class boundaries.

7 If you need to do calculations on the large data set in your exam, the relevant extract from the data set will be provided.

2 Measures of location and spread

After completing this chapter you should be able to:
• Calculate measures of central tendency such as the mean, median and mode
• Calculate measures of location such as percentiles and deciles
• Calculate measures of spread such as range, interquartile range and interpercentile range
• Calculate variance and standard deviation
• Understand and use coding

Prior knowledge check

1 State whether each of these variables is qualitative or quantitative:
  a Colour of car
  b Miles travelled by a cyclist
  c Favourite type of pet
  d Number of brothers and sisters. (See Section 1.4)

2 State whether each of these variables is discrete or continuous:
  a Number of pets owned
  b Distance walked by ramblers
  c Fuel consumption of lorries
  d Number of peas in a pod
  e Times taken by a group of athletes to run 1500 m. (See Section 1.4)

3 Find the mean, median, mode and range of the data shown in this frequency table.

  Number of peas in a pod | 3 | 4 | 5 | 6 | 7
  Frequency | 4 | 7 | 11 | 18 | 6

  (See GCSE Mathematics)

Wildlife biologists use statistics such as mean wingspan and standard deviation to compare populations of endangered birds in different habitats. (See Mixed exercise Q12)

2.1 Measures of central tendency

A measure of location is a single value which describes a position in a data set. If the single value describes the centre of the data, it is called a measure of central tendency. You should already know how to work out the mean, median and mode of a set of ungrouped data and from ungrouped frequency tables.
• The mode or modal class is the value or class that occurs most often.
• The median is the middle value when the data values are put in order.
• The mean can be calculated using the formula $\bar{x} = \frac{\sum x}{n}$.

Notation:
• $\bar{x}$ represents the mean of the data. You say 'x bar'.
• $\sum x$ represents the sum of the data values.
• $n$ is the number of data values.

The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations is 7.2. Calculate the mean of all 55 observations.

For the first set of observations:
$\bar{x} = \frac{\sum x}{n}$ so $6.4 = \frac{\sum x}{25}$
$\sum x = 6.4 \times 25 = 160$
(Sum of data values = mean × number of data values.)

For the second set of observations:
$\bar{y} = \frac{\sum y}{m}$ so $7.2 = \frac{\sum y}{30}$
$\sum y = 7.2 \times 30 = 216$
(You can use x and y to represent two different data sets. You need to use different letters for the number of observations in each data set.)

Mean = $\frac{160 + 216}{25 + 30}$ = 6.84 (2 d.p.)

You need to decide on the best measure to use in particular situations.

• Mode: This is used when data is qualitative, or quantitative with either a single mode or two modes (bimodal). It is not very informative if each value occurs only once.
• Median: This is used for quantitative data. It is usually used when there are extreme values, as they do not affect it.
• Mean: This is used for quantitative data and uses all the pieces of data. It therefore gives a true measure of the data. However, it is affected by extreme values.

You can calculate the mean, median and mode for discrete data presented in a frequency table.

• For data given in a frequency table, the mean can be calculated using the formula $\bar{x} = \frac{\sum xf}{\sum f}$.

Notation:
• $\sum xf$ is the sum of the products of the data values and their frequencies.
• $\sum f$ is the sum of the frequencies.

Rebecca records the shirt collar size, x, of the male students in her year. The results are shown in the table.

Shirt collar size | 15 | 15.5 | 16 | 16.5 | 17
Number of students | 3 | 17 | 29 | 34 | 12

Find for this data:
a the mode
b the median
c the mean.
d
Explain why a shirt manufacturer might use the mode when planning production numbers.

a Mode = 16.5
  (16.5 is the collar size with the highest frequency.)
b There are 95 observations so the median is the $\frac{95 + 1}{2}$ = 48th.
  There are 20 observations up to 15.5 and 49 observations up to 16. The 48th observation is therefore 16.
  Median = 16
c $\bar{x} = \frac{15 \times 3 + 15.5 \times 17 + 16 \times 29 + 16.5 \times 34 + 17 \times 12}{95} = \frac{45 + 263.5 + 464 + 561 + 204}{95} = \frac{1537.5}{95} = 16.2$
  (You can input a frequency table into your calculator, and calculate the mean and median without having to enter the whole calculation.)
d The mode is an actual data value and gives the manufacturer information on the most common size worn/purchased.
  (The mean is not one of the data values and the median is not necessarily indicative of the most popular collar size.)

1 Meryl collected wild mushrooms every day for a week. When she got home each day she weighed them to the nearest 100 g. The weights are shown below:
  500   700   400   300   900   700   700
  a Write down the mode for this data.
  b Calculate the mean for this data.
  c Find the median for this data.
  On the next day, Meryl collects 650 g of wild mushrooms.
  d Write down the effect this will have on the mean, the mode and the median.

Note: Try to answer part d without recalculating the averages. You could recalculate to check your answer.

2 Joe collects six pieces of data, x1, x2, x3, x4, x5 and x6. He works out that Σx is 256.2.
  a Calculate the mean for this data.
  He collects another piece of data. It is 52.
  b Write down the effect this piece of data will have on the mean.

3 From the large data set, the daily mean visibility, v metres, for Leeming in May and June 2015 was recorded each day. The data is summarised as follows:
  May: n = 31, Σv = 724000
  June: n = 30, Σv = 632000

Note: You don't need to refer to the actual large data set. All the data you need is given with the question.

  a Calculate the mean visibility in each month.
  b Calculate the mean visibility for the total recording period.

4 A small workshop records how long it takes, in minutes, for each of their workers to make a certain item. The times are shown in the table.

  Worker | A | B | C | D | E | F | G | H | I | J
  Time in minutes | 7 | 12 | 10 | 8 | 6 | 8 | 5 | [illegible] | 11 | 9

  a Write down the mode for this data.
  b Calculate the mean for this data.
  c Find the median for this data.
  d The manager wants to give the workers an idea of the average time they took. Write down, with a reason, which of the answers to a, b and c she should use.

5 The frequency table shows the number of breakdowns, b, per month recorded by a road haulage firm over a certain period of time.

  Breakdowns | 0 | 1 | 2 | 3 | 4 | 5
  Frequency | [illegible] | 11 | [illegible] | 3 | 1 | [illegible]

  a Write down the modal number of breakdowns.
  b Find the median number of breakdowns.
  c Calculate the mean number of breakdowns.
  d In a brochure about how many loads reach their destination on time, the firm quotes one of the answers to a, b or c as the number of breakdowns per month for its vehicles. Write down which of the three answers the firm should quote in the brochure.

6 The table shows the frequency distribution for the number of petals in the flowers of a group of celandines.

  Number of petals | 5 | 6 | 7 | 8 | 9
  Frequency | [frequencies illegible]

  Calculate the mean number of petals.

7 A naturalist is investigating how many eggs the endangered kakapo bird lays in each brood cycle. The results are given in this frequency table.

  Number of eggs | 1 | 2 | 3
  Frequency | 7 | p | 2

  If the mean number of eggs is 1.5, find the value of p.

Problem-solving: Use the formula for the mean of an ungrouped frequency table to write an equation involving p.

You can calculate the mean, the class containing the median and the modal class for continuous data presented in a grouped frequency table by finding the midpoint of each class interval.
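The midpoint method can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: it reuses the forewing length table from Section 1.4, and the function names (estimate_mean, modal_class, median_class) are made up for this example. Locating the median class at half the total frequency is a simplifying assumption.

```python
# Each class is (lower class boundary, upper class boundary, frequency).
# Lengths were rounded to the nearest mm, so the 30-31 class runs from 29.5 to 31.5.
forewing_classes = [
    (29.5, 31.5, 2),
    (31.5, 33.5, 25),
    (33.5, 36.5, 30),
    (36.5, 39.5, 13),
]


def estimate_mean(table):
    """Estimate the mean as (sum of midpoint x frequency) / (sum of frequencies)."""
    total_f = sum(f for _, _, f in table)
    total_xf = sum((lower + upper) / 2 * f for lower, upper, f in table)
    return total_xf / total_f


def modal_class(table):
    """Return the boundaries of the class with the highest frequency."""
    lower, upper, _ = max(table, key=lambda row: row[2])
    return (lower, upper)


def median_class(table):
    """Return the boundaries of the class in which the cumulative frequency
    first reaches half the total frequency (an assumption for this sketch)."""
    half = sum(f for _, _, f in table) / 2
    cumulative = 0
    for lower, upper, f in table:
        cumulative += f
        if cumulative >= half:
            return (lower, upper)


print(round(estimate_mean(forewing_classes), 2))  # 34.54
print(modal_class(forewing_classes))              # (33.5, 36.5)
print(median_class(forewing_classes))             # (33.5, 36.5)
```

The result is an estimate because the individual data values inside each class are not known; the midpoint stands in for all of them.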
The length x mm, to the nearest mm, of a random sample of pine cones is measured. The data is shown in the table.

Length of pine cone (mm) | 30-31 | 32-33 | 34-36 | 37-39
Frequency | 2 | 25 | 30 | 13

a Write down the modal class.
b Estimate the mean.
c Find the median class.

a Modal class = 34-36
  (The modal class is the class with the highest frequency.)
b Mean = $\frac{30.5 \times 2 + 32.5 \times 25 + 35 \times 30 + 38 \times 13}{70}$ = 34.54
  (Use $\bar{x} = \frac{\sum xf}{\sum f}$, taking the midpoint of each class interval as the value of x. The answer is an estimate because you don't know the exact data values. See Section 1.4.)
c There are 70 observations so the median is the 35.5th. The 35.5th observation will lie in the class 34-36.

Exercise

1 The weekly wages (to the nearest £) of the production line workers in a small factory is shown in the table.

  Weekly wage (£) | Frequency
  175-225 | 4
  226-300 | 8
  301-350 | 18
  351-400 | 28
  401-500 | 7

  a Write down the modal class.
  b Calculate an estimate of the mean wage.
  c Write down the interval containing the median.

2 The noise levels at 30 locations near an outdoor concert venue were measured to the nearest decibel. The data collected is shown in the grouped frequency table.

  Noise (decibels) | 65-69 | 70-74 | 75-79 | 80-84 | 85-89 | 90-94 | 95-99
  Frequency | 1 | 4 | 6 | 6 | 8 | 4 | 1

  a Calculate an estimate of the mean noise level. (1 mark)
  b Explain why your answer to part a is an estimate. (1 mark)

3 The table shows the daily mean temperature at Heathrow in October 1987 from the large data set.

  Temp (°C) | 6 ≤ t < 8 | 8 ≤ t < 10 | 10 ≤ t < 12 | 12 ≤ t < 14 | 14 ≤ t < 16 | 16 ≤ t < 18
  Frequency | 3 | 7 | [illegible] | [illegible] | 3 | 2

  © Crown Copyright Met Office

  a Write down the modal class.
  b Calculate an estimate for the mean daily mean temperature.

4 Two DIY shops (A and B) recorded the ages of their workers.

  Age of worker | 16-25 | 26-35 | 36-45 | 46-55 | 56-65 | 66-75
  Frequency A | 5 | 16 | 14 | 2 | 26 | 14
  Frequency B | 4 | [illegible] | 10 | [illegible] | 5 | [illegible]

Problem-solving: Since age is always rounded down, the class boundaries for the 16-25 group are 16 and 26.
This means that the midpoint of the class is 21.

  By comparing estimated means for each shop, determine which shop is better at employing older workers.

2.2 Other measures of location

The median describes the middle of the data set. It splits the data set into two equal (50%) halves. You can calculate other measures of location such as quartiles and percentiles.

[Diagram: a line from the lowest value to the highest value. The lower quartile is one-quarter of the way through the data set, the median is halfway, and the upper quartile is three-quarters of the way through the data set. Percentiles split the data set into 100 parts. The 10th percentile lies one-tenth of the way through the data. 85% of the data values are less than the 85th percentile, and 15% are greater.]

Use these rules to find the upper and lower quartiles for discrete data.

• To find the lower quartile for discrete data, divide n by 4. If this is a whole number, the lower quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point.
• To find the upper quartile for discrete data, find ¾ of n. If this is a whole number, the upper quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point.

From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in Hurn. The data is shown below:

14 15 17 17 18 18 19 19 22 22
23 23 23 24 25 26 27 28 36 39

© Crown Copyright Met Office

Notation: Q1 is the lower quartile, Q2 is the median and Q3 is the upper quartile.

Find the median and quartiles for this data.
Q2 = 10.5th value = 22.5 knots
  (Q2 is the median. It lies halfway between the 10th and 11th data values, which are 22 knots and 23 knots respectively.)
Q1 = 5.5th value = 18 knots
  (20 ÷ 4 = 5 so the lower quartile is halfway between the 5th and 6th data values.)
Q3 = 15.5th value = 25.5 knots
  (¾ × 20 = 15 so the upper quartile is halfway between the 15th and 16th data values.)

When data are presented in a grouped frequency table you can use a technique called interpolation to estimate the median, quartiles and percentiles. When you use interpolation, you are assuming that the data values are evenly distributed within each class.

Note: For grouped continuous data, or data presented in a cumulative frequency table:
  Q1 = $\frac{n}{4}$th data value
  Q2 = $\frac{n}{2}$th data value
  Q3 = $\frac{3n}{4}$th data value

The length of time (to the nearest minute) spent on the internet each evening by a group of students is shown in the table.

Length of time spent on internet (minutes) | 30-31 | 32-33 | 34-36 | 37-39
Frequency | 2 | 25 | 30 | 13

a Find an estimate for the upper quartile.
b Find an estimate for the 10th percentile.

a Upper quartile: $\frac{3 \times 70}{4}$ = 52.5th value
  Using interpolation on the class 34-36, which has class boundaries 33.5 and 36.5 and cumulative frequencies 27 and 57:
  $\frac{Q_3 - 33.5}{36.5 - 33.5} = \frac{52.5 - 27}{57 - 27}$
  $\frac{Q_3 - 33.5}{3} = \frac{25.5}{30}$
  $Q_3 = 36.05$

  (The endpoints on the line represent the class boundaries. The values on the bottom are the cumulative frequencies for the previous classes and this class.)

  Problem-solving: Use proportion to estimate Q3. The 52.5th value lies $\frac{52.5 - 27}{57 - 27}$ of the way into the class, so Q3 lies $\frac{Q_3 - 33.5}{36.5 - 33.5}$ of the way between the class boundaries. Equate these two fractions to form an equation and solve to find Q3.

b The 10th percentile is the 7th data value.
  $\frac{P_{10} - 31.5}{33.5 - 31.5} = \frac{7 - 2}{27 - 2}$
  $P_{10} = 31.9$

  Notation: You can write the 10th percentile as $P_{10}$.

Exercise

1 From the large data set, the daily mean pressure (hPa) during the last 16 days of July 2015 in Perth is recorded. The data is given below:
  1024 1022 1021 1013 1009 1018 1017 1024 1027 1029 1031
  1025 1017 1019 1017 1014
  a Find the median pressure for that period.
  b Find the lower and upper quartiles.

2 Rachel records the number of CDs in the collections of students in her year. The results are in the table below.

  Number of CDs | 35 | 36 | 37 | 38 | 39
  Frequency | 3 | 7 | 29 | 4 | 2

  Find Q1, Q2 and Q3.

Note: This is an ungrouped frequency table so you do not need to use interpolation. Use the rules for finding the median and quartiles of discrete data.

3 A hotel is worried about the reliability of its lift. It keeps a weekly record of the number of times it breaks down over a period of 26 weeks. The data collected is summarised in the table below.

  Number of breakdowns | Frequency
  0-1 | 18
  2-3 | 7
  4-5 | 1

  Use interpolation to estimate the median number of breakdowns. (2 marks)

4 The weights of 31 Jersey cows were recorded to the nearest kilogram. The weights are shown in the table.

  Weight of cattle (kg) | 300-349 | 350-399 | 400-449 | 450-499 | 500-549
  Frequency | 3 | 6 | 10 | 7 | 5

  a Find an estimate for the median weight.
  b Find the lower quartile, Q1.
  c Find the upper quartile, Q3.
  d Interpret the meaning of the value you have found for the upper quartile in part c.

5 A roadside assistance firm kept a record over a week of the amount of time, in minutes, people were kept waiting for assistance. The times are shown below.

  Time waiting, t (minutes) | 20 ≤ t < 30 | 30 ≤ t < 40 | 40 ≤ t < 50 | 50 ≤ t < 60 | 60 ≤ t < 70
  Frequency | 6 | 10 | 18 | 13 | 2

  a Find an estimate for the mean wait time. (1 mark)
  b Calculate the 65th percentile. (2 marks)
  The firm writes the following statement for an advertisement:
  'Only 10% of our customers have to wait longer than 56 minutes.'
  c By calculating a suitable percentile, comment on the validity of this claim. (3 marks)

6 The table shows the recorded wingspans, in metres, of 100 endangered Californian condors.
  Wingspan, w (m) | 1.0 ≤ w < 1.5 | 1.5 ≤ w < 2.0 | 2.0 ≤ w < 2.5 | 2.5 ≤ w < 3.0 | 3.0 ≤ w
  Frequency | 4 | 20 | 37 | 28 | 11

  a Estimate the 80th percentile and interpret the value. (3 marks)
  b State why it is not possible to estimate the 90th percentile. (1 mark)

2.3 Measures of spread

A measure of spread is a measure of how spread out the data is. Here are two simple measures of spread.

Notation: Measures of spread are sometimes called measures of dispersion or measures of variation.

• The range is the difference between the largest and smallest values in the data set.
• The interquartile range (IQR) is the difference between the upper quartile and the lower quartile, Q3 − Q1.

The range takes into account all of the data but can be affected by extreme values. The interquartile range is not affected by extreme values but only considers the spread of the middle 50% of the data.

• The interpercentile range is the difference between the values for two given percentiles.

The 10th to 90th interpercentile range is often used since it is not affected by extreme values but still considers 80% of the data in its calculation.

The table shows the masses, in tonnes, of 120 African bush elephants.

Mass, m (t) | 4.0 ≤ m < 4.5 | 4.5 ≤ m < 5.0 | 5.0 ≤ m < 5.5 | 5.5 ≤ m < 6.0 | 6.0 ≤ m < 6.5
Frequency | 13 | 23 | 31 | 34 | 19

Find estimates for:
a the range
b the interquartile range
c the 10th to 90th interpercentile range.

a Range is 6.5 − 4.0 = 2.5 tonnes.
  (The largest possible value is 6.5 and the smallest possible value is 4.0.)
b Q1 = 30th data value: 4.87 tonnes, using $\frac{Q_1 - 4.5}{5.0 - 4.5} = \frac{30 - 13}{23}$
  Q3 = 90th data value: 5.84 tonnes
  The interquartile range is therefore 5.84 − 4.87 = 0.97 tonnes.
c 10th percentile = 12th data value: 4.46 tonnes.
  90th percentile = 108th data value: 6.18 tonnes.
  The 10th to 90th interpercentile range is therefore 6.18 − 4.46 = 1.72 tonnes.
  (Use interpolation to find the 10th and 90th percentiles, then work out the difference between them.)
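The interpolation used in this example can be checked with a short Python sketch. It is an illustration only: the function name percentile and the table layout are assumptions made for this example, and it relies on the same assumption as the text, namely that data values are evenly distributed within each class.

```python
# Each class is (lower class boundary, upper class boundary, frequency).
elephants = [
    (4.0, 4.5, 13),
    (4.5, 5.0, 23),
    (5.0, 5.5, 31),
    (5.5, 6.0, 34),
    (6.0, 6.5, 19),
]


def percentile(table, p):
    """Estimate the p-th percentile of grouped data by linear interpolation."""
    n = sum(f for _, _, f in table)
    target = p * n / 100  # position of the required data value
    cumulative = 0        # cumulative frequency of the previous classes
    for lower, upper, f in table:
        if f > 0 and cumulative + f >= target:
            # Fraction of the way into this class, applied to the class width.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("percentile lies beyond the last class")


q1 = percentile(elephants, 25)
q3 = percentile(elephants, 75)
p10 = percentile(elephants, 10)
p90 = percentile(elephants, 90)

print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))      # 4.87 5.84 0.97
print(round(p10, 2), round(p90, 2), round(p90 - p10, 2))  # 4.46 6.18 1.72
```

The open-ended condor class (3.0 ≤ w) shows the limit of this approach: without an upper class boundary there is no class width to interpolate across.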
Exercise

1 The lengths of a number of slow worms were measured, to the nearest mm. The results are shown in the table.

Lengths of slow worms (mm) | Frequency
125–139 | 4
140–154 | 4
155–169 | 2
170–184 | 7
185–199 | 20
200–214 | 24
215–229 | 10

a Work out how many slow worms were measured.
b Estimate the interquartile range for the lengths of the slow worms.
c Calculate an estimate for the mean length of slow worms.
d Estimate the number of slow worms whose length is more than one interquartile range above the mean.

Note: For part d, work out x̄ + IQR, and determine which class interval it falls in. Then use proportion to work out how many slow worms from that class interval you need to include in your estimate.

2 The table shows the monthly income for workers in a factory.

Monthly income, x (£) | 900 ≤ x < 1000 | 1000 ≤ x < 1100 | 1100 (x - ¥) a formulae easier to use and learn.

The second version of the formula, $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$, is easier to work with when given raw data. It can be thought of as 'the mean of the squares minus the square of the mean'.

The third version, $\sigma^2 = \frac{S_{xx}}{n}$, is easier to use if you can use your calculator to find $S_{xx}$ quickly.

The units of the variance are the units of the data squared. You can find a related measure of spread that has the same units as the data.

■ The standard deviation is the square root of the variance: $\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$

Note: σ is the symbol we use for the standard deviation of a data set. Hence σ² is used for the variance.

The marks gained in a test by seven randomly selected students are:
3 4 6 2 8 8 5
Find the variance and standard deviation of the marks of the seven students.

Σx = 3 + 4 + 6 + 2 + 8 + 8 + 5 = 36
Σx² = 9 + 16 + 36 + 4 + 64 + 64 + 25 = 218
Use the 'mean of the squares minus the square of the mean':
Variance, σ² = 218/7 − (36/7)² = 4.69
Standard deviation, σ = √4.69 = 2.17 (3 s.f.)

■ You can use these versions of the formulae for variance and standard deviation for grouped data that is presented in a frequency table:
$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$, $\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
where f is the frequency for each group and Σf is the total frequency.

Shamsa records the time spent out of school during the lunch hour to the nearest minute, x, of the female students in her year. The results are shown in the table.

Time spent out of school (min) | 35 | 36 | 37 | 38
Frequency | 3 | 17 | 29 | 34

Calculate the standard deviation of the time spent out of school.

Σfx² = 3 × 35² + 17 × 36² + 29 × 37² + 34 × 38² = 114 504
Σfx = 3 × 35 + 17 × 36 + 29 × 37 + 34 × 38 = 3082
Σf = 3 + 17 + 29 + 34 = 83
σ² = 114 504/83 − (3082/83)² = 0.74147...
σ = √0.74147... = 0.861 (3 s.f.)

If the data is given in a grouped frequency table, you can calculate estimates for the variance and standard deviation of the data using the midpoint of each class interval.

Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarised in the table below.

Length of telephone call (min) | 0 <1 =5 [5 _ 6487.5 265)? manually for the OS ep Vee ) = 128,.85802 midpoint of each es TeREE Ga class interval

Exercise

1 Given that for a variable x: Σx = 24, Σx² = 78
Find:
a the mean
b the variance σ²
c the standard deviation.

2 Ten collie dogs are weighed (w kg). The summary data for the weights is:
Σw = 241, Σw² = 5905
Use this summary data to find the standard deviation of the collies' weights. (2 marks)

3 Eight students' heights (h cm) are measured. They are as follows:
165 170 190 180 175 185 176 184
a Work out the mean height of the students.
b Given Σh² = 254 307, work out the variance. Show all your working.
c Work out the standard deviation.

4 For a set of 10 numbers: Σx = 50, Σx² = 310
For a different set of 15 numbers: S. 568
Find the mean and the standard deviation of the combined set of 25 numbers.
5 Nahab asks the students in his year group how much pocket money they get per week. The results, rounded to the nearest pound, are shown in the table.

Number of £s | 8 | 9 | 10 | 11 | 12
Frequency | [ie] 8 [28 bas | 20

a Use your calculator to work out the mean and standard deviation of the pocket money. Give units with your answer. (marks)
b How many students received an amount of pocket money more than one standard deviation above the mean? (2 marks)

6 In a student group, a record was kept of the number of days of absence each student had over one particular term. The results are shown in the table.

Number of days absent | OP) ea |e 14
Frequency | 220 lw? s

Use your calculator to work out the standard deviation of the number of days absent. (2 marks)

7 A certain type of machine contained a part that tended to wear out after different amounts of time. The time it took for 50 of the parts to wear out was recorded. The results are shown in the table.

Lifetime, h (hours) | 5 CD etsy necdw ance nese leoh

For data coded using $y = \frac{x - a}{b}$:
■ the mean of the coded data is given by $\bar{y} = \frac{\bar{x} - a}{b}$
■ the standard deviation of the coded data is given by $\sigma_y = \frac{\sigma_x}{b}$, where $\sigma_x$ is the standard deviation of the original data.

Note: You often need to find the mean and standard deviation of the original data, given the statistics for the coded data. You can rearrange the formulae as:
• $\bar{x} = b\bar{y} + a$
• $\sigma_x = b\sigma_y$

A scientist measures the temperature, x °C, at five different points in a nuclear reactor. Her results are given below:
332 °C 355 °C 306 °C 317 °C 340 °C
a Use the coding y = (x − 300)/10 to code this data.
b Calculate the mean and standard deviation of the coded data.
c Use your answer to part b to calculate the mean and standard deviation of the original data.

a
Original data, x | 332 | 355 | 306 | 317 | 340
Coded data, y | 3.2 | 5.5 | 0.6 | 1.7 | 4.0

b Σy = 15, Σy²
= 59.74
ȳ = 15/5 = 3
σy² = 59.74/5 − 3² = 2.948
σy = √2.948 = 1.72 (3 s.f.)
c x̄ = 10ȳ + 300 = 330 °C
σx = 10σy = 17.2 °C (3 s.f.)

From the large data set, data on the maximum gust, g knots, is recorded in Leuchars during May and June 2015. The data was coded using h = (g − 5)/10 and the following statistics found:
S_hh = 43.58, h̄ = 2, n = 61
Calculate the mean and standard deviation of the maximum gust in knots.

ḡ = 2 × 10 + 5 = 25 knots
σh = √(43.58/61) = 0.845...
σg = σh × 10 = 8.45 knots (3 s.f.)

Exercise

1 A set of data values, x, is shown below:
110 90 50 80 30 70 60
a Code the data using the coding y = x/10.
b Calculate the mean of the coded data values.
c Use your answer to part b to calculate the mean of the original data.

2 A set of data values, x, is shown below:
S73 31 73 38 80 17 24
a Code the data using the coding y
b Calculate the mean of the coded data values.
c Use your answer to part b to calculate the mean of the original data.

3 The coded mean price of televisions in a shop was worked out. Using the coding y ==S° the mean price was 1.5. Find the true mean price of the televisions. (2 marks)

4 The coding y = x − 40 gives a standard deviation for y of 2.34. Write down the standard deviation of x.

Note: Adding or subtracting constants does not affect how spread out the data is, so you can ignore the '−40' when finding the standard deviation for x.

5 The lifetime, x, in hours, of 70 light bulbs is shown in the table.

Lifetime, x (hours) | 20 < x ≤ 22 | 22

[Histogram: frequency density against time, t (min)]

b Shaded area = (40 − 36) × 13.6 + (45 − 40) × 3.2 = 70.4 students

A random sample of daily mean temperatures (T, °C) was taken from the large data set for Hurn in 2015. The temperatures were summarised in a grouped frequency table and represented by a histogram.
a Give a reason to support the use of a histogram to represent this data.
b Write down the underlying feature associated with each of the bars in a histogram.
On the histogram the rectangle representing the 16 ≤ T < 18 class was 3.2 cm high and 2 cm wide.
The frequency for this class was 8.
c Show that each day is represented by an area of 0.8 cm².
d Given that the total area under the histogram was 48 cm², find the total number of days in the sample.

a Temperature is continuous and the data were given in a grouped frequency table.
b The area of the bar is proportional to the frequency.
c Area of bar = 3.2 × 2 = 6.4 cm²
6.4 ÷ 8 = 0.8 cm² per day
d 48 ÷ 0.8 = 60. There were 60 days in the sample.

Exercise

1 The data shows the mass, in pounds, of 50 adult puffer fish.

Mass, m (pounds) | Frequency
10 ≤ m < 15 | 4
15 ≤ m < 20 | 2
20 ≤ m < 25 | a
25 ≤ m < 30 | 8
30 ≤ m < 35 | 3

a Draw a histogram for this data.
b On the same set of axes, draw a frequency polygon.

2 Some students take part in an obstacle race. The time it took each student to complete the race was noted. The results are shown in the histogram.

[Histogram: frequency density against time (seconds)]

a Give a reason to justify the use of a histogram to represent this data.
The number of students who took between 60 and 70 seconds is 90.
b Find the number of students who took between 40 and 60 seconds.
c Find the number of students who took 80 seconds or less.
d Calculate the total number of students who took part in the race.

Note: Frequency density × class width is always proportional to frequency in a histogram, but not necessarily equal to frequency.

3 A Fun Day committee at a local sports centre organised a throwing the cricket ball competition. The distance thrown by every competitor was recorded. The histogram shows the data. The number of competitors who threw less than 20 m was 40.

[Histogram: frequency density against distance (metres)]

a Why is a histogram a suitable diagram to represent this data?
b How many people entered the competition?
c Estimate how many people threw between 30 and 40 metres.
d How many people threw between 45 and 65 metres?
e Estimate how many people threw less than 25 metres.
4 A farmer found the masses of a random sample of lambs. The masses were summarised in a grouped frequency table and represented by a histogram. The frequency for the class 28 ≤ m < 32 was 32.

[Histogram: frequency density against mass, m (kg)]

a Show that 25 small squares on the histogram represents 8 lambs.
b Find the frequency of the 24 ≤ m < 26 class.
c How many lambs did the farmer weigh in total?
d Estimate the number of lambs that had masses between 25 and 29 kg.

Note: You can use area to solve histogram problems where no vertical scale is given. You could also use the information given in the question to work out a suitable scale for the vertical axis.

5 The partially completed histogram shows the time, in minutes, that passengers were delayed at an airport.

[Histogram: frequency density against time, t (minutes)]

a i Copy and complete the table.

Time, t (min) | Frequency
0 ≤ t < 20 | 4
20 ≤ t < 30 |
30 ≤ t < 35 | 15
35 ≤ t < 40 | 25
40 ≤ t < 50 |
50 ≤ t < 70 |

ii Copy and complete the histogram. (4 marks)
b Estimate the number of passengers that were delayed for between 25 and 38 minutes. (2 marks)

6 The variable y was measured to the nearest whole number. 60 observations were taken and are recorded in the table below.

y | 10–12 | 13–14 | 15–17 | 18–25
Frequency | 6 | 24 | 18 | 12

a Write down the class boundaries for the 13–14 class. (1 mark)
A histogram was drawn and the bar representing the 13–14 class had a width of 4 cm and a height of 6 cm.
For the bar representing the 15–17 class, find:
b i the width (1 mark)
ii the height. (2 marks)

Note: Remember that area is proportional to frequency.

7 From the large data set, the daily mean temperature for Leeming during May 2015 is summarised in the table. A histogram was drawn. The 8 ≤ t < 10 group was represented by a bar of width 1 cm and a height of 8 cm.
a Find the width and height of the bar representing the 11 ≤ t < 12 group.
(2 marks)
b Use your calculator to estimate the mean and standard deviation of temperatures in Leeming in May 2015. (3 marks)
c Use linear interpolation to find an estimate for the lower quartile of temperatures.

Daily mean temperature, t (°C) | Frequency
4 ≤ t < 8 |
8 ≤ t < 10 |
10 ≤ t

● Interpret the coefficients of a regression line equation for bivariate data
● Understand when you can use a regression line to make predictions

Prior knowledge check

1 The table shows the scores out of 10 on a maths test and on a physics test for 7 students.

Maths | 6 | 7 | 7 | 8 | 9 | 9 | 10
Physics | 9 | 7 | 6 | 7 | 5 | 4 | 5

Show this information on a scatter diagram. ← GCSE Mathematics

2 A straight line has equation y = 0.34 − 3.21x. Write down:
a the gradient of the line
b the y-intercept of the line. ← GCSE Mathematics

There is a strong correlation between greenhouse gas emissions and rising atmospheric temperatures. → Mixed exercise Q2

4.1 Correlation

■ Bivariate data is data which has pairs of values for two variables.

You can represent bivariate data on a scatter diagram. This scatter diagram shows the results from an experiment on how breath rate affects pulse rate:

[Scatter diagram: pulse (beats per minute) against breath rate (breaths per minute)]

Each cross represents a data point. The subject highlighted on the diagram had a breath rate of 32 breaths per minute and a pulse rate of 89 beats per minute.

The researcher could control the breath rate. This variable is called the independent or explanatory variable. It is usually plotted on the horizontal axis.

The researcher measured the pulse rate. This variable is called the dependent or response variable. It is usually plotted on the vertical axis.

The two different variables in a set of bivariate data are often related.

■ Correlation describes the nature of the linear relationship between two variables.
[Five scatter diagrams showing, from left to right: strong negative correlation, weak negative correlation, no (or zero) linear correlation, weak positive correlation, strong positive correlation]

For negatively correlated variables, when one variable increases the other decreases.
For positively correlated variables, when one variable increases, the other also increases.

Note: You should only use correlation to describe data that shows a linear relationship. Variables with no linear correlation could still show a non-linear relationship.

In the study of a city, the population density, in people/hectare, and the distance from the city centre, in km, was investigated by picking a number of sample areas with the following results.

Area | A | B | C | D | E | F | G | H | I | J
Distance (km) | 0.6 | 3.8 | 2.4 | 3.0 | 2.0 | 1.5 | 1.8 | 3.4 | 4.0 | 0.9
Population density (people/hectare) | 50 | 2 | 14 | 20 | 33 | 47 | 25 | 8 | 16 | 38

a Draw a scatter diagram to represent this data.
b Describe the correlation between distance and population density.
c Interpret your answer to part b.

a [Scatter diagram of population density (people/hectare) against distance from centre (km)]
b There is weak negative correlation.
c As distance from the centre increases, the population density decreases.

Note: Make sure you interpret results in the context of the question.

Two variables have a causal relationship if a change in one variable causes a change in the other. Just because two variables show correlation it does not necessarily mean that they have a causal relationship.

■ When two variables are correlated, you need to consider the context of the question and use your common sense to determine whether they have a causal relationship.

Hideko was interested to see if there was a relationship between what people earn and the age at which they left education or training.
She asked 14 friends to fill in an anonymous questionnaire and recorded her results in a scatter diagram.

[Scatter diagram titled 'Hourly pay at age 25': hourly pay (£) against age at which education or training ended]

a Describe the type of correlation shown.
Hideko says that her data supports the conclusion that more education causes people to earn a lower hourly rate of pay.
b Give one reason why Hideko's conclusion might not be valid.

a Weak negative correlation.
b Respondents who left education later would have significantly less work experience than those who left education earlier. This could be the cause of the reduced income shown in her results.

Exercise

1 Some research was done into the effectiveness of a weight-reducing drug. Seven people recorded their weight loss and this was compared with the length of time for which they had been treated. A scatter diagram was drawn to represent this data.

[Scatter diagram: weight loss against length of treatment]

a Describe the type of correlation shown by the scatter diagram.
b Interpret the correlation in context.

2 The average temperature and rainfall were collected for a number of cities around the world. The scatter diagram shows this information.

[Scatter diagram: average rainfall (mm) against average temperature (°C)]

a Describe the correlation between average temperature and average rainfall.
b Comment on the claim that hotter cities have less rainfall.

3 Eight students were asked to estimate the mass of a bag of sweets in grams. First they were asked to estimate the mass without touching the bag and then they were told to pick the bag up and estimate the mass again. The results are shown in the table below.

Student | A | B | C | D | E | F | G | H
Estimate of mass not touching bag (g) | 25 | 18 | 32 | 27 | 21 | 35 | 28 | 30
Estimate of mass holding bag (g) | ie | 1 | 20 | 17 | 15 | 26 | 2 | 20

a Draw a scatter diagram to represent this data.
b Describe and interpret the correlation between the two variables.
4 Donal was interested to see whether there was a relationship between the value of a house and the speed of its internet connection, as measured by the time taken to download a 100 megabyte file. The table shows his results.

Time taken (s) | 52 | 55 | 58 | 60 | 68 | 83 | 93 | 13 | 136 | 160
House value (£1000s) | 300 | 310 | 270 | 200 | 230 | 205 | 208 | 235 | 175 | 180

a Draw a scatter diagram to represent this data.
b Describe the type of correlation shown.
Donal says that his data shows that a slow internet connection reduces the value of a house.
c Give one reason why Donal's conclusion may not be valid.

5 The table shows the daily total rainfall, r mm, and daily total hours of sunshine, s, in Leuchars for a random sample of 11 days in August 1987, from the large data set.

i 0 68 09 48 0 21.7 17 49 OL 22 OL 84 49 10.2 45 33 39 54 18 oF 1 46
© Crown Copyright Met Office

The median and quartiles for the rainfall data are: Q1 = 0.1, Q2 = 1.7, Q3 = 4.85
An outlier is defined as a value which lies more than 1.5 × the interquartile range above the upper quartile or more than 1.5 × the interquartile range below the lower quartile.
a Show that r = 21.7 is an outlier. (1 mark)
b Give a reason why you might:
i include
ii exclude
this day's readings. (2 marks)
c Exclude this day's readings and draw a scatter diagram to represent the data for the remaining ten days. (3 marks)
d Describe the correlation between rainfall and hours of sunshine. (1 mark)
e Do you think there is a causal relationship between the amount of rain and the hours of sunshine on a particular day? Explain your reasoning. (1 mark)

4.2 Linear regression

When a scatter diagram shows correlation, you can draw a line of best fit. This is a linear model that approximates the relationship between the variables.

Note: The least squares regression line is usually just called the regression line.

One type of line of best fit that is useful in statistics is a least squares regression line.
This is the straight line that minimises the sum of the squares of the distances of each data point from the line.

[Scatter diagram with 4 data points, a regression line, and the vertical distances d1, d2, d3 and d4 from each point to the line. The point (x2, y2) is a vertical distance d2 from the line.]

There are 4 data points on this scatter diagram. The regression line of y on x is the straight line that minimises the value of d1² + d2² + d3² + d4². In general, if each data point is a distance d_i from the line, the regression line minimises the value of Σd_i².

■ The regression line of y on x is written in the form y = a + bx.

Note: The order of the variables is important. The regression line of y on x will be different from the regression line of x on y.

You can use a calculator to find the values of the coefficients a and b for a given set of bivariate data. You will not be required to do this in your exam.

■ The coefficient b tells you the change in y for each unit change in x.
• If the data is positively correlated, b will be positive.
• If the data is negatively correlated, b will be negative.

From the large data set, the daily mean windspeed, w knots, and the daily maximum gust, g knots, were recorded for the first 15 days in May in Camborne in 2015.

w | [m4] 3] a3] 9 [isfis] 7 [is [ili] iu] 9] s [iw] 7
g | 33 | 37 | 29 | 23 | 43 | 38 | 17 | 30 | 28 | 29 | 29 | 23 | 21 | 28 | 20
© Crown Copyright Met Office

The data was plotted on a scatter diagram:

[Scatter diagram: daily maximum gust, g (knots) against daily mean windspeed, w (knots)]

a Describe the correlation between daily mean windspeed and daily maximum gust.
The equation of the regression line of g on w for these 15 days is g = 7.23 + 1.82w.
b Give an interpretation of the value of the gradient of this regression line.
c Justify the use of a linear regression line in this instance.

a There is a strong positive correlation between daily mean windspeed and daily maximum gust.
b If the daily mean windspeed increases by 10 knots the daily maximum gust increases by approximately 18 knots.
c The correlation suggests that there is a linear relationship between g and w so a linear regression line is a suitable model.

Note: A regression line is a valid model when the data shows linear correlation. The stronger the correlation, the more accurately the regression line will model the data.

If you know a value of the independent variable from a bivariate data set, you can use the regression line to make a prediction or estimate of the corresponding value of the dependent variable.

■ You should only use the regression line to make predictions for values of the dependent variable that are within the range of the given data.

Note: This is called interpolation. Making a prediction based on a value outside the range of the given data is called extrapolation, and gives a much less reliable estimate.

The head circumference, y cm, and gestation period, x weeks, for a random sample of eight newborn babies at a clinic were recorded.

Gestation period (x weeks) | 36 | 40 | 33 | 37 | 40 | 39 | 35 | 38
Head circumference (y cm) | 30.0 | 35.0 | 29.8 | 32.5 | 33.2 | 32.1 | 30.9 | 33.6

The scatter graph shows the results.

[Scatter graph: head circumference (cm) against gestation period (weeks)]

The equation of the regression line of y on x is y = 8.91 + 0.624x.
The regression equation is used to estimate the head circumference of a baby born at 39 weeks and a baby born at 30 weeks.
a Comment on the reliability of these estimates.
A nurse wants to estimate the gestation period for a baby born with a head circumference of 31.6 cm.
b Explain why the regression equation given above is not suitable for this estimate.

a The prediction for 39 weeks is within the range of the data (interpolation) so is more likely to be accurate. The prediction for 30 weeks is outside the range of the data (extrapolation) so is less likely to be accurate.
b The independent (explanatory) variable in this model is the gestation period, x. You should not use this model to predict a value of x for a given value of y.

Note: You should only make predictions for the dependent variable. If you needed to predict a value of x for a given value of y you would need to use the regression line of x on y.

Exercise

1 An accountant monitors the number of items produced per month by a company together with the total production costs. The table shows this data.

Number of items, n (1000s) | 21 | 39 | 48 | 24 | 72 | 75 | 15 | 35 | 62 | 81 | 12 | 56
Production costs, p (£1000s) | 40 | 58 | 67 | 45 | 89 | 96 | 37 | 53 | 83 | 102 | 35 | 75

a Draw a scatter diagram to represent this data.
The equation of the regression line of p on n is p = 21.0 + 0.98n.
b Draw the regression line on your scatter diagram.
c Interpret the meaning of the figures 21.0 and 0.98.
The company expects to produce 74 000 items in June, and 95 000 items in July.
d Comment on the suitability of this regression line equation to predict the production costs in each of these months.

2 The relationship between the number of coats of paint applied to a boat and the resulting weather resistance was tested in a laboratory. The data collected is shown in the table.

Coats of paint (x) | Protection (years) (y)
1 | 4.4
2 | Xe)
3 | mI
4 | 8.8
5 | 10.2

a Draw a scatter diagram to represent this data.
The equation of the regression line is y = 2.93 + 1.45x.
Helen says that a gradient of 1.45 means that if 10 coats of paint are applied the protection will last 14.5 years.
b Comment on Helen's statement.

3 The table shows the ages of some chickens and the number of eggs that they laid in a month.

Age of chicken, a (months) | is | 32 | 44 | oo | 71 | 79 | 99 | 109 | 118 | 140
Number of eggs laid in a month, n | 16 | 18 | 13 | 7 | 2] 7 [| Bi 6 | 9

a Draw a scatter diagram to show this information.
Robin calculates the regression line of n on a as n = 16.1 + 0.063a.
b Without further calculation, explain why Robin's regression equation is incorrect.

4 Aisha collected data on the numbers of bedrooms, x, and the values, y (£1000s), of the houses in her village. She calculates the regression equation of y on x to be y = 190 + 50x. She states that the value of the constant in her regression equation means that a house with no bedrooms in her village would be worth £190 000. Explain why this is not a reasonable statement.

5 The table shows the daily maximum relative humidity, h (%), and the daily mean visibility, v decametres (Dm), in Heathrow for the first two weeks in September 2015, from the large data set.

h | 94 | 95 | 92 | 80 | 97 | 94 | 93 | 90 | 87 | 95 | 93 | 92 | om | 98
v | 2600 | 2900 | 3900 | 4300 | 2800 | 2400 | 2700 | 3500 | 3000 | 2200 | 2200 | 3300 | 2800 | 2200
© Crown Copyright Met Office

The equation of the regression line of v on h is v = 12 700 − 106h.
a Give an interpretation of the value of the gradient of the regression line. (1 mark)
b Use your knowledge of the large data set to explain whether there is likely to be a causal relationship between humidity and visibility. (2 marks)
c Give reasons why it would not be reliable to use this regression equation to predict:
i the mean visibility on a day with 100% humidity (2 marks)
ii the humidity on a day with visibility of 3000 Dm. (2 marks)
d State two ways in which better use could be made of the large data set to produce a model describing the relationship between humidity and visibility. (2 marks)

Mixed exercise

1 A survey of British towns recorded the number of serious road accidents in a week (x) in each town, together with the number of fast food restaurants (y). The data showed a strong positive correlation. Katie states that this shows that building more fast food restaurants in her town will cause more serious road accidents. Explain whether the data supports Katie's statement.
2 The following table shows the mean CO2 concentration in the atmosphere, c (ppm), and the increase in average temperature compared to the 30-year period 1951–1980, t (°C).

Year | 2015 | 2013 | 2011 | 2009 | 2007 | 2005 | 2003 | 2001 | 1999 | 1997 | 1995 | 1994
c (ppm) | 401 | 397 | 392 | 387 | 38a | 381 | 376 | 371 | 368 | 363 | 361 | 357
t (°C) | 0.86 | 0.65 | 0.59 | 0.64 | 0.65 | 0.68 | 0.61 | 0.54 | 0.41 | 0.47 | 0.45 | 0.24
Source: Earth System Research Laboratory (CO2 data); GISS Surface Temperature Analysis, NASA (temperature data)

a Draw a scatter diagram to represent this data.
b Describe the correlation between c and t.
c Interpret your answer to part b.

3 The table below shows the packing times for a particular employee for a random sample of orders in a mail order company.

Number of items (n) | 2 | 3 | a | 4 | 5 | 5 | 6 | 7 | 8 | 8 | 8 | 9 | i | 13
Time (t min) | It | 14 | 16 | 16 | 19 | 21 | 23 | 25 | 24 | 27 | 28 | 30 | 35 | 42

A scatter diagram was drawn to represent the data.

[Scatter diagram: time, t (min) against number of items (n)]

a Describe the correlation between number of items packed and time taken. (1 mark)
The equation of the regression line of t on n is t = 6.3 + 2.64n.
b Give an interpretation of the value 2.64. (1 mark)

4 Energy consumption is claimed to be a good predictor of Gross National Product. An economist recorded the energy consumption (x) and the Gross National Product (y) for eight countries. The data is shown in the table.

Energy consumption (x) | 34 | 7.7 [120] 75 | s8 | 67 [13 | 131
Gross National Product (y) | 55 | 240 | 390 | 1100 | 1390 | 1330 | 1400 | 1900

The equation of the regression line of y on x is y = 225 + 12.9x.
The economist uses this regression equation to estimate the energy consumption of a country with a Gross National Product of 3500.
Give two reasons why this may not be a valid estimate. (2 marks)

5 The table shows average monthly temperature, t (°C), and the number of pairs of gloves, g, a shop sells each month.
t (°C) | 6 | 6 | 50 | 10 | 13 | 16 | 18 | 19 | 16 | 12 | 9 | 7
g | 81 | 58 | 50 | 42 | 19 | 21 | 4 | x | 20 | 33 | 58 | 65

The following statistics were calculated for the data on temperature: mean = 15.2, standard deviation = 11.4
An outlier is an observation which lies more than 2 standard deviations from the mean.
a Show that t = 50 is an outlier. (1 mark)
b Give a reason whether or not this outlier should be omitted from the data. (1 mark)
The equation of the regression line of t on g for the remaining data is t = 18.4 − 0.18g.
c Give an interpretation of the value −0.18 in this regression equation. (1 mark)

6 James placed different masses (m) on a spring and measured the resulting length of the spring (s) in centimetres. The smallest mass was 20 g and the largest mass was 100 g. He found the equation of the regression line of s on m to be s = 44 + 0.2m.
a Interpret the values 44 and 0.2 in this context. (2 marks)
b Explain why it would not be sensible to use the regression equation to work out:
i the value of s when m = 150
ii the value of m when s = 60. (2 marks)

7 A student is investigating the relationship between the price (y pence) of 100 g of chocolate and the percentage (x%) of cocoa solids in the chocolate. The data obtained is shown in the table.

Chocolate brand | x (% cocoa) | y (pence)
A | 10 | 35
B | 20 | 35
G | 30 | 40
7 = roo .,
E | 40 | 60
r a oo
7 7 0
Ht | 70 | 130

a Draw a scatter diagram to represent this data. (2 marks)
The equation of the regression line of y on x is y = 17.0 + 1.54x.
b Draw the regression line on your diagram. (2 marks)
The student believes that one brand of chocolate is overpriced and uses the regression line to suggest a fair price for this brand.
c Suggest, with a reason, which brand is overpriced. (1 mark)
d Comment on the validity of the student's method for suggesting a fair price. (1 mark)

Large data set

You will need access to the large data set and spreadsheet software to answer these questions.

1
Investigate the relationship between daily mean windspeed, w, and daily maximum gust, g, in Leeming in 2015.
a Draw a scatter diagram of w against g for the entire data set for Leeming in 2015.
b Describe the correlation shown.
c Comment on whether there is likely to be a causal relationship between mean windspeed and maximum gust.
The equation of the regression line of g on w is given by g = 4.97 + 2.15w.
d Use the equation of the regression line to predict the maximum gust on a day when the mean windspeed is:
i 0.5 knots
ii 5 knots
iii 12 knots
iv 40 knots.
e Comment on the accuracy of each prediction in part d.
f Calculate the equation of the regression line of w on g, and use it to predict the mean windspeed on a day when the maximum gust was 30 knots.

Note: You can use the SLOPE and INTERCEPT functions in some spreadsheets to find the values of a and b in the regression equation.

2 Use a similar approach to investigate the daily total sunshine and daily mean total cloud cover in Heathrow in 1987.
a Use a regression model to suggest values for the missing total sunshine data in the first half of May.
b Do you think there is a causal relationship between these two variables? Give a reason for your answer.

Summary of key points

1 Bivariate data is data which has pairs of values for two variables.
2 Correlation describes the nature of the linear relationship between two variables.
3 When two variables are correlated, you need to consider the context of the question and use your common sense to determine whether they have a causal relationship.
4 The regression line of y on x is written in the form y = a + bx.
5 The coefficient b tells you the change in y for each unit change in x.
• If the data is positively correlated, b will be positive.
• If the data is negatively correlated, b will be negative.
6 You should only use the regression line to make predictions for values of the dependent variable that are within the range of the given data.
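As a hedged illustration of key points 4 to 6, the sketch below computes a least squares regression line in Python. It assumes the standard formulas b = Sxy/Sxx and a = ȳ − b x̄, which are not stated in this chapter, and it uses the gestation period and head circumference sample from the worked example earlier in the chapter. The helper names `regression_line` and `predict` are hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x.

    Uses b = Sxy / Sxx and a = mean(y) - b * mean(x); these formulas are
    an assumption of this sketch, not taken from the chapter text.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Gestation period (weeks) and head circumference (cm) for eight babies.
weeks = [36, 40, 33, 37, 40, 39, 35, 38]
head_cm = [30.0, 35.0, 29.8, 32.5, 33.2, 32.1, 30.9, 33.6]

a, b = regression_line(weeks, head_cm)
print(f"y = {a:.2f} + {b:.3f}x")


def predict(x):
    """Return (estimate, is_interpolation) for a gestation period x."""
    is_interpolation = min(weeks) <= x <= max(weeks)
    return a + b * x, is_interpolation


print(predict(39))   # inside the data range: interpolation
print(predict(30))   # outside the data range: extrapolation
```

With this data the coefficients should agree with the regression equation quoted in the worked example, y = 8.91 + 0.624x, to the accuracy given there. The second value returned by `predict` flags whether an estimate is an interpolation, which is the check described in key point 6.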
Probability

After completing this chapter you should be able to:
● Calculate probabilities for single events
● Draw and interpret Venn diagrams
● Understand mutually exclusive and independent events, and determine whether two events are independent
● Use and understand tree diagrams

Sports teams use past performance to estimate probabilities and plan strategies. In softball and baseball, a player's batting average is an estimate of the probability that they will make a hit. → Mixed exercise Q2

Prior knowledge check
1 A bag contains three red balls, four yellow balls and two blue balls. A ball is chosen at random from the bag. Write down the probability that the ball is:
a blue  b yellow  c not red  d green. ← GCSE Mathematics
2 Three coins are flipped. Write down all the possible outcomes. ← GCSE Mathematics
3 Poppy rolls a dice. She keeps rolling until she rolls a 6. Work out the probability that Poppy rolls the dice:
a exactly three times  b fewer than three times  c more than three times. ← GCSE Mathematics

5.1 Calculating probabilities
If you want to predict the chance of something happening, you use probability.
■ An experiment is a repeatable process that gives rise to a number of outcomes.
■ An event is a collection of one or more outcomes.
■ A sample space is the set of all possible outcomes.
Where outcomes are equally likely, the probability of an event is the number of outcomes in the event divided by the total number of possible outcomes. All events have probability between 0 (impossible) and 1 (certain). Probabilities are usually written as fractions or decimals.

Two fair spinners each have four sectors numbered 1 to 4. The two spinners are spun together and the sum of the numbers indicated on each spinner is recorded. Find the probability of the spinners indicating a sum of:
a exactly 5
b more than 5.
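The sample space for the two spinners is small enough to list in full, so the rule "outcomes in the event divided by total outcomes" can be checked directly. The sketch below is one way to do this; the helper name `probability` is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome for two fair spinners numbered 1 to 4.
sample_space = list(product(range(1, 5), repeat=2))


def probability(event):
    """P(event) = number of outcomes in the event / total number of outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


p_sum_5 = probability(lambda outcome: sum(outcome) == 5)
p_more_than_5 = probability(lambda outcome: sum(outcome) > 5)
print(p_sum_5, p_more_than_5)  # 1/4 3/8
```

Using `Fraction` keeps the answers exact, which matches the way probabilities are written in this chapter.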
[Sample space diagram: Spinner 1 against Spinner 2]
a P(sum of 5) = 4/16 = 1/4
b P(more than 5) = 6/16 = 3/8

The table shows the times taken, in minutes, for a group of students to complete a number puzzle.
Time, t (min) …
… find the possible values of …

5.2 Venn diagrams
■ A Venn diagram can be used to represent events graphically. Frequencies or probabilities can be placed in the regions of the Venn diagram.
Venn diagrams are named after the English mathematician John Venn (1834-1923).
A rectangle represents the sample space, S, and it contains closed curves that represent events.
For events A and B in a sample space S:
1 The event A and B. This event is also called the intersection of A and B. It represents the event that both A and B occur.
2 The event A or B. This event is also called the union of A and B. It represents the event that either A or B, or both, occur.
3 The event not A. This event is also called the complement of A. It represents the event that A does not occur. P(not A) = 1 − P(A)
You can write numbers of outcomes (frequencies) or the probability of the events in a Venn diagram to help solve problems.

In a class of 30 students, 7 are in the choir, 5 are in the school band and 2 are in the choir and the band. A student is chosen at random from the class.
a Draw a Venn diagram to represent this information.
b Find the probability that:
i the student is not in the band
ii the student is not in the choir or the band.

a [Venn diagram: choir (C) and band (B), with 20 outside both]
b i A student not in the band is not B.
P(not B) = 25/30 = 5/6
ii P(student is not in the choir or the band) = 20/30 = 2/3

A vet surveys 100 of her clients. She finds that:
25 own dogs
15 own dogs and cats
11 own dogs and fish
53 own cats
10 own cats and fish
7 own dogs, cats and fish
40 own fish
A client is chosen at random. Find the probability that the client:
a owns dogs only
b does not own fish
c does not own dogs, cats or fish.
a P(owns dogs only) = 0.06
b P(does not own fish) = 1 − 0.4 = 0.6
c P(does not own dogs, cats or fish) = 0.11

For the choir and band example: Put the number in both the choir and the band in the intersection of B and C. The region outside both curves represents the events in the sample space that are not in C or B: 30 − (3 + 2 + 5) = 20. There are 5 + 20 = 25 outcomes not in B, out of 30 equally likely outcomes. 20 outcomes are in neither event.

For the vet example: You can use a Venn diagram with probabilities to solve this problem, but it could also be solved using the number of outcomes. There are 7 clients who own all three pets. Start with 0.07 in the intersection of all three events. Work outwards to the intersections, 0.15 − 0.07 = 0.08. Each of 'dogs only', 'cats only' and 'fish only' can be worked out by further subtractions: 0.53 − (0.08 + 0.07 + 0.03) = 0.35 for 'cats only'. As the probability of the whole sample space is 1, the final area is 1 − (0.26 + 0.04 + 0.07 + 0.03 + 0.06 + 0.08 + 0.35) = 0.11. This is the value on the Venn diagram outside D, C and F.

1 There are 25 students in a certain tutor group at Philips College. There are 16 students in the tutor group studying German, 14 studying French and 6 students studying both French and German.
a Draw a Venn diagram to represent this information.
b Find the probability that a randomly chosen student in the tutor group:
i studies French
ii studies French and German
iii studies French but not German
iv does not study French or German.

2 There are 125 diners in a restaurant who were surveyed to find out if they had ordered garlic bread, beer or cheesecake:
15 diners had ordered all three items
43 diners had ordered garlic bread
40 diners had ordered beer
4 diners had ordered cheesecake
20 had ordered beer and cheesecake
26 had ordered garlic bread and cheesecake
25 had ordered garlic bread and beer
a Draw a Venn diagram to represent this information.
A diner is chosen at random.
Find the probability that the diner ordered:
b i all three items
ii beer but not cheesecake and not garlic bread
iii garlic bread and beer but not cheesecake
iv none of these items.

3 A group of 275 people at a music festival were asked if they play guitar, piano or drums:
one person plays all three instruments
65 people play guitar and piano
10 people play piano and drums
30 people play guitar and drums
15 people play piano only
20 people play guitar only
35 people play drums only
a Draw a Venn diagram to represent this information.
b A festival goer is chosen at random from the group. Find the probability that the person chosen:
i plays the piano
ii plays at least two of guitar, piano and drums
iii plays exactly one of the instruments
iv plays none of the instruments.

4 The probability that a child in a school has blue eyes is 0.27 and the probability that they have blonde hair is 0.35. The probability that the child will have blonde hair or blue eyes or both is 0.45. A child is chosen at random from the school. Find the probability that the child has:
a blonde hair and blue eyes
b blonde hair but not blue eyes
c neither feature.
Hint: Draw a Venn diagram to help you.

5 A patient going in to a doctor's waiting room reads Hiya magazine with probability 0.6 and Dakor magazine with probability 0.4. The probability that the patient reads either one or both of the magazines is 0.7. Find the probability that the patient reads:
a both magazines (2 marks)
b Hiya magazine only. (2 marks)

6 The Venn diagram shows the probabilities of members of a sports club taking part in various activities.
[Venn diagram: events A, B and C; labels include x, y and 0.05]
A represents the event that the member takes part in archery.
B represents the event that the member takes part in badminton.
C represents the event that the member takes part in croquet.
Given that P(B) = 0.45:
a find x (1 mark)
b find y. (2 marks)

7 The Venn diagram shows the probabilities that students at
a sixth-form college study certain subjects.
[Venn diagram: events M, P and H; labels include p and q]
M represents the event that the student studies mathematics.
P represents the event that the student studies physics.
H represents the event that the student studies history.
Given that P(M) = P(P), find the values of p and q. (4 marks)

Challenge
The Venn diagram shows the probabilities of a group of children liking three types of sweet.
[Venn diagram: events A, B and C; labels include p, q and r]
Given that P(B) = 2P(A) and that P(not C) = 0.83, find the values of p, q and r.

5.3 Mutually exclusive and independent events
When events have no outcomes in common they are called mutually exclusive. In a Venn diagram, the closed curves do not overlap and you can use a simple addition rule to work out combined probabilities:
■ For mutually exclusive events, P(A or B) = P(A) + P(B).
When one event has no effect on another, they are independent. Therefore if A and B are independent, the probability of A happening is the same whether or not B happens.
■ For independent events, P(A and B) = P(A) × P(B).
You can use this multiplication rule to determine whether events are independent.

Events A and B are mutually exclusive and P(A) = 0.2 and P(B) = 0.4. Find:
a P(A or B)
b P(A but not B)
c P(neither A nor B)

a P(A or B) = 0.2 + 0.4 = 0.6
b P(A but not B) = P(A) = 0.2
c P(neither A nor B) = 1 − 0.6 = 0.4

Events A and B are independent and P(A) = … and P(B) = … Find P(A and B).
P(A and B) = P(A) × P(B) = …

The Venn diagram shows the number of students in a particular class who watch any of three popular TV programmes.
a Find the probability that a student chosen at random watches B or C or both.
b Determine whether watching A and watching B are statistically independent.

a 4 + 5 + 10 + 7 = 26
P(watches B or C or both) = 26/30 = 13/15
b P(A) = (3 + 4)/30 = 7/30
P(B) = (4 + 5 + 10)/30 = 19/30
P(A and B) = 4/30 = 2/15
P(A) × P(B) = 7/30 × 19/30 = 133/900
So P(A and B) ≠ P(A) × P(B).
Therefore watching A and watching B are not independent.
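The independence test in the TV programmes example can be repeated mechanically: find P(A), P(B) and P(A and B) from the region counts, then compare P(A and B) with P(A) × P(B). The sketch below assumes the region counts implied by the working (3 watch A only, 4 watch A and B, 5 watch B only, 10 watch B and C, 7 watch C only); the single student outside all three curves is inferred from the class total of 30. The names `regions` and `prob` are hypothetical.

```python
from fractions import Fraction

# Region counts taken from the worked solution. The count outside all
# three curves is an inference from the total of 30 students.
regions = {
    ("A",): 3,
    ("A", "B"): 4,
    ("B",): 5,
    ("B", "C"): 10,
    ("C",): 7,
    (): 1,
}
total = sum(regions.values())


def prob(*events):
    """P(all of the named events occur), from the region counts."""
    count = sum(n for region, n in regions.items()
                if all(event in region for event in events))
    return Fraction(count, total)


p_a, p_b, p_a_and_b = prob("A"), prob("B"), prob("A", "B")
independent = (p_a_and_b == p_a * p_b)
print(p_a, p_b, p_a_and_b, p_a * p_b, independent)
# 7/30 19/30 2/15 133/900 False
```

Exact fractions matter here: comparing rounded decimals could suggest two probabilities are equal when they are not.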
Problem-solving: Show your calculations and then write down a conclusion stating whether or not the events are independent.

1 Events A and B are mutually exclusive. P(A) = 0.2 and P(B) = 0.5.
a Draw a Venn diagram to represent these two events.
b Find P(A or B).
c Find P(neither A nor B).

2 Two fair dice are rolled and the result on each die is recorded. Show that the events 'the sum of the scores on the dice is 4' and 'both dice land on the same number' are not mutually exclusive.

3 P(A) = 0.5 and P(B) = 0.3. Given that events A and B are independent, find P(A and B).

4 P(A) = 0.15 and P(A and B) = 0.045. Given that events A and B are independent, find P(B).

5 The Venn diagram shows the number of children in a play group that like playing with bricks (B), action figures (F) or trains (T).
[Venn diagram: events B, F and T]
a State, with a reason, which two types of toy are mutually exclusive.
b Determine whether the events 'plays with bricks' and 'plays with action figures' are independent.

6 The Venn diagram shows the probabilities that a group of students like pasta (A) or pizza (B).
[Venn diagram: events A and B; labels include x and 0.3]
a Write down the value of x. (1 mark)
b Determine whether the events 'like pasta' and 'like pizza' are independent. (3 marks)

7 S and T are two events such that P(S) = 0.3, P(T) = 0.4 and P(S but not T) = 0.18.
a Show that S and T are independent.
b Find:
i P(S and T)
ii P(neither S nor T).

8 W and X are two events such that P(W) = 0.5, P(W and not X) = 0.25 and P(neither W nor X) = 0.3. State, with a reason, whether W and X are independent events. (3 marks)

9 The Venn diagram shows the probabilities of members of a social club taking part in charitable activities.
[Venn diagram: events A, R and F; labels include x and y]
A represents taking part in an archery competition.
R represents taking part in a raffle.
F represents taking part in a fun run.
The probability that a member takes part in the archery competition or the raffle is 0.6.
a Find the value of x and the value of y.
(2 marks)
b Show that events R and F are not independent. (3 marks)

10 Given that A and B are independent, find the two possible values for p and q.

Challenge
A and B are independent events in a sample space S. Given that A and B are independent, prove that:
a A and 'not B' are independent
b 'not A' and 'not B' are independent.

5.4 Tree diagrams
■ A tree diagram can be used to show the outcomes of two (or more) events happening in succession.

A bag contains seven green beads and five blue beads. A bead is taken from the bag at random and not replaced. A second bead is then taken from the bag. Find the probability that:
a both beads are green
b the beads are different colours.

Draw a tree diagram to show the events.
First bead: Green 7/12, Blue 5/12
Second bead after Green: Green 6/11, Blue 5/11 (there are now only 6 green beads and 11 beads in total)
Second bead after Blue: Green 7/11, Blue 4/11
a P(green and green) = 7/12 × 6/11 = 42/132 = 7/22
Multiply along the branch of the tree diagram.
b P(different colours) = P(green then blue) + P(blue then green) = 7/12 × 5/11 + 5/12 × 7/11 = 70/132 = 35/66
Multiply along each branch and add the two probabilities.

1 A bag contains three red beads and five blue beads. A bead is chosen at random from the bag, the colour is recorded and the bead is replaced. A second bead is chosen and the colour recorded.
[Tree diagram: first bead Red or Blue, second bead Red or Blue]
a Copy and complete this tree diagram to show the outcomes of the experiment.
b Find the probability that both beads are blue.
c Find the probability that the second bead is blue.

2 A box contains nine cards numbered 1 to 9. A card is drawn at random and not replaced. It is noted whether the number is odd or even. A second card is drawn and it is also noted whether this number is odd or even.
a Draw a tree diagram to represent this experiment.
b Find the probability that both cards are even.
c Find the probability that one card is odd and the other card is even.

3 The probability that Charlie takes the bus to school is 0.4. If he doesn't take the bus, he walks.
The probability that Charlie is late to school if he takes the bus is 0.2. The probability he is late to school if he walks is 0.3.
a Draw a tree diagram to represent this information.
b Find the probability that Charlie is late to school.

4 Mr Dixon plays golf. The probability that he scores par or under on the first hole is 0.7. If he scores par or under on the first hole, the probability he scores par or under on the second hole is 0.8. If he doesn't score par or under on the first hole, the probability that he scores par or under on the second hole is 0.4.
a Draw a tree diagram to represent this information. (3 marks)
b State whether the events 'scores par or under on the first hole' and 'scores par or under on the second hole' are independent. (1 mark)
c Find the probability that Mr Dixon scores par or under on only one hole. (3 marks)

5 A biased coin is tossed three times and it is recorded whether it falls heads or tails. P(heads) = …
a Draw a tree diagram to represent this experiment. (3 marks)
b Find the probability that the coin lands on heads all three times. (1 mark)
c Find the probability that the coin lands on heads only once. (2 marks)
The whole experiment is repeated for a second trial.
d Find the probability of obtaining either 3 heads or 3 tails in both trials. (3 marks)

6 A bag contains 13 tokens, 4 coloured blue, 3 coloured red and 6 coloured yellow. Two tokens are drawn from the bag without replacement.
a Find the probability that both tokens are yellow. (2 marks)
A third token is drawn from the bag.
b Write down the probability that the third token is yellow, given that the first two are yellow. (1 mark)
c Find the probability that all three tokens are different colours. (4 marks)

Mixed exercise 5
1 There are 15 coloured beads in a bag: seven beads are red, three are blue and five are green. Three beads are selected at random from the bag and replaced.
Find the probability that:
a the first and second beads chosen are red and the third bead is blue or green (3 marks)
b one red, one blue and one green bead are chosen. (3 marks)

2 A baseball player has a batting average of 0.341. This means her probability of making a hit when she bats is 0.341. She bats three times in one game. Estimate the probability that:
a she makes three hits
b she makes no hits
c she makes at least one hit.

3 The scores of 250 students in a test are recorded in a table.

Score, s       Frequency (male)   Frequency (female)
20 ≤ s < 25    7                  8
25 ≤ s < 30    15                 13
30 ≤ s < 35    18                 19
35 ≤ s < 40    25                 30
40 ≤ s < 45    30                 26
45 ≤ s < 50    27                 32

One student is chosen at random.
a Find the probability that the student is female.
b Find the probability that the student scored less than 35.
c Find the probability that the student is a male and scored between 25 and 34.
In order to pass the test, students must score 37 or more.
d Estimate the probability that a student chosen at random passes the test. State one assumption you have made in making your estimate.

4 The histogram shows the distribution of masses, in kg, of 50 newborn babies.
[Histogram: horizontal axis Mass (kg)]
a Find the probability that a baby chosen at random has a mass greater than 3 kg. (2 marks)
b Estimate the probability that a baby chosen at random has a mass less than 3.75 kg. (3 marks)

5 A study was made of a group of 150 children to determine which of three cartoons they watch on television. The following results were obtained:
35 watch Toontime
54 watch Porky
62 watch Skellingtons
9 watch Toontime and Porky
14 watch Porky and Skellingtons
12 watch Toontime and Skellingtons
4 watch Toontime, Porky and Skellingtons
a Draw a Venn diagram to represent this data. (4 marks)
b Find the probability that a randomly selected child from the study watches:
i none of the three cartoons (2 marks)
ii no more than one of the cartoons. (2 marks)

6 … P(A or B or both) = …
The events A and B are such that P(A) = …
a Represent these probabilities on a Venn diagram.
b Show that A and B are independent.

7 The Venn diagram shows the number of students who like either cricket (C), football (F) or swimming (S).
[Venn diagram: events C, F and S]
a Which two sports are mutually exclusive? (1 mark)
b Determine whether the events 'likes cricket' and 'likes football' are independent. (3 marks)

8 For events J and K, P(J or K or both) = 0.5, P(K but not J) = 0.2 and P(J but not K) = 0.25.
a Draw a Venn diagram to represent events J and K and the sample space S. (3 marks)
b Determine whether events J and K are independent. (3 marks)

9 A survey of a group of students revealed that 85% have a mobile phone, 60% have an MP3 player and 5% have neither phone nor MP3 player.
a Find the proportion of students who have both gadgets. (2 marks)
b Draw a Venn diagram to represent this information. (3 marks)
c A student is chosen at random. Find the probability that they only own a mobile phone. (2 marks)
d Are the events 'own a mobile phone' and 'own an MP3 player' independent? Justify your answer. (3 marks)

10 The Venn diagram shows the probabilities that a group of children like cake (A) or crisps (B).
[Venn diagram: events A and B]
Determine whether the events 'like cake' and 'like crisps' are independent. (3 marks)

11 A computer game has three levels and one of the objectives of every level is to collect a diamond. The probability that Becca collects a diamond on the first level is …, the second level is … and the third level is … The events are independent.
a Draw a tree diagram to represent Becca collecting diamonds on the three levels of the game. (4 marks)
b Find the probability that Becca:
i collects all three diamonds (2 marks)
ii collects only one diamond. (3 marks)
c Find the probability that she collects at least two diamonds each time she plays. (3 marks)

12 In a factory, machines A, B and C produce electronic components.
Machine A produces 16% of the components, machine B produces 50% of the components and machine C produces the rest. Some of the components are defective. Machine A produces 4%, machine B 3% and machine C 7% defective components.
a Draw a tree diagram to represent this information.
b Find the probability that a randomly selected component is:
i produced by machine B and is defective
ii defective.

Challenge
The members of a cycling club are married couples. For any married couple in the club, the probability that the husband is retired is 0.7 and the probability that the wife is retired is 0.4. Given that the wife is retired, the probability that the husband is retired is 0.8. Two married couples are chosen at random. Find the probability that only one of the two husbands and only one of the two wives is retired.

Summary of key points
1 A Venn diagram can be used to represent events graphically. Frequencies or probabilities can be placed in the regions of the Venn diagram.
2 For mutually exclusive events, P(A or B) = P(A) + P(B).
3 For independent events, P(A and B) = P(A) × P(B).
4 A tree diagram can be used to show the outcomes of two (or more) events happening in succession.

Statistical distributions

After completing this chapter you should be able to:
● Understand and use simple discrete probability distributions including the discrete uniform distribution
● Understand the binomial distribution as a model and comment on appropriateness
● Calculate individual probabilities for the binomial distribution
● Calculate cumulative probabilities for the binomial distribution

You can use probability to model real-life events. If an archer fires a set of arrows at a target, the number of bullseyes can be modelled using a binomial distribution. → Mixed exercise Q15

Prior knowledge check
1 Three coins are flipped.
Calculate the probability that:
a all the coins land on tails
b all the coins land on heads
c exactly one of the coins lands on tails
d at least two coins land on heads. ← Chapter 5
2 Two fair dice are rolled. Calculate the probability that the sum of the scores on the dice is:
a five  b even  c odd  d a multiple of 3  e a prime number. ← Chapter 5

6.1 Probability distributions
A random variable is a variable whose value depends on the outcome of a random event.
• The range of values that a random variable can take is called its sample space.
• A variable can take any of a range of specific values.
• The variable is discrete if it can only take certain numerical values.
• The variable is random if the outcome is not known until the experiment is carried out.
Notation: Random variables are written using upper case letters, for example X or Y. The particular values the random variable can take are written using equivalent lower case letters, for example x or y.
■ A probability distribution fully describes the probability of any outcome in the sample space.
The probability distribution for a discrete random variable can be described in a number of different ways. For example, take the random variable X = 'score when a fair dice is rolled'. It can be described:
• as a probability mass function: P(X = x) = 1/6, x = 1, 2, 3, 4, 5, 6
Notation: The probability that the random variable X takes a particular value x is written as P(X = x).
• using a table:

x          1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

• using a diagram:
[Diagram: P(X = x) against x, with height 1/6 at each of x = 1, 2, 3, 4, 5, 6]
All of these representations show the probability that the random variable takes any given value in its sample space. When all of the probabilities are the same, as in this example, the distribution is known as a discrete uniform distribution.

Three fair coins are tossed.
a Write down all the possible outcomes when the three coins are tossed.
A random variable, X, is defined as the number of heads when the three coins are tossed.
b Write the probability distribution of X as:
i a table
ii a probability mass function.
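For the three-coin example, both the list of outcomes and the distribution of X can be generated by enumeration. The sketch below is one way to do this, assuming each of the eight outcomes is equally likely; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes when three fair coins are tossed.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

print(outcomes)
for x, p in distribution.items():
    print(x, p)
```

The probabilities in `distribution` sum to 1, which is a quick check that every outcome in the sample space has been counted once.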
a HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
b i
Number of heads, x   0     1     2     3
P(X = x)             1/8   3/8   3/8   1/8
ii P(X = x) = 1/8 for x = 0, 3; 3/8 for x = 1, 2; 0 otherwise

■ The sum of the probabilities of all outcomes of an event add up to 1. For a random variable X, you can write ΣP(X = x) = 1 for all x.

A biased four-sided dice with faces numbered 1, 2, 3 and 4 is rolled. The number on the bottom-most face is modelled as a random variable X.
Given that P(X = x) = k/x:
a Find the value of k.
b Give the probability distribution of X in table form.
c Find the probability that:
i X > 2
ii 1 < X < 4
iii X > 4

a The probabilities must add up to 1:
k + k/2 + k/3 + k/4 = 25k/12 = 1, so k = 12/25
Problem-solving: Write an equation and solve it to find the value of k. Then substitute this value of k into P(X = x) = k/x for each x to find the probabilities.
b The probability distribution is:
x          1       2      3      4
P(X = x)   12/25   6/25   4/25   3/25
c i X > 2 means getting 3 or 4, so P(X > 2) = 4/25 + 3/25 = 7/25
ii 1 < X < 4 is the same as getting 2 or 3, so P(1 < X < 4) = 6/25 + 4/25 = 10/25 = 2/5
iii There are no elements in the sample space that satisfy X > 4, so P(X > 4) = 0
Watch out: This random variable only models the behaviour of the dice. The outcomes from experiments in real life will never exactly fit the model, but the model provides a useful way of analysing possible outcomes.

This spinner is spun until it lands on red or has been spun four times in total. Find the probability distribution of the random variable S, the number of times the spinner is spun.
Watch out: Read the definition of the random variable carefully. Here it is the number of spins.

P(S = 1) is the probability that the spinner lands on red the first time:
P(S = 1) = 2/5
If the spinner lands on red on the second spin it must land on blue on the first spin:
P(S = 2) = 3/5 × 2/5 = 6/25
Likewise for landing on red on the third spin:
P(S = 3) = 3/5 × 3/5 × 2/5 = 18/125
The experiment stops after 4 spins so:
P(S = 4) = (3/5)^3 = 27/125

s          1     2      3        4
P(S = s)   2/5   6/25   18/125   27/125

1 Write down whether or not each of the following is a discrete random variable. Give a reason for your answer.
a The height, X cm, of a seedling chosen randomly from a group of plants.
b The number of times, R, a six is rolled when a fair dice is rolled 100 times.
c The number of days, W, in a given week.

2 A fair dice is thrown four times and the number of times it falls with a 6 on the top, Y, is noted. Write down the sample space of Y.

3 A bag contains two discs with the number 2 on them and two discs with the number 3 on them. A disc is drawn at random from the bag and the number noted. The disc is returned to the bag. A second disc is then drawn from the bag and the number noted.
a Write down all the possible outcomes of this experiment.
The discrete random variable X is defined as the sum of the two numbers.
b Write down the probability distribution of X as:
i a table
ii a probability mass function.

4 A discrete random variable X has the probability distribution shown in the table.
x          1   2   3   4
P(X = x)   …
Find the value of k.

5 The random variable X has a probability function P(X = x) = kx, x = 1, 2, 3, 4. Show that k = 1/10. (2 marks)

6 The random variable X has a probability function
P(X = x) = kx for x = 1, 3; k(x − 1) for x = 2, 4
where k is a constant.
a Find the value of k. (2 marks)
b Find P(X > 1). (2 marks)

7 The discrete random variable X has a probability function
P(X = x) = 0.1 for x = −2, −1; β for x = 0, 1; 0.2 for x = 2
a Find the value of β.
b Construct a table giving the probability distribution of X.
c Find P(−1 ≤ X < 2).

8 A discrete random variable has a probability distribution shown in the table.
x          0   1   2
P(X = x)   …
Find the value of a.

9 The random variable X can take any integer value from 1 to 50. Given that X has a discrete uniform distribution, find:
a P(X = 1)
b P(X > 28)
c P(3 < X < 42)

10 A discrete random variable X has the probability distribution shown in this table.
x          …
P(X = x)   …
Find:
a P(… < X < 3) (1 mark)
b P(X < 2) (1 mark)
c P(X > 3) (1 mark)

11 A biased coin is tossed until a head appears or it is tossed four times. If P(Head) = …
a Write down the probability distribution of S, the number of tosses, in table form. (4 marks)
b Find P(S > 2). (1 mark)

12 A fair five-sided spinner is spun. Given that the spinner is spun five times, write down, in table form, the probability distributions of the following random variables:
a X, the number of times red appears
b Y, the number of times yellow appears.
The spinner is now spun until it lands on blue, or until it has been spun five times. The random variable Z is defined as the number of spins in this experiment.
c Find the probability distribution of Z.

13 Marie says that a random variable X has a probability distribution defined by the following probability mass function:
P(X = x) = …, x = 2, 3, 4
a Explain how you know that Marie's function does not describe a probability distribution. (2 marks)
b Given that the correct probability mass function is in the form P(X = x) = …, x = 2, 3, 4, where k is a constant, find the exact value of k. (2 marks)

Challenge
The independent random variables X and Y have probability distributions
P(X = x) = 1/8, x = 1, 2, 3, 4, 5, 6, 7, 8
P(Y = y) = …
Find P(X > Y).
Hint: X and Y are independent, so the outcome of one does not affect the probabilities for the other.

6.2 The binomial distribution
When you are carrying out a number of trials in an experiment or survey, you can define a random variable to represent the number of successful trials.
■ You can model X with a binomial distribution, B(n, p), if:
• there are a fixed number of trials, n
• there are two possible outcomes (success and failure)
• there is a fixed probability of success, p
• the trials are independent of each other
Notation: If P(success) = p then P(failure) = 1 − p.
■ If a random variable X has the binomial distribution B(n, p) then its probability mass function is given by
P(X = r) = (n choose r) × p^r × (1 − p)^(n − r)
Notation: You write X ~ B(n, p). n is sometimes called the index and p is sometimes called the parameter.
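Links: The coefficient (n choose r) is sometimes written as nCr. It is the number of ways of selecting r successes from n trials. ← Pure Year 1, Chapter 8

You can use your calculator to work out binomial probabilities. You can either use the rule given above, together with the nCr function, or use the binomial probability distribution function directly.

The random variable X ~ B(12, 1/6). Find:
a P(X = 2)
b P(X = 9)
c P(X ≤ 1)

a P(X = 2) = (12 choose 2) × (1/6)^2 × (5/6)^10 = 0.29609... = 0.296 (3 s.f.)
Use the formula with n = 12, p = 1/6 and r = 2.
b P(X = 9) = (12 choose 9) × (1/6)^9 × (5/6)^3 = 0.0000126 (3 s.f.)
Use the formula with n = 12, p = 1/6 and r = 9.
c P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.112156... + 0.26917... = 0.38133... = 0.381 (3 s.f.)
Online: Use the nCr function on your calculator to work out binomial probabilities.

The probability that a randomly chosen member of a reading group is left-handed is 0.15. A random sample of 20 members of the group is taken.
a Suggest a suitable model for the random variable X, the number of members in the sample who are left-handed. Justify your choice.
b Use your model to calculate the probability that:
i exactly 7 of the members in the sample are left-handed
ii fewer than two of the members in the sample are left-handed.

a The random variable can take two values, left-handed or right-handed. There are a fixed number of trials, 20, and a fixed probability of success: 0.15. Assuming each member in the sample is independent, a suitable model is X ~ B(20, 0.15).
b i P(X = 7) = (20 choose 7) × 0.15^7 × 0.85^13 = 0.0160 (3 s.f.)
You can calculate probabilities directly using the binomial probability distribution function on your calculator and entering x = 7, n = 20 and p = 0.15.
ii P(X < 2) = P(X = 0) + P(X = 1) = 0.176 (3 s.f.)

1 The random variable X ~ B(8, …). Find:
a P(X = 2)
b P(X = 5)
c P(X ≤ 1)

2 The random variable T ~ B(15, …). Find:
a P(T = 5)
b P(T = 10)
c …

… P(X ≥ 15) = 1 − P(X ≤ 14) = 1 − 0.9984 = 0.0016

When questions are set in context there are different forms of words that can be used to ask for probabilities. The correct interpretation of these phrases is critical, especially when dealing with cumulative probabilities. The table below gives some examples.

Phrase                    Means     Calculation
… greater than 5 …        X > 5     1 − P(X ≤ 5)
… no more than 3 …        X ≤ 3     P(X ≤ 3)
… at least 7 …            X ≥ 7     1 − P(X ≤ 6)
… fewer than 10 …         X < 10    P(X ≤ 9)
… at most 8 …             X ≤ 8     P(X ≤ 8)

A spinner is designed so that the probability it lands on red is 0.3. Jane has 12 spins. Find the probability that Jane obtains:
a no more than 2 reds
b at least 5 reds.
Jane decides to use this spinner for a class competition. She wants the probability of winning a prize to be < 0.05. Each member of the class will have 12 spins and the number of reds will be recorded.
c Find how many reds are needed to win a prize.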
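The binomial calculations in this section can be reproduced with a few lines of code instead of a calculator or tables. The sketch below models the number of reds in Jane's 12 spins as B(12, 0.3) and turns each phrase into a cumulative probability. The function names `binomial_pmf` and `binomial_cdf` are hypothetical, and part c assumes that a prize is won for scoring r or more reds, which the question does not state explicitly.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


def binomial_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p), found by summing individual probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))


n, p = 12, 0.3

# "no more than 2 reds" means X <= 2.
p_no_more_than_2 = binomial_cdf(2, n, p)

# "at least 5 reds" means X >= 5, which is 1 - P(X <= 4).
p_at_least_5 = 1 - binomial_cdf(4, n, p)

# Assumption: a prize is won for r or more reds. Find the smallest r
# for which P(X >= r) = 1 - P(X <= r - 1) is below 0.05.
r_needed = next(r for r in range(n + 1)
                if 1 - binomial_cdf(r - 1, n, p) < 0.05)

print(p_no_more_than_2, p_at_least_5, r_needed)
```

The same two functions can be used to check the earlier examples, for instance `binomial_pmf(2, 12, 1/6)` for P(X = 2) when X ~ B(12, 1/6).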
The correct interpretation of these phrases is critical, especially when dealing with cumulative probabilities. The table below gives some examples. Phrase Means Calculation ... greater than 5... X>5 1-P(¥<5) ++ No more than 3... X<3 P(X = 3) «atleast 7... X27 1-P(¥ <6) -fewer than 10 ¥<10 PY <9) ..at most 8 ¥<8 PIV <8) A spinner is designed so that the probability it lands on red is 0.3. Jane has 12 spins, Find the probability that Jane obtains: a no more than 2 reds b atleast 5 reds. Jane decides to use this spinner for a class competition. She wants the probability of winning a prize to be < 0.05. Each member of the class will have 12 spins and the number of reds will be recorded. © Find how many reds are needed to win a prize. 92
