Institute and Faculty of Actuaries re Subject CS1 Actuarial Statistics 1 Core Principles Core Reading for the 2021 exams Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors. Core Reading is intended for the exclusive use of the purchaser and the Institute and Faculty of Actuaries do not permit it to be by another party, copied, electronically transmitted or published on a website without prior permission being obtained. Legal action will be taken if these terms are infringed. In the case of a member of the Institute and Faculty of Actuaries, we may to take disciplinary action through the Disciplinary Scheme of the institute and Faculty of Actuaries. These conditions remain in force after the Core Reading has been superseded ty a later edition. Subject CS1 core reading Contents Accreditation Introduction Unit 1 Unit 2 Unit 3 Unit 4 Unit 5 Unit 6 Unit 7 Unit 8 Unit 9 Unit 10 Unit 11 Unit 12 Unit 13 Unit 14 Unit 15 Data and basics of modelling Random variables and distributions Generating functions Joint distributions Conditional expectation The Central Limit Theorem Random sampling and sampling distributions Estimation and estimators Confidence intervals Hypothesis testing Exploratory data analysis Linear regression Generalised linear models Bayesian statistics Credibility theory Syllabus with cross referencing to Core Reading Subject CS1 core reading Accreditation The Institute and Faculty of Actuaries would like to thank the numerous people who have helped in the development of the material contained in this Core Reading. Introduction The Core Reading manual has been produced by the Institute and Faculty of Actuaries. The purpose of the Core Reading is to ensure that tutors, students and examiners understand the requirements of the syllabus for the qualification examinations for Fellowship of the Institute and Faculty of Actuaries. The examinations require students to demonstrate their understanding of the concepts given in the syllabus and described in the Core Reading; this will be based on the legisation, professional guidance etc. which are in force when the Core Reading is published, Le. on 31 May in the year preceding the examinations. Examiners will have this Core Reading manual when setting tt papers. In preparing for examinations students are advised to work through past examination questions and may find addition tuition helpful. The manual will be updated each year to reflect changes in the syllabus and current practice, and in the interest clarity. CS] core reading: References S2 Risk Modelling and Survival Analysis CT3 Probability and Mathematical Statistics CTE Statistical Methoos McCullagh, P. & Nelder, J.A, 1989. Generalized Linear Models. 2nd ed. Chapman & Hall/CRC. Knuth, D., 1992, Literate Programming, California: Stanford University Center for the Study of Language and Information. ISBN 978-0-937073-80-3, Peng, R.D,, 2016, Report Writing for Data Science in R, www.Leanpub.com/reportwriting Subject CSI 12 124 12.2 Data and basics of modelling Unit 1- Data and basics of modelling Syllabus objectives 21, Dataanalysis, 211 Describe the possible aims of data analysis (e.9. descriptive, inferential, and predictive) 212. Describe the stages of conducting a data analysis to solve real-world problems in a scientific manner a describe teols suitable for each stage. 213. 
Describe sources of data and explain the characteristics of different data sources, including extremely large data sets 214 Explain the meaning and value of reproducible research and describe the elements required to ensure data analysis is reproducible. Data analysis Introduetion Data analysis is the process by which data is gathered in its raw state and analysed or processed into information which can be used for specific purposes. This section will describe some of the different forms of data analysis, th steps involved in the process and consider some of the practical problems encountered in data analytics. Aims of a data analysis: Three keys forms of data analysis will be covered in this section: + descriptive; + inferential; and + predictive. Descriptive analysis Data presented in ts raw state can be difficult to manage and draw meaningful conclusions from, particularly wh there is a large volume of data to work with. A descriptive analysis solves this problem by presenting the data in simpler format, more easily understood and interpreted by the user. ‘Simply put, this might involve summarising the data or presenting it in a format which highlights any patterns or trends. A descriptive analysis is not intended to enable the user to draw any specific conclusions. Rather, it descri the cata actually presented. ‘Two key measures, or parameters, used in a descriptive analysis are the measure of central tendency and the dispersion. The most common measurements of central tendency are the mean, the median and the mode. Typic: measurements of the dispersion are the standard deviation and ranges such as the interquartile range. It can also be important to describe other aspects of the shape of the (empirical) distribution of the data, for exar by calculating measures of skewness and kurtosis, Inferential analysis Often itis not feasible or practical to collect data in respect of the whole population, particularly when that population is very large. For example, when conducting an opinion poll ina large country, it may net be cost effec to survey every citizen. A practical solution to this problem might be to gather data in respect of a sample, which Used to represent the wider populetion. The anelysis of the data from this sample is called inferential analysis The sample analysis involves estimating the parameters as described in 1.2.1 above and testing hypotheses. It is generally accepted that if the sample is large and taken at random (selected without prejudice), then it quite accurately represents the statistics of the population, such as distribution, probability, mean, standard deviation, However, this is also contingent upon the user making reasonably correct hypotheses about the population in ord to perform the inferential analysis. Subject CSI 12.3 us 14 Data and basics of modelling Predictive analysis Predictive analysis extends the principles behind inferential analysis in order for the user to analyse past data and make predictions about future events. It achieves this by using an existing set of data with known attributes (also knownas features), known as the tral set in order to discover potentially predictive relationships. Those relationships are tested using a different sot of data, known as the test set, to assess the strenth of those relationships. A typical example of a predictive analysis is regression analysis, which is covered in more detail later. 
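As a concrete illustration of the training set and test set idea just described, the following R code fits a simple linear regression to R's built-in cars data set (stopping distance against speed). This is only a sketch: the choice of data set, the 70/30 split and the use of mean squared prediction error are illustrative assumptions, not part of the Core Reading.

set.seed(1)
n <- nrow(cars)                                    # 50 cases: speed and stopping distance
train_rows <- sample(1:n, size = round(0.7 * n))   # use 70% of the cases as the training set
train <- cars[train_rows, ]
test  <- cars[-train_rows, ]
fit <- lm(dist ~ speed, data = train)              # fit the regression line on the training set
coef(fit)                                          # estimated intercept and slope
pred <- predict(fit, newdata = test)               # predict stopping distance for the test set
mean((test$dist - pred)^2)                         # mean squared prediction error on the test set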
The simples form of this is linear regression where the relationship between a scalar dependent variable and an explanatory independent variable 's assumed to be linear and the training set is used to determine the slope and intercept of Ine. A practical example might be the relationship between a car's braking distance against speed, The data analysis process While the process to analyse data does not follow a set pattern of steps, itis helpful to consider the key stages w/ might be used by actuaries when collecting and analysing data: The key steps in a data analysis process can be described as follows: 1 Develop a well-defined set of objectives which need to be met by the results of the data analysis. Identify the data items required for the anelysis. Collection of the date from appropriate sources. Processing and formetting data for analysis, e.¢. inputting into a spreadsheet, database or other model Cleaning data, e.g. addressing unusual, missing or inconsistent values. Exploratory data analysis, which may include: 2. Descriptive analysis; producing summary statistics on central tendency and spread of the data. b. Inferential analysis; estimating summary parameters of the wider population of deta, testing hypotheses Predictive analysis; analysing data to make predictions ebout future events or other data sets. 7. Modelling the data. 8. Communicating the results 9. Monitoring the process; updating the data and repeating the process if required, Throughout the process, the modelling team needs to ensure that any relevant professional guidance has been complied with. For exampie, the Financial Reporting Council has issued a Technical Actuarial Standard (TAS) ont principles for Technical Actuarial Work (TASIOO) which includes principles for the use of data in technical actuari work. Knowledge of the detail ofthis TAS is not required for CSI. Further, the modelling team should also remain aware of any legal requirement to be complied with. Such legal requirement may include aspects around consumer/customer data protection and gender discrimination, Data sources Step 3 of the process described in Section 13 above refers to collection of the data needed to meet the objective of the analysis from appropriate sources. As consideration of Steps 3, 4, and 5 mekes clear, getting data into a fo ready for analysis is a process, not a single event. Consequently, what is seen as the source of data can depend fn your viewpoint. Suppose you are conducting an analysis which involves collecting survey data from a sample of people in the hope of drawing inferences about a wider population. If you are in charge of the whole process, including collecting the primary data from your selected sample, you would probably view the ‘source’ of the det as being the people in your sample. Having collected, cleaned and possibly summarized the data you might make it aveilable to other investigators in JavaScript object notation (JSON) format via a web Application programmin« interface (AP). You will then have created a secondary ‘source’ for others to use, In this section, we discuss how the characteristics of the data are determined both by the primary source and the steps carried out to prepare it for analysis - which may include the steps on the journey from primary to seconda Subject CSI 144 Data and basics of modelling Primary data can be gathered as the outcome of a designed experiment or from an observational study (which could include a survey of responses to specific questions). 
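To illustrate reading data from a secondary source supplied in JSON format and then carrying out some basic cleaning (steps 4 and 5 of the process above), the following sketch uses the jsonlite package; the package, the field names and the data values are assumptions made purely for this example.

library(jsonlite)                 # assumed to be installed; not part of base R
raw <- '[{"age": 34, "claims": 2}, {"age": 51, "claims": 0}, {"age": null, "claims": 1}]'
policies <- fromJSON(raw)         # parse the JSON text into a data frame
sum(is.na(policies$age))          # count cases with a missing age
clean <- policies[!is.na(policies$age), ]   # remove cases with a missing age
summary(clean)                    # simple exploratory summary of the cleaned data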
n all cases, knowledge of the details of the collection process is important for a complete understanding of the data, inclucing possible sources of bias or inaccuracy. Issues that the analyst should be aware of include: + whether the process was menual or automated; + limitations on the precision of the data recorded; + whether there was any validation at source; and + if data wasn't collected automatically, how was it converted to an electronic form. Where randomization has been used to reduce the effect of bias or confounding variables it is important to know sampling scheme used: + simple random sampling; + stratified sampling; or + another sampling method, Data may have undergone some form of pre-processing. A common example is grouping (e.9. by geographical a or age band). In the past, this was often done to reduce the amount of storage required and to make the number of calculations manageable. The scale of computing power available now meens that this is less often an issue, bu data may still be grouped: perhaps to anonymise it, or to remove the possibility of extracting sensitive (or perhap commorcially sensitive) details Other aspects of the data which are determined by the collection process, and which affect the way it is analysed include the following: + Cross-sectional data involves recording values of the variables of interest for each case in the sample at a sing! ‘moment in time. + Longitudinal data involves recording values at intervals over time, + Consored data occurs when the value of a variable is only partially known, for cxampla, if a subject in a surviva study withdraws, o survives beyond the end of the study: here a lower bound for the survival period is known | the exact value isn’t. + Truncated data occurs when measurements on some variables are not recorded so are completely unknown. Big data The term big data is net well defined but has come tobe used to describe data with characteristics that make it impos to apply traditional methods of analysis (for example, those which rely on a single, wel-structured data set which can manipulated end analysed on a single computer). Typically, this meens automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of an experiment. Given the description above, the properties that can lead data to be classified as ‘big’ include: + size, not only does big data include 2 very large number of individual cases, but each might include very many variables, a high proportion of wiich might have empty (or nul) values - leading to sparse data; + speed, the data to be analysed might be arriving in real time at a very fast rate - for example, from an array of ‘sensors taking measurements thousands of time every second; + variety, big data is often composed of elements from many different sources which could have very different structures - or is often largely unstructured; + reliability, given the above three characteristics we can see that tho reliability of individual data elements migh be difficult to ascertain and could vaty over time (for example, an internet connected sensor could go offline fo period), Although the four points above have been presented in the context of big data, they are characteristics that shou be considered for any data source. For example, an actuary may need to decide if it is advisable to increase the volume of data available for a given investigation by combining an internal data set with data available externally. 
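The sampling schemes listed earlier in this section can be illustrated with a short R sketch; the population size, the region variable and the sample sizes below are hypothetical.

set.seed(42)
population <- data.frame(id = 1:10000,
                         region = sample(c("N", "S", "E", "W"), 10000, replace = TRUE))
srs <- population[sample(1:10000, 500), ]     # simple random sampling: 500 cases, all equally likely
strata <- split(population, population$region)
stratified <- do.call(rbind, lapply(strata, function(s) s[sample(nrow(s), 125), ]))
table(srs$region)                             # region counts vary from sample to sample
table(stratified$region)                      # region counts fixed at 125 by design (stratified sampling)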
Subject CSI 14.2 Data and basics of modelling Data security, privacy and regulation In the design of any investigation consideration of issues releted to data security, privacy and complying with relevant regulations should be parsmount. It is especially important to be aware that combining different data fro different ‘anonymized! sources can mean that individual cases become identifiable. Another point to be aware of is that just because data has been made available on the internet, doesn’t mean tha that others are free to use it as they wish. This is a very complex area and laws vary between jurisdictions. Reproducible research Students may wish to refer to Peng (2016) in the references for more details. The meaning of reproducible research Reproducibility refers to the idea that when the results of a statistical analysis are reported, sufficient informatior provided so that an independent third party can repeat the enalysis and arrive at the same results. In science, reproducibility is linked to the concept of replication which refers to someone repeating an experimer and obtaining the same (or at least corsistent) results. Replication can be hard, or expensive or impossible, for example it: + the study is big: + the study relies on data collected at great expense or over many years; or + the study is of a unique occurrence (the standards of healthcare in the aftermath of a particular event). Due to the possibe difficulties of replication, reproducibility of the statistical analysis is often a reasonably alternz standard, Elements required for reproducibility Typically, reproducibility requires the original data and the computer code to be made available (or fully specific so that other people can repeat the analysis and verify the results. In all but the most trivial cases, it will be necessary to include full documentation (eg. description of each data variable, an audit trail describing the decisions made when clezning and processing the data, and full documented code). Documentation of models i covered in subject CP2, Full documented code can be achieved through literate statistical programming (as defined by Knuth, 1992) whe the program includes an explanaticn of the program in plain language, interspersed with code snippets. Within th environment, a tool which allows this is R-markdown, Although not strictly required to meet the definition of reproducibility, a good version control process can ensure evolving drafts of code, documentation and reports are kept in alignment between the various stages of development and review, and changes are reversible if necessary. There are many tools that are used for version control. A popular tool used for version control is git, In addition to version control, documenting the software environment, the computing architecture, the operating system, the software toolchain, external dependencies and version numbers can all be important in ensuring repreductility. ‘Asan example, in the R programming language, the command > sessioninfo() provides information about the operating system, version of Rand version of all R packages being used. Where there is randomness in the statistical or machine learning techniques being used (for example random forests or neural networks) or where simulation is used, replication will require the random seed tobe sot, Subject CSI Data and basics of modelling Doing things ‘by hand is very likely to create problems in reproducing the work. 
Examples of doing things by han are: ‘+ manually editing spreadsheets (rather than reading the raw data into a programming environment and making changes there); editing tables and figures (rather than ensuring that the programming environment creates them exactly as needed); + downloading data manually from a website (rather than doing it programmatically); and + pointing and clicking (unless the software used creates an audit trail of what has been clicked). ‘The value of reproducibility Many actuarial analyses aro undertaken for commercial, not scientific, reasons and are not published, but repreducbility is still valuable: + reproducibility is necessary for a complete technical work review (which in many cases will be a professional requirement) to ensure the analysis has been correctly carried out and the conclusions are justified by the data analysis; reproducibility may be required by external regulators and auditors; reproducible research is more easily extended to investigate the effect of changes to the analysis, or to incorpo new data; itis often desirable to compare the results of an investigation with a similar one carried out in the past; if the e investigation was reported reproducibly an analysis of the differences between the two can be carried out with confidence; tthe discipline of reproducible research, with its emphasis on good documentation of processes and data storag can lead to fewer errors that need correcting in the original work and, hence, greater efficiency. ‘There are some issues that reproducibility does not address: + Reproducibility does not mean that the analysis is correct. For example, if an incorrect distribution is assumed, results may be wrong - even though they can be reproduced by making the same incorrect essumption about t distribution. However, by making clear how the results are achieves, It does allow transparency so thet incorrec analysis can be appropriately challenged. + If activities involved in reproducibility happen only at the end of an analysis, this may be too late for resulting challenges to be dealt with. For example, resources may have been moved on to other projects. END Subject CSI Random variables and distributions Unit 2 - Random variables and distributions Syllabus objectives 11 Define basic univariate distributions and use them to calculate probabilities, quantiles and moments. 1 Define and explain the key characteristics of the discrete cistributions: geometric, binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set. 112. Define and explain the key characteristics of the continuous distributions: normal, lognormal, exponen gamma, chi-square, , F, beta and uniform on an interval. 113. Evaluate probabilities and quantiles associated with distributions (oy calculation or using statistical software as appropriate), 11.4. Define and explain the key characteristics of the Poisson process and explain the connection between Poisson process and the Poisson distribution, 11.5 Generate basic discrete and continuous random variables using the inverse transform method. 16 Generate aiscrete and continuous random variables using statistical software. 1 Introduction This unit introduces the standerd distributions that are used in actuarial work. 2 Discrete distributions The distributions considered here are all madels for the number of something (number of “successes”, number of trials’, number of deaths, number of claims, etc). The values assumed by the variables are integers in the set (0, 1,2,3,..}. 
These are often referted to as counting variables 21 Uniform distribution Sample space: $= (1, 2, 3.....8} Probability measure: equal assignment (1/) to all outcomes, ie. all outcomes equally likely. Random variable X defined by: (i) (ae 2e k) _ ERUEED) _e Moments: = EL ~ (PHP ee) PRED ED GADD FLX? k k 6 code for simulating a random sample from the discrete uniform distribution. Generate a vector for sample space S: Tea see $= 1:20 Simulate 100 values: sample (8, 100, replace ~ TRUE) 22 Bernoulli distribution ‘A Bernoulli trial is an experiment which has (or can be regarded as having) only two possible outcomes s (“succes and / (“failure”) Subject CSI Random variables and distributions 23 Probability measure: P((s})=p, PUS))=1-p + X(f)=0-Xis the number of successes which occur (0 oF 1), Distribution: P(x x=Db Ox-+m | > n). Given that there have already been n trials without a success, what is the probability that moro than x additional trials aro required to got a success? The intersection of the events “X > n” and “X >x +a" Isjust“X > x42", So P(X > xn) (=p) Pn) =p PUX> xen | X>m)= (py = POX >) ie, just the same as the originel probability that more than x trials are required. The lack of success on the first » trials is irrelevant - under this model the chances of success are no better kecau there has been a run of bad luck. This characteristic ~ a reflection of the “independent, identical trials” structure ~ is important, and is referred to as "memoryless" property. Another formulation of the geometric distribution is sometimes used. Let ¥ be the number of failures before the first success. Then P(Y = »)= p(l=p)’, y=0,1,2,3,... with mean X -1, where Wis defined as above. The R code for simulating values and calculating probabilities and quantiles from the geometric distribution is similar to the R code used for the binomial distribution using the R functions rgeom, dgeom, pgeom and ageon, For example: dgeom(10, 0.3) calculates the probability P(Y ~10) for p03. Negative binomial distribution This is a generalisation of the geometric distribution The random variable is the number of the trial on which the &-th success occurs, where kis a positive integer. Distribution: P(X. xl Dp xk keh Oe pel aire ” p PUL acl » for which P(success) = p= k/N is kept fixed, Hence, the binomial, wi p =k/N, provides a good approximation to the hypergeometric when WV is large compared to 7. Subject CSI 27 Random variables and distributions The R code for simulating values and calculating probabilities and quantiles from the hypergeometric distribution is similar to the R code used for other distributions using the R functions rhyper, dhyper, phyper and ghyper. For example: rhyper (20, 15, 10, 5) simulates 20 values from samples of size 5 from a population in which k= 1S and N-k=10. Poisson distribution This distribution models the number of events which occur in a specitied interval of time, when the events occur after another in time in a well-defined manner. This manner presumes that the events occur singly, at a constant r and that the numbers of events which occur in separate (Le. non-overlapping) time interva's are independent of another: These conditions can be described loosely by saying that the events occur “randomly, at a rate of .. por and such events are said to occur according to a Poisson process. We will formally define this in Section 4 of this Another approach to the Poisson distribution uses arguments which appear at first sight to be unrelated to the above. 
Consider a sequence of binomial (n,p) distributions as n > and p-»0 together, such that the mean np held constant at the value A. The limit leads to the distribution of the Poisson variable, with parameter 2. Distribution: P(x =x) -FE JE=0 12.5220 PU =x) ie PUK =x-1) Moments: Since the binomial mean is held constant at 2. through the limiting process, it is reasonable to sugc tthat the distribution of 2 (the limiting distribution) also has mean A ~ this isin fact the case. The binomial variance is np(1 — p) = n(A/n)(1 ~ A/n) = A(1 —A/n) —> has n> 0, This suggests that X ha variance 2. - this is also the case. Sop= The Poisson distrioution provides a very good approximation to the binomial when » is large and p is small - typi applications have n = 100 or more and p = 0.05 or less. The approximation depends only on the product np (= %) - individual values of n and p are irrelevant. So, for example, the value of P(= =) in the case n = 200 and p = 0.02 is effectively the same as the value of P(Y=x) in the case n = 400 and p= 0.01. When dealing with large numbers of opportunities for the occurrence of “rare” events (under “binomial assumptions”), the distribution of the number which occur doponds only on the expected number which occur. When events are described as occurring “as a Poisson process with rate 2” or “randomly, at a rate of & per unit tir then the number of events which occur in a time period of lenath s has a Poisson distrioution with mean i. The R code for simulating velues and calculating probabilities and quantiles from the Poisson distribution is similar to the R code used for other distributions using the R functions rpois, dpois, ppois and apois. For example, to calculate P(X <5) =0.9432683 for A= 2.7 use the R code: ppois(5, 2.7) Continuous distributions Uniform distribution takes values between two specified numbers o and 6 say, Probability (density) function (PDF): f(x), nex 0} First note that the gamma function (a) is defined for c.> 0 as follows Fa)= fire" ae Note in particular that PI) =1, P(a)= (DM (@=1) for > (ie. if ais an integer Ta) =(a-D!),and F('4) =v The R code for the gamma function (a) is gamma (n) Probability (density) function: the PDF of the gamma distribution with parameters @ and A is defined by Fe le forx>d Ta)* The R code for simulating a random sample of 100 values from the gamma distribution with a = 2.and 2 = 0.25: rgamma(100, 2, 0.25) Similarly, the PDF, cumulative distribution function (CDF) and quantiles can be obtained using the R functions dgamma, pgamna and aganma. ‘Special case 1: exponential distribution Gamma with « Probability (density) functi Sas o Subject CSI Random variables and distributions puay=fpea= The exponential cistribution is used as a simple model for the lifetimes of certain types of equipment. Very importantly, it also gives the distribution of the waiting-time, 1, from one event to the next in a Poisson process i rate i. This is proved in Section 4 of this unit, In fact the time from any specified starting point (not necessarily the time at which the last event occurred) to the event occurring has this exponential distribution. This property can also be expressed as the "memaryless” proper Note: a gamma variable with parameters o.= & (a positive integer) and 2. can be expressed as the sum of & exponential variables, each with perameter A. This gamma distribution Is in fact the model for tne time from any spec'fied starting point to the occurrence of the kth event in a Poisson process with rate 2. 
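This relationship can be checked by simulation; in the following sketch the parameter values (k = 5 exponential variables, each with parameter lambda = 2) and the number of simulations are arbitrary illustrative choices.

set.seed(123)
k <- 5; lambda <- 2
sums <- replicate(10000, sum(rexp(k, rate = lambda)))   # 10,000 simulated sums of k exponentials
mean(sums); var(sums)                                   # compare with k/lambda = 2.5 and k/lambda^2 = 1.25
mean(sums <= 3); pgamma(3, shape = k, rate = lambda)    # empirical vs gamma(5, 2) CDF at x = 3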
The R code for simulating values and obtaining the PDF, CDF end quantiles from the exponential distribution is similar to the R code used for other continuous distributions using the R functions rexp, dexp, pexp and qex Special case 2: chi-square (7’) distribution with parameter “degrees of freedom”. Gamma with «= v/2 where v is a positive integer, and 2 = 1/2. Moments: 2v Note: a ° veriable with v= 2's the same as an exponential variable with mean 2. The R code for simulating velues and obtaining the PDF, COF and quantiles from the chi-square distribution is, similar to the R code used for other continuous distributions using the R functions rchisg, dchisg, pchisg and achisa. Beta distribution This is another versatile family of distributions with two positive parameters. The range of the variable is {x: 0< x < First note that the beta function (cr) is defined by (Leaf de The relationship between beta functions and gamma functions is Te) FB) Be, B)= P= TB) The R code for the beta function (a,b) isbeta(a,b). Probability (density) function: the PDF of a beta distribution is defined by en) 1 Fel) = x daa" ford 0, N(#) must be integer valued: 0,;.e. no events have occurred at time 0; (iid) when s <1, Ms) $M(O, ie. the number of events over time is non-decreasing; (iv) when s <1, Mi) ~N(s) represents the number of events occurring in the time interval (s, 0), The event number process {N()},.» 6 defined to bea Poisson process with parameter ifthe following three conditions are satisfied: © N(0)=0, and sys Nie) when s <6 GPE +H) =P NQ =) = 1-1 + ol) PONE H W art 1 N@ == Lh r POMC > 4 LING =) = olH) (note that a function /()is described os off) if tim Subject CSI Random variables and distributions Condition (i) states that in a very short time interval of length h, the only possible numbers of events are zero or. Consition (i) also implies that the number of events in a time interval of length # does not depend on when that interval starts. ‘The reason why a process satisfying conditions (i) to (li) is called a Poisson process is that for a fixed value of r,t random variable (2) has @ Poisson distribution with parameter 2u. This is proved as follows: Let p.(1) = PUN() =m). Then Gay" D0 = expl-bsy a will be proved by deriving “difterential-difference” equations from the conditions and then showing that (1.4.2) is their solution For a fixed value of r > 0 and a small positive value of h, condition on the number of events at time : and write PACD =p_ {O12 FN] AOL = Adv of] © (8) = Mp, (0+ [1 = Ailpld) +0(h) Thus +9) =P, = Wily, (0) ~ 2,01 +00) a and this identity holds for m= 1, 2,3, Now divide (1.4.3) by a, and let h go to zero from above to get the differential-difference equation a i PAO= Mp0 ~ P09) a with intial condition p,(0) = 0. When » =, an identical analysis yields a 4 Cd PAO a c with initial condition p,(0) = 1. It is now straightforward to verify that the suggested solution (1.4.2) satisfies both the differential equations (1.4. and (1.4.5) as well as the initial conditions, This study of the Poisson process conchuides by considering the distribution of the time to the first event, 7,,andt times between events, , 7, .... These inter-event times are often called the waiting times or holding times. P(T, > t)is the probability that no events occur between time 0 and time ¢ Hence PUL, > = PME) =0) = expt Ao} So the distribution function of 7, is F)=PUT, <= 1-expi-hy} s0 that 7, hasan exponential cistribution with parameter 2. Consider the conditional distribution of 7, given the value of 7,. 
AT t\T=n) =P + To rer 7, PING +r) =11N)= 1) = PING +) -NO=O1N)=1, Because the number of events in the time interval is independent of the number of events up to the start of that t interval (condition (ili) above), PONG +1) Nv) =0 1M) = 1) = POE +P) Mr) =0), Subject CSI Random variables and distributions 51 Since the number of events in a time interval of length » does not depend on when that time interval starts (cond (il) above, equations (1.4.1)) we have: PING +) — Nr) = 0) = P(N) = 0) = exp{ A} Hence, 7, has an exponential distribution with parameter A and 7, is independent of 7, This calculation can be repeated for T,, 7, “Monte Carlo” simulation With the advent of high-speed personal computers “Monte Carlo” simulations have become one of the most valu tools of the actuarial profession. This is because the vast majority of the practically important problems are not amenable to analytical solution We have already seen that we can simulate samples from distributions listed in Sections 2 and 3 using the R functions rbinom, egeon, rnbinom, rhyper, rpois, runif, rgamma, rexp, rchisg, rbeta, rnorm, rlnorm, rt and rf Below we outline one basic simulation technique that can be used to simulate values from most of these distributions. Inverse transform method for continuous distributions First we generate ¢ random number, U, from the U(0,1) distribution. We can use this to simulate a random variate with PDF (0 by using the CDF, F(x), Let U be the probability that 2 takes on a value less then or equal to x, ie. U= P(X S x)= Fix), hence x can be derived as: «=F Mw) Hence, the following two-step algorithm is used to generate a random variate x from a continuous distribution wi CDF F(x}: 1. generate a random number « from U(0, 1), 2 retum x= Fw). Formally, we can prove that the random variable X = F~(U)has the CDF F(x), as follows: P(X Sx)=PIF WU) Sx] [es FO]= Fe) Example1.5.1 Generate a random variate from the exoonentia’ distribution with parameter 2. The distribution function of Xis given by =log(= wr Thus, to generate random variate x from en exponential distribution we can use the following algorithm: 1 generatea random variate w from U(0, 1), 2, retum x =—log(1-u)/% ‘The main disadvantage of the inverse transform methods the necessity to have an explicit expression for the inw of the distribution function F(x), For instance, to generate a rancom variate from the standard normal distributiot Subject CSI 52 Random variables and distributions be However, no explicit solution to the equation « = F(x) can be found in this case. Example 1.5.2 Generate a random variate X from the double exponential distribution with density function f= Kor", reR Its possible in this case to find the distribution function F corresponding to f and to use the inverse transform method, but an alternative method is presented here. The density is symmetric about 0, we can therefore generate a variate Y having the same distribution as [XI and set.X=-+ or ’=-¥ with equal probability. Now th density of LX1is fin easily recognised as the density of the exponential distribution. The following algorithm therefore generates a val for x. 1. generate w, and u, from U0, 1) 2. ifu,<0.5return y=-In(1-u,)/@ , otherwise return y=In(l-1,)/@ Inverse transform method for discrete distributions Let. be a discrete random variable which can take only the values x,.%.....y . 
WN@r@ x, 0 Independence of random variables Consider a pair of variables (X, ¥), and suppose that the conditional cistribution of ¥ given X'= r does not actually depend on x at all. It follows that the probability function/pdf f(ylx) must be simply that of the marginal distribu of ¥,4(0) So, if conditional is equivalent to marginal, then: Sn(o)=F (rk) =F (9) fea) Fe fe) LALO) s0 joint pf/paf is the product of the two marginals. ‘This motivates the definition, which is given here for two variables: Definition: The random variablas and ¥ are independent if, and only if, the joint probability function/paf is the product of the two marginal probability functions/pdfs for al (x, ») in the range of the variables, Le. Sua (2) = fy (x) fy (») for all (x, ») in the range. Subject CSI 22 23 Joint distributions It follows that probability statements about values assumed by (, ¥) can be broken down into statements about AX and ¥ separately. Sof Nand ¥ are independent discrete veriables then P(X =x,¥ =y)=P(X=3) Pl If Vand ¥ are continuous, the double integral required to eveluate a joint probability splits into the product of twc separate integrals, one for Vand one for ¥, and we have PCa, < p*; hence the negative binomial has mean k/p and variance ke/ p* Physically, the number of trials up to tho £* success is the sum of the number of trials to the first success, plus the additional number to the Second SUCCESS, smu PIUS the additional number to the &® success. Further, the sum of two independent negative binomial variables, one (k,p) and the ot Subject CSI Joint distributions Poisson Let and Z be independent Poisson (2) and Poisson (7) variables. Then.x has MGF M (i) =exp{2(e' -D}, Zhas MGF M,(1) = exp{i(e’ -D} So the sum +7 has MGF [exp{a(e’ ~1)}]lexply(e' ~ | ]=expt(& + ye" —1)}, which is MGF of a Poisson (7+) variable. So the sum of independent Poisson variables is @ Poisson variable. Chas mean =variance = 2, Z has mean = variance =, and the sum has mean= variance = A+. Exponential/ Let_X,,i=1,2,...k,be independent exponential (4) variables. ° Then each has MGF M(1)=2(—1)" SoY=X,+.X,+...+X, has MGF (A(A—2)"J,, which is the MGF of a gamma (£,2) variable. So the gamma (k,2) random variable (for ka positive integer) is the sum of & independent exponential (2) random variables. Each exponential variable has mean 1/2. and variance 1/2; hence the gamma (k,2) has mean k/4 and variance k/2’, Physically, the time to the &* event in @ Poisson process with rate 2 is the sum of & individual inter-event times. Further, the sum of two independent gamma variables, one (a,2) and the other (82), is gamma (a +8,2) variable. Chi-square From the above result with 2.=1/2, it follows that the sum of a chi-square (n) and an independent chi-square (m) is @ chi-square (n+) variable. So the sum of independent chi-square variables isa chi-square variable. Normal Let.X be a normal random variable with mean 1, and standard deviation , and let ¥ be a normal random variable with mean 1, and standard deviation g,. Let and Y be independent, and let Z = x +¥. Xhas MGF M(1)=exp(ut+4o,""") Yhas MGF M,(0) = exp(Hyr ) and so the sum Z=1-+Y has MGF exp(ust+$0°7 Jexp(y,1+4632)=exp{(u, +H, )1+4(02+0;)?} which is the MGF of a normal variable {with mean Hy + sy and variance o% +041 So the sum of independent normal variables is @ normal variable. END Subject CSI Conditional expectation Unit 5 - Conditional expectation Syllabus objectives 1.3. Expectations, conditional expectations. 
1.31 Define the conditional expectation of one random variable given the value of another random variable and calculate such a quantity 1.3.2 Show how the mean and veriance of a random variable can be obtained from expected values of conditional expected values, and apoly this. The conditional expecta in E [Y|X=x] Definition: The conditional expectetion of ¥ given X'=.s the mean of the conditional distribution of ¥ given X= This mean is denoted F[Y'| X'=a], or just ELV Lx} The random variable £ [¥ |X] ‘The conditional expectation [| x(x), say, is, in general, a function of.x. It can be thought of as the obser value of a random variable g(X). The random variable (2) is denoted FLY |X] Note: £[7 |X] s also referred to as the regression of Yon x. LY 1X1, ke any other function of x hs its own distribution, whose properties depend on those of the distributic of Xitset Of particular importance is the expected value (the mean) of the distribution of EL} |X], The usefulnes considering this expected value, E{E[Y|X]}, comes from the following result, proved herein the case of continuo Variables, but true in general Theorem: #/{'| XI] =A) Proof: z[z[Y| xI) = fal lx sear = Jffolndy Fedde The random variable V [¥ |X] and the “E [V] + V [E]” result The variance of the conditional distribution of Y given 4'=.x is denoted M11" | x], where VEY |x] = ELEY ~ FLY |x? La) = FIY? |x] - (FLY |) LY [a] isthe observed value of a random variable FLY | x] where wy ia = ep? | xy y = Fp Ley Hence EUTY | xq) = 6160" | XI — BLf@(4)? ELY*] = (VL XI) + ELtg 2") = EL] £Ufg(4))"] and so So the variance of ¥, W(Y) = E(¥*)-[E(V)P is given by ELV X0] + EL@-X)) 1 LE tg OH? = BLM 1X0] + Me), ie. HY}= EU XI) + MELD END Subject CSI 31 The Central Limit Theorem Unit 6 - The Central Li Theorem Syllabus objectives 15. Central Limit Theorem ~ statement and application, 1.5. State the Central Limit Theorem for a sequence of independent, identically distributed random variabl 15.2 Generate simulated samples from a given distribution and compare the sampling distribution with the Norma Introduction The Central Limit Theorem is perhaps the most important result in statistics. It provides the basis for large-sampl Inference about a population mean when the population distribution is unknown and more Importantly does not need to be known, It also provides the basis for large-sample inference about a population proportion, for examp in initial mortality rates at given age x, or in opinion polls and surveys It is one of the reasons for the importance the normal distribution in statistics, The Central Limit Theorem If XX. .-X, is a sequence of independent, identically distributed (.ic.) random variables with finite mean w 2 finite (non-2e0) variance «”, then the distrioution of XH approaches the standard normal distribution, N01). nye, oivn The way the Central Limit Theorem is used in practice is to provide useful normal approximations to the distributi of particular functions of a set of iid. random variables. EY Therefore both ff ana aS are approximately cistributed as (0.1) for large nm oivn vie Alternatively the unstanderdised forms can be used, Thus X is approximately N(u,0° /n) and 2X, is approximatel N(np,no"). Note: the symbol “=” is used to mean “is approximately distributed”, so we can write the statements in the. preceding paragraph as X = N(u,o" /n) and SX, = N(ni no?) An obvious question is: what i large n? A. common answer is simply n > 30 but this is too simple an answer. 
A fuller answer is that it depends on the shap the population, that is, the distribution of -X, , and in particular how skewed iti. If this population distribution is fairly symmetric even though non-normal, then » = 10 may be large enough: wher if the distribution is very skewed, n= 50 or more may be necessary. Normal approximations We can use Central Limit Theorem to obtain approximations to the binomial, Poisson end gamma distributions. T is useful for calculating probabilities and obtaining confidence intervals and carrying out hypothesis tests on a pic of paper. However, it is no bother for a computer to calculate exact probabilities, confidence intervals and hypoth tests. Hence, these approximations are not as important as they used to be. Binomial distribution Let X, be i.id. Bernoulli random variables, that is, binomial (1, p), so that MX =-D=p PU, =0)=1-p. In ather words ¥; is the number of successes in a single Bernoulli trial Consider x, ,..,.A,.2 Sequence of such variables. This is precisely the binomial situation and x =x, is the nun Subject CSI 32 The Central Limit Theorem Sox X, ~ binomial (n, p). A'so note that As a result of the Central Limit Theorem it can be said that, for large n, ¥ > N(p,o" sn) or SX, = N(mino”). For the Bernoulli distribution V[X]= p= p). Therefore YX, N(ap.n(1— p)) for larae n, which is of course the norrral approximation to the binomial. What is “large n"? A commonly quoted rule of thum is that the approximation can be used only when both np a (1 p) ae greater than 5. The “only when" is abit severe. It is more a case of the approximation being less good if either is less than 5. However, this rule of thumb is consistent with the answer that it depends on the symmetry skewness of the population. Note: when p=0.5 the Bernoulli distribution is symmetric. In this case both np and n(1- p) equal S when n so the rule of thumb suggests that n = 10 is large enough. 0, a ‘As p moves away from 0.5 towards either 0 or 1 the Bernoulli distribution becomes more severely skewed. For example, when p = 0.2 or 0.8 the rule of thumb gives n =25 as large enough, but, when p =0.05 or 0.95 the rule of thumb gives » = 100 as large enough. Poisson distribution Bs ‘The Central Limit Theorem implies that EX, © M(wh ni.) for large» But 2X, ~ Poisson (wh) and so, for large n, Poisson (ra) ~ N(mhynh) or, equivalentyy, Poisson (2) == N(2,2) for large 2. A rule of thumb for this one is that the approximation is good if 2.> 5. However since extensive tables for a range values of 2 are available, itis only needed in practice for much larger values of i. ‘The normal approximations to the binomial and Poisson distributions (both discrete) are the most commonly use practice, and they are needed as the direct calculation of probabilities is computationally awkward without them. Gamma distribution Lot 4,:=1,2....n be a sequence of il.¢, exponential (2) variables and ¥ be their sum, The exponential cistribution has mean y=1/ 2 and variance o =1/! forlarge n= 2X, #N (n/n/2?) ¥,, which is gamma (n,2), will have anormal approximation for large values of m Sineo 42 = gamma (k/2,1/2), x2 will have a normal approximation N(t, 2k forlarge vakies of its degroes of freed The continuity correction When dealing with the normal approximations to the binomial and Poisson distributions, which are both discrete, discrete distribution is being approximated by 2 continuous one. When using such an approximation the change t discrete to continuous must be allowed for. 
For an integer-valued discrete distributi n, such as the binomial or Poisson, it is perfectly reasonable to Subject CSI The Central Limit Theorem is not meaningful and is taken to be zero. For a continuous variable it is sensible to consider only the probabil that X lies in some interval. To allow for this a continuity correction must be used. Essentially it corresponds te treating the integer values as being rounded to the nearest integer. So to use the continuity correction in practice, for example, Be is equivalent to “3.5 Is is equivalent to X>155 Kes is equivalent to X> 145 Example 5.41 Let X'be a Poisson variable with parameter 20. Use the normal approximation to obtain a value for PUY's 15) and tables to compare with the exact value Solution x-20 A= Poisson (20) -. X +.N(20,20).. +NO1) 20 P(X S15) P(X <15.5): using continuity correction Is.9-20 {2< Te = Pz-<-1.006 ~ 0.84279, interpolating in tables to be as accurate as possible =0.19721. From Poisson tables, P(¥<15) = 0.15651 Error = 0.0007 or a 0.45% relative error. Comparing simulated samples We saw in a previous unit how to use R to simulate samples from standard distributions. We can then obtain the or mean of each of these samples. The following R code uses a loop to obtain the means of 1.000 samples of size 40 from a Poisson distribution with mean 5. It then stores these sample means in the vector xbar: set .seed(23) xbar <- rep(0,1000) for (i in 1:1000) {x<-rpois (40, 5) sxbarlil<-mean [x) Note that we have used the set. seed function to specify the seed for the simulation. Two simulations that us the same number as a seed will obtain exactly the same results. The Central Limit Theorem tells us that the distribution of the sample means will approximately have a N(5,0.125) distribution, The mean and variance of sbbax are 5.01135 and 0.1250763 which are very close. Subject CSI The Central Limit Theorem We can compare our observed distribution of the sample means with the Central Limit Theorem by a histogram of the sample means (using the R function hist) and superimposing the normal distribution curve (using the function curve): hist (xbar, prob=TRUE, ylim=c(0,1.2)) curve (dnorm(x,mean=5, sd=sqrt (0.125)), add=TRUB, lwd=2, col="red"). Histogram of xbar 7 aN : 3 Za Ps s Jal So 40 45. 5.0 55 60 xbar Another method of compering the distribution of our sample means, xbar, with the normal distribution is to examine the quantiles. In R we can find the quantiles of xbax using the quanti le function. Using the default setting (type 7) to obtain the sample lower quartile, median and upper quartile gives 4.775, 5.000 and 5.250, respectively, However, in subject CSI, we prefer to use type 5 or type 6. InR, we can find the quartiles of the normal distribution using the gnozm function, This gives a lower quartile, median and upper quartile of 4.762, 5000 an¢ 5.238, respectively. We observe that ot distribution of the sample means is slightly more spread out in the tails - which is what we observed in the previc diagram. ‘A quick way to compare all the quantiles in one go is by drawing a QQ-plot using the R function aqnorm. Subject CSI The Central Limit Theorem Normal Q.@ Plot °F *y va é a34 5 302 4 0 1 2 3 Theoretical Quaniles If our sample quantiles coincide with the quantiies of the normal distribution, we would observe a perfect diagon Ine (which we have added to the ciagram for clarity). 
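The figures above can be obtained directly from the simulated vector:

mean(xbar)    # close to the theoretical mean 5
var(xbar)     # close to the theoretical variance 5/40 = 0.125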
For our example we can see that xbar and the normal distribution are very similar excent in the tails where we see that bar has a lighter lower tail and a heavier uppe than the normal cistribution, END Subject CSI Random sampling and sampling distributions Unit 7 - Random sampling and sampling dist Syllabus objectives 2.3. Random sampling and sampling distributions. 231, Explain what is meant by a sample, a population and statistical inference. 2.3.2 Define a random sample from a distribution of a random variable. 2.3.3 Explain what is meant by a statistic and its sampling astribution. 2.3.4 Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean, variance and sample size. 2.3.8 State and use the basic sampling distributions for the sample mean and the sample variance for randor samples from a normal distribution. 2.3.6 State and use the distribution of the t-statistic for random samples from a normal distribution. 2.3.7 State and use the /’ distribution for the ratio of two sample variances from independent samples taken from normal distributions. 1 Introduction When a sample is taken from a population the sample information can be used to infer certain things about the population. For example, it can be used to estimate a population quantity or test the validity of a statement made about the population. 2 Basic definitions Theoretically this deals with samples from infinite populations. Actuaries are concerned with sampling from populations of policyholders, policies, claims, buildings, employees, etc. Such populations may be looked upon as conceptually infinite but even without doing so, they will be very large populations of many thousands and so the methods for infinite populations will be more than adequate, ‘random sample is made up of independent and identically distributed (i.d.) random variables and so they are denoted by capital Xs, We will use the shorthand notation x to denote a random sample, that is, X,.XseonX). An observed sample wil be denoted by x =(r,.x3...1,): The popuiation distribution will be spectfied by a probability (density) function denoted by (1:8), where 0 denotes the parameter(s) of the distribu Due to the Central Limit Theorem, inference concerning a population mean can be considered without specifying form of the population, provided the sample size is large enough, A statistic isa function of X only and does not involve any unknown parameters. Thus ¥ =“! and os 1 2 (X, 8) ate statisties whereas ! x(x, -)" is not, unless of course pis known, A statistic can be generally denoted by g(X), Since a statistic is 2 function of random variables, it will bea rand variable itself and willhave a distribution, its sampling distribution. 3 Moments of the sample mean and vai nce 3 The sample mean Suppose X, has mean u and variance 6% XY, Recal that the sample mean © Consider first 2X; efx] #[X,]=20 1 [X,]: independent ‘identically distributed r[er =no*: identically distributed. Subject CSI Random sampling and sampling distributions 32 42 22, you can now write down that E[ Note: the standard deviation of ¥, which is © is called the standard error of the sample mean. The sample variance Recal that the sample variance S* = 3(x,—Xy. 
mt Considering only the mean of $*, it an be proved that £[.S*] as follows: = 2%] Taking expectations and noting thet for any random variable Y, apr =V[}+(E[¥])’ leads to -u(Py) fom n(o* +4) 0° —m"} Sampling distributions for the normal The sample moan The Central Limit Theorem provides a large-sample approximate sampling distribution for X without the need for any distributional assumptions about the population. So, for large n, = N(QO,1) or X = N(o" fn) olde This result is often called the zresut. It transpires that the above result gives the exact sampling distribution of X for random samples from a normal population, The sample variance The sampling distribution of S* when sampling from a normal population with mean yt and variance ois (n-Ds' Whereas the distribution of ¥ is normal and hence symmetric, the distribution of S” is positively skewed especial s0 for small but becoming symmetrical for large 7. Using the x’ result to investigate the first and second order moments of S?, when sampling from a normal popula and the fact that the mean and variance of &: are k and 2k, respectively, ee] 20 For both X and S” the variances decrease and tend to zero es the sample size n increases. Added to the facts tha ELF] =n and £[s°]=0°, these imply that gets closer to 4 and 5” gets closer to g” as the sample size increas -158[5'] Subject CSI 43 Random sampling and sampling distributions Independence of the sample mean and variance The other important feature when sampling from normal populations is the independence of ¥ and S*. A full pro this is not trivial but itis a result that is easily appreciated as follews. Suppose that a sample from some normal distribution has been simulated. The value of ¥ does not give any information about the value of s?. However, if the sample is from some exponential distribution, the value of ¥ do give information about the value of s*, as wand o* are related. Other cases such as Poisson, binomial, gamma can ke considered in 2 sirvilar way, but only the normal has the independence property. The result The sampling distibution for Z, that is ah ~ NGO) or ¥~ N(uyo" /m), wil be used in subsequent units for on inference concerning yt when the population variance «is known. However, this is rare in practice and another result is needed for the realistic situation when o* is unknown. This is the i result or the t sampling cistribution. ‘The 1 result is similar to the = result with o replaced by S and N(0,1) replaced by f,. ‘Then ther result above follows from the sampling distributions of the last section, that is, “—H as the N(0,!) oi vn when sampling from (nS? > YS. ~ 22. asthe ti, together with their independence, to obtain s vn a normal population, and ‘The 1 distribution is symmetrical about zero and its critical points are tabulated. It looks similar to the standard normal especially for large degrees of freedom. The following picture shows a ¢, density, af, density and a (0,1) density for comparison. . faistbaton —2z * — tel 34 & & In fact, as k > 2,1, > N(O1). ‘The 4, distribution is also called the Cauchy distribution and is peculiar in that none of its moments exist, not even mean. However since samples of size 2 are unrealistic, it should not arise as a sampling distribution. Subject CSI Random sampling and sampling distributions 6 The F result for variance ratios Uys, ‘The F distribution is defined by # =, where U’and Mare independent 2 random variables with v, and v, degr VA, of freedom respectively. 
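The t result can be checked by simulation; in the following sketch the sample size and the normal parameters (n = 8 observations from N(10, 4)) are arbitrary illustrative choices.

set.seed(99)
n <- 8; mu <- 10; sigma <- 2
tstat <- replicate(10000, {x <- rnorm(n, mu, sigma); (mean(x) - mu)/(sd(x)/sqrt(n))})
quantile(tstat, c(0.95, 0.975))     # empirical upper quantiles of the simulated t statistics
qt(c(0.95, 0.975), df = n - 1)      # quantiles of the t distribution with 7 degrees of freedom
qnorm(c(0.95, 0.975))               # the corresponding N(0,1) quantiles are noticeably smaller
qt(0.975, df = c(5, 20, 100, 1000)) # the t quantiles approach the N(0,1) value as df increases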
6 The F result for variance ratios

The $F$ distribution is defined by $F = \dfrac{U/\nu_1}{V/\nu_2}$, where $U$ and $V$ are independent $\chi^2$ random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively.

Thus if independent random samples of size $n_1$ and $n_2$ respectively are taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, then

$\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$.

Alternatively, $\dfrac{S_2^2/\sigma_2^2}{S_1^2/\sigma_1^2} \sim F_{n_2-1,\,n_1-1}$.

This reciprocal form is needed when using tables of critical points, as only upper tail points are tabulated. See "Formulae and Tables".

END

Unit 8 - Estimation and estimators

Syllabus objectives

3.1 Estimation and estimators.

3.1.1 Describe and apply the method of moments for constructing estimators of population parameters.

3.1.2 Describe and apply the method of maximum likelihood for constructing estimators of population parameters.

3.1.3 Define the terms: efficiency, bias, consistency and mean squared error.

3.1.4 Define and apply the property of unbiasedness of an estimator.

3.1.5 Define the mean square error of an estimator, and use it to compare estimators.

3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.

3.1.7 Use the bootstrap method to estimate properties of an estimator.

1 The method of moments

The basic principle is to equate population moments to corresponding sample moments and solve for the parameter(s).

1.1 The one-parameter case

This is the simplest case: equate the population mean to the sample mean and solve for the parameter, i.e.

$E[X] = \bar{x}$.

Note: For some populations the mean does not involve the parameter, such as the uniform on $(-\theta, \theta)$ or the normal $N(0, \sigma^2)$, in which case a higher order moment must be used. However, such cases are rarely of practical importance.

The estimator is written as upper case as it is a random variable and will have a sampling distribution. The estimate is written as lower case as it comes from an actual sample of numerical values.

1.2 The two-parameter case

This involves equating the first and second order moments of the population and the sample, and solving the resulting pair of equations. Moments about the origin can be used, but the solution is the same (and often more easily obtained) using moments about the mean - apart from the first order moment being the mean itself. So the second order equation is

$E\left[(X - E[X])^2\right] = \frac{1}{n}\sum (x_i - \bar{x})^2$.

Note that $s^2$ with divisor $(n-1)$ is often used in place of the second central sample moment. For cases with more than two parameters, moments about zero should be used.
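As an illustration of the two-parameter case, the following R sketch applies the method to a gamma model, for which $E[X] = \alpha/\lambda$ and $E[(X-E[X])^2] = \alpha/\lambda^2$. This is not part of the Core Reading and the data vector x is made up purely for illustration.

# Illustrative method of moments fit for a Gamma(alpha, lambda) model
x <- c(1.2, 0.8, 2.5, 1.9, 3.1, 0.6, 1.4, 2.2)   # hypothetical data
m1 <- mean(x)
m2 <- mean((x - m1)^2)        # second central sample moment (divisor n)
lambda.hat <- m1 / m2         # from m1 = alpha/lambda and m2 = alpha/lambda^2
alpha.hat  <- m1^2 / m2
c(alpha = alpha.hat, lambda = lambda.hat)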
2 The method of maximum likelihood

The method of maximum likelihood is widely regarded as the best general method of finding estimators. These estimators tend to be especially useful in the large-sample situation, as they have excellent, easily determined asymptotic properties.

2.1 The one-parameter case

The most important stage in applying the method is that of writing down the likelihood

$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$

for a random sample $x_1, x_2, \ldots, x_n$ from a population with density or probability function $f(x;\theta)$.

The likelihood is the probability of observing the sample in the discrete case, and is proportional to the probability of observing values in the neighbourhood of the sample in the continuous case.

In most cases, taking logs greatly simplifies the determination of the maximum likelihood estimator (MLE), $\hat{\theta}$.

Differentiating the likelihood or log likelihood with respect to the parameter and setting the derivative to zero gives the maximum likelihood estimator for the parameter.

It is necessary to check, either formally or through simple logic, that the turning point is a maximum. Generally, the likelihood starts at zero, finishes at or tends to zero, and is non-negative. Therefore, if there is one turning point it must be a maximum.

MLEs display the invariance property, which means that if $\hat{\theta}$ is the MLE of $\theta$ then the MLE of a function $g(\theta)$ is $g(\hat{\theta})$.

For populations where the range of the random variable involves the parameter, care must be taken to specify where the likelihood is zero and non-zero. Often a plot of the likelihood is helpful.

Examples

Given a random sample of size $n$ from the exponential population with density $\lambda e^{-\lambda x}$, $x > 0$, the MLE $\hat{\lambda}$ is found as follows:

$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}$

$\log L(\lambda) = n\log\lambda - \lambda\sum x_i$

$\dfrac{d}{d\lambda}\log L(\lambda) = \dfrac{n}{\lambda} - \sum x_i$; equating to zero: $\dfrac{n}{\lambda} - \sum x_i = 0 \;\Rightarrow\; \hat{\lambda} = \dfrac{n}{\sum x_i} = \dfrac{1}{\bar{x}}$.

The MLE is $\hat{\lambda} = 1/\bar{X}$.

2.2 The two-parameter case

This is straightforward in principle and the method is the same as in the one-parameter case, but the solution of the resulting equations may be more awkward, perhaps requiring an iterative or numerical solution.

The only difference is that a partial derivative is taken with respect to each parameter, before equating each to zero and solving the resulting system of simultaneous equations for the parameters.

2.3 Incomplete samples

The method of maximum likelihood can be applied in situations where the sample is incomplete. For example, truncated data or censored data in which observations are known to be greater than a certain value, or multiple claims where the number of claims is known to be two or more. In these situations, provided the likelihood (the probability of observing the given information) can be written down, the method can be applied as before.

For example, suppose a sample yields $n$ observations $(x_1, x_2, \ldots, x_n)$ and $m$ observations greater than the value $y$; then the likelihood is given by

$L(\theta) = \left(\prod_{i=1}^{n} f(x_i;\theta)\right)\left(P(X > y)\right)^m$.

2.4 Independent samples

For independent samples from two populations which share a common parameter, the overall likelihood is the product of the two separate likelihoods.

3 Unbiasedness

Consideration of the sampling distribution of an estimator can give an indication of how good it is as an estimator. Clearly the aim is for the sampling distribution of the estimator to be located near the true value and to have a small spread.

If we have a random sample $X = (X_1, X_2, \ldots, X_n)$ from a distribution with an unknown parameter $\theta$, and $g(X)$ is an estimator of $\theta$, it seems desirable that $E[g(X)] = \theta$. This is the property of unbiasedness.

If an estimator is biased, its bias is given by $E[g(X)] - \theta$, i.e. it is a measure of the difference between the expected value of the estimator and the parameter being estimated.

The property of unbiasedness is not preserved under non-linear transformations of the estimator/parameter.

As indicated earlier, unbiasedness seems to be a desirable property. However, it is not necessarily an essential property for an estimator. There are many common situations in which a biased estimator is better than an unbiased one, and, in fact, better than the best unbiased estimator.

The importance of unbiasedness is secondary to that of having a small mean square error.

4 Mean square error

As biased estimators can be better than unbiased ones, a measure of efficiency is needed to compare estimators generally. That measure is the mean square error.

The mean square error (MSE) of an estimator $g(X)$ for $\theta$ is defined by:

$MSE(g(X)) = E\left[(g(X) - \theta)^2\right]$.

Thus the mean square error is the second moment of $g(X)$ about $\theta$, and an estimator with a lower MSE is said to be more efficient.

Note: If the estimator $g(X)$ is unbiased, then MSE = Variance.
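To make this comparison concrete, the following simulation sketch estimates the MSE of two estimators of $\sigma^2$ from a normal sample: the unbiased $S^2$ with divisor $n-1$ and the biased version with divisor $n$. It is illustrative only (not part of the Core Reading); the N(0,1) population and n = 10 are arbitrary choices, and for this case the biased version turns out to have the smaller MSE.

# Illustrative comparison: MSE of two estimators of sigma^2 (true value 1)
set.seed(2)
n <- 10
s2.unbiased <- replicate(10000, var(rnorm(n)))    # divisor n-1
s2.biased   <- (n - 1) / n * s2.unbiased          # divisor n
mean((s2.unbiased - 1)^2)   # estimated MSE of the unbiased estimator
mean((s2.biased - 1)^2)     # estimated MSE of the biased estimator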
The MSE of a particular estimator can be worked out directly as an integral using the density of the sampling distribution of $g(X)$, or using the density of $X$ itself. However, it is usually much easier to use the alternative expression

MSE = Variance + bias²

as this makes use of quantities that are already known or can easily be obtained. This expression can be proved as follows (simplifying things by dropping the $(X)$ and writing simply $g$):

$MSE(g) = E\left[(g-\theta)^2\right] = E\left[\left\{(g - E[g]) + (E[g]-\theta)\right\}^2\right]$

$= E\left[(g-E[g])^2\right] + 2\left(E[g]-\theta\right)E\left[g - E[g]\right] + \left(E[g]-\theta\right)^2$

$= V[g] + 0 + \text{bias}^2[g]$

as required.

The following diagram gives the sampling distributions of two estimators: one is unbiased but has a large variance; the other is biased with a much smaller variance. This illustrates a situation in which a biased estimator is better than an unbiased one.

[Figure: sampling distributions of an unbiased estimator with large variance and a biased estimator with much smaller variance]

An estimator with a "small" MSE is a good estimator. It is also desirable that an estimator gets better as the sample size increases. Putting these together suggests that it is desirable that MSE → 0 as $n \to \infty$. This property is known as consistency.

5 Asymptotic distribution of maximum likelihood estimators

Given a random sample of size $n$ from a distribution with density (or probability function in the discrete case) $f(x;\theta)$, the maximum likelihood estimator $\hat{\theta}$ is such that, for large $n$, $\hat{\theta}$ is approximately normal, unbiased, and with variance given by the Cramer-Rao lower bound (CRLB), that is,

$\hat{\theta} \approx N(\theta, \text{CRLB})$

where

$\text{CRLB} = \dfrac{1}{-E\left[\dfrac{\partial^2}{\partial\theta^2}\log L(\theta; X)\right]}$

noting that the likelihood, $L(\theta)$, is really $L(\theta; X)$.

The MLE can therefore be called asymptotically efficient, in that for large $n$ it is unbiased with a variance equal to the lowest possible value for unbiased estimators.

This is potentially a very useful result as it provides an approximate distribution for the MLE when the true sampling distribution may be unknown or impossible to determine easily, and hence may be used to obtain approximate confidence intervals.

The result holds under very general conditions, with only one major exclusion: it does not apply in cases where the range of the distribution involves the parameter, such as the uniform distribution.

Two alternative expressions for the CRLB are:

$\text{CRLB} = \dfrac{1}{E\left[\left(\dfrac{\partial}{\partial\theta}\log L(\theta; X)\right)^2\right]}$ and $\text{CRLB} = \dfrac{1}{-nE\left[\dfrac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]}$.

6 Some remarks on estimation

Essentially maximum likelihood is regarded as the better method.

In the usual one-parameter case the method of moments estimator is always a function of the sample mean $\bar{X}$, and this must limit its usefulness in some situations. For example, in the case of the uniform distribution on $(0,\theta)$ the method of moments estimator is $2\bar{X}$, and this can result in inadmissible estimates: the sample may contain values greater than $2\bar{x}$, whereas no observation can exceed the true value of $\theta$.

Nevertheless, in many common applications such as the binomial, Poisson, exponential and normal cases, both methods yield the same estimator.

In some situations, such as the gamma with two unknown parameters, the simplicity of the method of moments gives it a possible advantage over maximum likelihood, which may require a complicated numerical solution.
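The uniform example above can be illustrated with a short simulation in R. This sketch is not part of the Core Reading; the choices θ = 10 and n = 20 are arbitrary.

# Illustrative check: proportion of Uniform(0, 10) samples of size 20 for which
# the method of moments estimate 2*xbar is inadmissible (below the sample maximum)
set.seed(3)
inadmissible <- replicate(10000, {x <- runif(20, 0, 10); 2 * mean(x) < max(x)})
mean(inadmissible)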
7 The bootstrap method

7.1 Introduction to the bootstrap

The bootstrap method is a computer-intensive estimation method and can be used to estimate the properties of an estimator. It is mainly distinguished in two types: parametric and non-parametric bootstrap.

Suppose that we want to make inferences about a parameter $\theta$ using observed data $(y_1, y_2, \ldots, y_n)$ which follow a distribution with cumulative distribution function $F(y;\theta)$. Usually inference is based on the sampling distribution of an estimator $\hat{\theta}$. A sampling distribution is obtained either by theoretical results, or is based on a large number of samples from $F(y;\theta)$.

For example, suppose we have a sample $(y_1, y_2, \ldots, y_n)$ from an exponential distribution with parameter $\lambda$ and we wish to make inferences about $\lambda$. The Central Limit Theorem tells us that asymptotically $\bar{Y} \sim N\left(1/\lambda,\; 1/(n\lambda^2)\right)$, and we can use this sampling distribution to estimate quantities of interest (e.g. for confidence intervals or tests about $\lambda$).

However, there will be cases where assumptions or asymptotic results may not hold (or we may not want to use them, e.g. when samples are small). Then one alternative option is to use the bootstrap method. The bootstrap allows us to avoid making assumptions about the sampling distribution of a statistic of interest, by instead forming an empirical sampling distribution of the statistic. This is generally achieved by resampling based on the available sample.

7.2 Non-parametric (full) bootstrap

The main idea behind the non-parametric bootstrap, when estimating a parameter $\theta$, can be described as follows.

Construct the empirical distribution, $F_n$, of the data:

$F_n(y) = \dfrac{\#\{y_i \le y\}}{n}$

Then perform the following steps:

1. Draw a sample of size $n$ from $F_n$. This is the bootstrap sample $(y_1^*, y_2^*, \ldots, y_n^*)$, with each $y_i^*$ selected with replacement from $(y_1, y_2, \ldots, y_n)$.

2. Obtain an estimate $\hat{\theta}^*$ from the bootstrap sample. This is done in the same way as $\hat{\theta}$ is obtained from the original sample.

3. Repeat steps 1 and 2, say, $B$ times.

Provided that $B$ is sufficiently large, the output set of estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$ will provide the empirical distribution of $\hat{\theta}$, which serves as an estimate of the sampling distribution of $\hat{\theta}$, and is referred to as the bootstrap empirical distribution of $\hat{\theta}$.

Schematically, this can be thought of as:

sample 1: $(y_1^*, \ldots, y_n^*) \to \hat{\theta}_1^*$

sample 2: $(y_1^*, \ldots, y_n^*) \to \hat{\theta}_2^*$

...

sample B: $(y_1^*, \ldots, y_n^*) \to \hat{\theta}_B^*$

together giving the bootstrap empirical distribution of $\hat{\theta}$.

The bootstrap distribution of $\hat{\theta}$ can then be used for any desired inference regarding the estimator $\hat{\theta}$, and in particular to estimate its properties. For example we can:

+ estimate the mean of the estimator $\hat{\theta}$ by using the sample mean of the bootstrap estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$

+ estimate its median, using the 0.5 empirical quantile of the bootstrap estimates $\hat{\theta}_i^*$

+ estimate the variance of the estimator $\hat{\theta}$ by using the sample variance of the bootstrap estimates $(\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*)$:

$\dfrac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\theta}_i^* - \bar{\hat{\theta}}^*\right)^2$ where $\bar{\hat{\theta}}^* = \dfrac{1}{B}\sum_{i=1}^{B}\hat{\theta}_i^*$

+ estimate a 100(1−α)% confidence interval for $\theta$ by $(k_{\alpha/2},\, k_{1-\alpha/2})$, where $k_{\alpha}$ denotes the $\alpha$ empirical quantile of the bootstrap values $\hat{\theta}_i^*$. Confidence intervals are described in Unit 9.

Example 8.7.1

Suppose we have the following sample of 10 values (to 2 DP) from an $Exp(\lambda)$ distribution with unknown parameter $\lambda$:

0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75

We can use the following R code to obtain a single resample with replacement from this original sample.

sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00, 0.14, 0.75)
sample(sample.data, replace=TRUE)

Note that this is non-parametric as we are ignoring the $Exp(\lambda)$ assumption to obtain a new sample.
The following R code obtains $B = 1{,}000$ estimates $(\hat{\lambda}_1^*, \ldots, \hat{\lambda}_B^*)$, using $\hat{\lambda}^* = 1/\bar{y}^*$, and stores them in the vector estimate:

set.seed(47)
estimate <- rep(0, 1000)
for (i in 1:1000)
{x <- sample(sample.data, replace=TRUE); estimate[i] <- 1/mean(x)}

An alternative would be to use:

set.seed(47)
estimate <- replicate(1000, 1/mean(sample(sample.data, replace=TRUE)))

This gives us the following empirical sampling distribution of $\hat{\lambda}$:

[Figure: histogram of the 1,000 bootstrap estimates in the vector estimate]

We can obtain estimates for the mean, standard error and 95% confidence interval of the estimator $\hat{\lambda}$ using the following R code:

mean(estimate)
sd(estimate)
quantile(estimate, c(0.025, 0.975))

7.3 Parametric bootstrap

If we are prepared to assume that the sample comes from a given distribution, we first obtain an estimate $\hat{\theta}$ of the parameter of interest (e.g. using maximum likelihood, or the method of moments). Then we use the assumed distribution, with parameter equal to $\hat{\theta}$, to draw the bootstrap samples. Once the bootstrap samples are available, we proceed as with the non-parametric method before.

Example 8.7.2

Using our sample of 10 values (to 2 decimal places) from an exponential distribution with unknown parameter $\lambda$:

0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75

our estimate for $\lambda$ would be $\hat{\lambda} = 1/\bar{y} = 1/2.833 = 0.3530$.

We now use the exponential distribution with parameter 0.3530 to generate the bootstrap samples. Note that this is parametric as we are using the exponential distribution to obtain new samples.

We can use the following R code to obtain $B = 1{,}000$ estimates $(\hat{\lambda}_1^*, \ldots, \hat{\lambda}_B^*)$, using $\hat{\lambda}^* = 1/\bar{y}^*$, and store them in the vector param.estimate:

set.seed(47)
param.estimate <- rep(0, 1000)
for (i in 1:1000)
{x <- rexp(10, rate=1/mean(sample.data)); param.estimate[i] <- 1/mean(x)}

An alternative would be to use:

set.seed(47)
param.estimate <- replicate(1000, 1/mean(rexp(10, rate=1/mean(sample.data))))

This gives us the following empirical sampling distribution of $\hat{\lambda}$:

[Figure: histogram of the 1,000 bootstrap estimates in the vector param.estimate]

Various inferences can then be made using the bootstrap estimates $(\hat{\lambda}_1^*, \ldots, \hat{\lambda}_B^*)$ as before.

Bootstrap methodology can also be used in other, more complicated, scenarios - for example in regression analysis or generalised linear model settings.

END

Unit 9 - Confidence intervals

Syllabus objectives

3.2 Confidence intervals.

3.2.1 Define in general terms a confidence interval for an unknown parameter of a distribution based on a random sample.

3.2.2 Derive a confidence interval for an unknown parameter using a given sampling distribution.

3.2.3 Calculate confidence intervals for the mean and the variance of a normal distribution.

3.2.4 Calculate confidence intervals for a binomial probability and a Poisson mean, including the use of the normal approximation in both cases.

3.2.5 Calculate confidence intervals for two-sample situations involving the normal distribution, and the binomial and Poisson distributions using the normal approximation.

3.2.6 Calculate confidence intervals for a difference between two means from paired data.

3.2.7 Use the bootstrap method to obtain confidence intervals.

1 Confidence intervals in general

A confidence interval provides an "interval estimate" of an unknown parameter (as opposed to a "point estimate"). It is designed to contain the parameter's value with some stated probability. The width of the interval provides a measure of the precision, or accuracy, of the estimator involved.
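The phrase "with some stated probability" refers to the long-run behaviour of the procedure, which can be illustrated by simulation. This sketch is not part of the Core Reading; the N(5, 2²) population, the sample size n = 25 and the known-variance z-interval are arbitrary choices.

# Illustrative check: long-run coverage of 95% z-intervals for a normal mean
set.seed(4)
n <- 25; mu <- 5; sigma <- 2
covered <- replicate(10000, {
  x <- rnorm(n, mu, sigma)
  lower <- mean(x) - 1.96 * sigma / sqrt(n)
  upper <- mean(x) + 1.96 * sigma / sqrt(n)
  (lower < mu) & (mu < upper)
})
mean(covered)   # should be close to 0.95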
A 100(1−α)% confidence interval for $\theta$ is defined by specifying random variables $\hat{\theta}_1(X)$ and $\hat{\theta}_2(X)$ such that

$P\left(\hat{\theta}_1(X) < \theta < \hat{\theta}_2(X)\right) = 1 - \alpha$.

Rightly or wrongly, $\alpha = 0.05$, leading to a 95% confidence interval, is by far the most common case used in practice and we will tend to use this in most of our illustrations.

Thus $P\left(\hat{\theta}_1(X) < \theta < \hat{\theta}_2(X)\right) = 0.95$ specifies $\left(\hat{\theta}_1(X), \hat{\theta}_2(X)\right)$ as a 95% confidence interval for $\theta$. This emphasises the fact that it is the interval, and not $\theta$, that is random. In the long run 95% of the realisations of such intervals will include $\theta$ and 5% of the realisations will not include $\theta$.

Confidence intervals are not unique. In general, they should be obtained via the sampling distribution of a good estimator, in particular the maximum likelihood estimator. Even then there is a choice between one-sided and two-sided intervals, and between equal-tailed and shortest-length intervals, although these are often the same, e.g. for sampling distributions that are symmetric about the unknown value of the parameter.

2 Derivation of confidence intervals

There is a general method of constructing confidence intervals called the pivotal method. This method requires the finding of a pivotal quantity of the form $g(X, \theta)$ with the following properties:

(1) it is a function of the sample values and the unknown parameter $\theta$;

(2) its distribution is completely known;

(3) it is monotonic in $\theta$.

The equation $\int_{g_1}^{g_2} f(t)\,dt = 0.95$, where $f$ is the known probability (density) function of $g(X,\theta)$, defines two values $g_1$ and $g_2$ such that

$P\left(g_1 < g(X,\theta) < g_2\right) = 0.95$.

Because $g(X,\theta)$ is monotonic in $\theta$, the event $g_1 < g(X,\theta) < g_2$ can be rearranged into the equivalent form $\theta_1(X) < \theta < \theta_2(X)$: if $g(X,\theta)$ is monotonic increasing in $\theta$, then $g(X,\theta) < g_2 \Leftrightarrow \theta < \theta_2$, and if $g(X,\theta)$ is monotonic decreasing in $\theta$, then $g(X,\theta) < g_2 \Leftrightarrow \theta > \theta_1$. In either case $P\left(\theta_1(X) < \theta < \theta_2(X)\right) = 0.95$, so $\left(\theta_1(X), \theta_2(X)\right)$ is a 95% confidence interval for $\theta$.

For small samples from a non-normal distribution, confidence intervals can be constructed empirically in R using the bootstrap method described in Unit 8, Section 7. For example, a non-parametric 95% confidence interval for the mean could be obtained by:

quantile(replicate(1000, mean(sample(x, replace=TRUE))), c(0.025, 0.975))

where x is the vector containing the sample data.
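As a concrete instance of the pivotal method, the t result of Unit 7 provides the pivotal quantity $(\bar{X}-\mu)/(S/\sqrt{n}) \sim t_{n-1}$ for a normal mean when the variance is unknown. The following R sketch is illustrative only and is not part of the Core Reading; the data vector x is made up.

# Illustrative 95% confidence interval for a normal mean using the t pivot
x <- c(12.1, 9.8, 11.4, 10.6, 12.9, 10.2, 11.7, 9.5)   # hypothetical data
n <- length(x)
mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
# t.test(x)$conf.int returns the same interval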
