  ○ Descriptive statistics focus on gathering, sorting, summarizing, and displaying data, while inferential statistics use descriptive statistics to estimate population parameters based on sample data.
● Population vs parameter
  ○ The population of a study is the group the collected data is intended to describe.
  ○ A parameter is a value, such as an average or a percentage, calculated using all the data from the population.
    ■ Parameters are rarely computed directly because collecting data from every member takes so much time and so many resources, unless the population is small or the data has already been collected.
  ○ A census is a survey of an entire population.
● Sample vs statistic
  ○ A sample is a smaller subset of the population, ideally one that is representative of the population as a whole.
  ○ A statistic is a value calculated using data from a sample.
  ○ Example
    ■ The city of Raleigh has 8,700 registered voters. There are two candidates for city council in an upcoming election: Brown and Feliz. The day before the election, a telephone poll of 600 randomly selected registered voters was conducted. 197 said they'd vote for Brown, 388 said they'd vote for Feliz, and 15 were undecided.
    ■ Identify the following:
      ● Population
      ● Sample
      ● Sample statistic
      ● Number and percentage of voters expected to vote for (1) Brown, (2) Feliz, and (3) Undecided. Round your answers to the nearest person. Round your percentages to the nearest tenth.
    ■ Population: the 8,700 registered voters
    ■ Sample: the 600 voters polled
    ■ Sample statistic: the percentage of polled voters giving each answer (for example, 197/600 ≈ 32.8% for Brown)
    ■ Expected voters (sample percentage × 8,700):
      ● Brown: 2,857, 32.8%
      ● Feliz: 5,626, 64.7%
      ● Undecided: 218, 2.5%
● Data Types
  ○ Categorical/Qualitative data are pieces of data that allow us to classify the objects under investigation into different categories.
  ○ Quantitative data are responses that are numerical in nature, with which we can perform meaningful arithmetic calculations.
  ○ Examples
    ■ Classify these as categorical or quantitative
      ● Zip codes: Categorical
      ● Eye color of a certain group: Categorical
      ● Daily high temperature of a city over several weeks: Quantitative
      ● Annual income: Quantitative
● Sampling methods
  ○ A sampling method is biased if not every member of the population has a fair chance of being included.
  ○ Random samples are ones where each member of the population has an equal chance of being chosen.
    ■ A simple random sample is one where each member of the population, and every group of members of a given size, has an equal chance of being chosen.
  ○ Stratified sampling is where a population is divided into a number of subgroups known as strata. Random samples are then taken from each stratum, with sample sizes proportional to the size of the subgroup in the population.
    ■ Quota sampling is a variation on stratified sampling, where samples are collected in each stratum until a quota is met.
  ○ Cluster sampling is where the population is separated into subgroups known as clusters, and a set of clusters is then selected to make up the sample.
  ○ Systematic sampling is where every Nth member of a population is selected for the sample.
  ○ Convenience sampling is where the sample is made up of whoever is most convenient to reach.
  ○ Voluntary response sampling is where the sample is made up of whoever volunteers to respond.
  ○ Examples
    ■ Identify each type of sampling used.
      ● A sample was selected to contain 25 men and 35 women. Stratified.
      ● Viewers of a new show are asked to vote on a website. Voluntary response.
      ● Every 4th member of a class was asked. Systematic.
      ● A website randomly sends a survey to 50 users. Simple random.
      ● To survey voters in a town, a polling company randomly selects 10 city blocks and interviews everyone who lives on those blocks. Cluster.
● Studies and experiments
  ○ An observational study is a study based upon observations or measurements.
  ○ An experiment is a study where the effects of a treatment are measured.
Tables and graphs
● A frequency table is a table with two columns: one column lists the categories, and the other lists the frequency of each category, i.e. how many items fit it.
  ○ A relative frequency table replaces the counts with fractions or percents giving the relative frequency of each category.
● A bar graph is a graph that displays a bar for each category, with the length of each bar indicating the frequency of the category.
  ○ Pareto charts are bar graphs whose bars are ordered from highest to lowest frequency.
● A pie chart is a circle cut into wedges of varying sizes, where the size of each wedge indicates the relative frequency of items in that category.
● A histogram is a graph that displays a rectangle for each numerical class interval, with the height of each rectangle indicating the frequency of values in that interval. A histogram resembles a bar graph, but the horizontal axis is a number line and all class intervals have equal width.
● A line chart shows each category as a point, with the points connected by a line.
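The frequency and relative frequency tables described above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: it reuses the telephone poll counts from the voter example (197 Brown, 388 Feliz, 15 undecided), and the helper name `frequency_table` is made up for this sketch.

```python
from collections import Counter

# Poll responses from the voter example: 197 Brown, 388 Feliz, 15 undecided.
responses = ["Brown"] * 197 + ["Feliz"] * 388 + ["Undecided"] * 15


def frequency_table(values):
    """Return rows of (category, frequency, relative frequency in percent).

    Rows are ordered from highest to lowest frequency (Pareto order).
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, n, round(100 * n / total, 1))
        for category, n in counts.most_common()
    ]


table = frequency_table(responses)
for category, freq, pct in table:
    print(f"{category:<10} {freq:>5} {pct:>6}%")
```

Because the rows come back sorted by frequency, the same list could feed a Pareto chart directly; the percent column is what a pie chart would display.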
Measures of center and variation
● Measures of central tendency
  ○ The distribution of a variable or data set refers to the way its values are spread over all possible values. The distribution can be shown visually with a table or graph.
    ■ Mean: the arithmetic mean, also known as the average, is the sum of all values divided by the total number of values.
    ■ Median: the median is the middle value when the data is sorted in numerical order, or the value halfway between the two middle values if the number of values is even.
    ■ Mode: the mode is the most common value or group of values in a data set.
    ■ Outliers are values that are much higher or lower than all other values. An outlier pulls the mean toward it: a very high value raises the mean, and a very low value lowers it.
    ■ The range is the difference between the maximum and minimum values.
    ■ Standard deviation (SD) is a measure of variation based on how far each data value deviates from the mean.
      ● The SD is never negative. It is zero if all the data values are equal, and it increases as the data spreads out.
      ● The SD has the same units as the original data.
      ● The SD is also affected by outliers.
      ● $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the mean and $n$ is the number of data values.
  ○ Examples
    ■ For the following dataset of contract offers, find the mean, median, mode, range, and standard deviation: $50,000, $80,000, $100,000, $90,000, $10,000,000
      ● N = 5
      ● Mean: 2,064,000
      ● Median: 90,000
      ● Mode: none (every value occurs exactly once)
      ● Range: 9,950,000
      ● SD: ≈ 4,436,398 (1.968163e13 is the variance; the SD is its square root)
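The contract-offer example can be checked with Python's standard `statistics` module. This is a minimal sketch rather than the method the notes prescribe; the variable names are chosen here for illustration. `statistics.variance` and `statistics.stdev` use the sample (n − 1) formula, which matches the SD definition above.

```python
import statistics

# Contract offers from the example, in dollars.
offers = [50_000, 80_000, 100_000, 90_000, 10_000_000]

mean = statistics.mean(offers)
median = statistics.median(offers)
# multimode returns every value tied for most frequent; when all values
# are unique it returns the whole list, meaning there is no single mode.
modes = statistics.multimode(offers)
data_range = max(offers) - min(offers)
variance = statistics.variance(offers)  # sample variance, divides by n - 1
sd = statistics.stdev(offers)           # square root of the sample variance

# Effect of the outlier: the mean of the four ordinary offers alone.
mean_without_outlier = statistics.mean(offers[:4])

print(f"mean={mean:,} median={median:,} range={data_range:,}")
print(f"variance={variance:.6e} sd={sd:,.0f}")
print(f"mean without the $10,000,000 offer: {mean_without_outlier:,}")
```

Dropping the $10,000,000 offer brings the mean down from 2,064,000 to 80,000, while the median barely moves, which is why the median is usually the better summary for data with outliers.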