Statistics LO1 LO4
Topic 1: Define the concepts of data and information, and then clarify how each is used.
Statistics is the science dealing with the collection, analysis, interpretation, and presentation of data.
Data is information, especially facts or numbers, collected to be examined and considered and
used to help decision-making. It can also be information in electronic form. In addition, data is the
raw material from which statistics are created.
Uses of data:
Uses of information:
Decision-making.
Problem solving.
Develop skills.
Develop knowledge.
Topic 2: Explain the different sources of data, and then evaluate them by stating their benefits and
limitations.
A. Primary data: data that is collected for the first time and is original. For example:
a population census carried out by the government.
Advantages: Researchers collect the data for the specific purposes of their study. / The data is more
up to date, since previous studies may not answer the questions the researcher needs answered.
Disadvantages: It needs a sample large enough to be authoritative and to allow the results to be
generalized.
B. Secondary data: data that is obtained rather than collected; it is gathered from existing studies,
surveys, etc.
Advantages: Secondary data tends to be easily available and cheap to obtain. / Secondary data is often
collected over a long period, which allows researchers to uncover changes over time.
Analysis methods:
Exploratory data analysis: An approach to data analysis used to summarize the key features of the data,
derive relevant variables, and evaluate the underlying hypotheses.
Example: a surge in the number of users canceling their product subscription. You want to find out why
this is happening so that you can tackle the underlying cause and reverse the trend.
Descriptive data analysis: A mathematical method used to identify trends or patterns by summarizing
historical data.
Example: The idea of a GPA is that it takes data points from a wide range of exams, classes and
grades, and averages them together to provide a general understanding of a student's overall
academic performance. A student's personal GPA reflects their mean academic performance.
Confirmatory data analysis: Used to validate preconceived hypotheses; the analysis is directed at
answering one or more research questions. Confirmatory data analysis is where you evaluate the evidence
using traditional statistical tools such as inference, significance, and confidence.
Example: Blood units that initially test positive must undergo confirmatory tests to confirm the
presence of a specific virus or disease.
Task 2: LO2
You have been asked by your supervisor to show your ability to analyse and evaluate qualitative and
quantitative raw business data using a number of statistical methods such as central tendency measures
and variation measures. In this task you have to:
Part 1: Provide a Word document that includes an evaluation (strengths and weaknesses) of the
differences in application between descriptive statistics and inferential statistics.
Descriptive statistics: Descriptive statistics is a discipline that quantitatively describes the
important characteristics of a dataset. To describe these properties, it uses measures of central
tendency (mean, median, mode) and measures of dispersion (range, standard deviation, quartile
deviation, variance, etc.).
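The measures named above can be computed with Python's standard library. This is a minimal sketch, assuming the performance scores listed later in this report are the dataset of interest; the function name `describe` is only illustrative.

```python
import statistics

# Performance scores taken from the survey data later in this report.
scores = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]

def describe(values):
    """Return common measures of central tendency and dispersion."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),      # all most-frequent values
        "range": max(values) - min(values),
        "variance": statistics.variance(values),    # sample variance (n - 1)
        "std_dev": statistics.stdev(values),        # sample standard deviation
        "quartile_deviation": (q3 - q1) / 2,
    }

summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```

Every value returned describes only the 15 scores that were measured, which is exactly the scope of descriptive statistics.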
Inferential statistics: Inferential statistics is about generalising from the sample to the population,
i.e. the results of the analysis of the sample can be generalised to the larger population from which
the sample is taken. It is a convenient way to draw conclusions about the population when it is not
possible to query each and every member of it. The sample chosen should be representative of the entire
population; therefore, it should contain the important features of the population.
Descriptive statistics is concerned with describing the population under study. Inferential
statistics focuses on drawing conclusions about the population on the basis of sample analysis
and observation.
Descriptive statistics collects, organises, analyses and presents data in a meaningful way. In
contrast, inferential statistics compares data, tests hypotheses and makes predictions of future
outcomes.
In descriptive statistics the final result is presented in diagrammatic or tabular form, whereas in
inferential statistics the final result is expressed in the form of probability.
Descriptive statistics describes a situation, while inferential statistics explains the likelihood of
the occurrence of an event.
Descriptive statistics explains data that is already known, in order to summarise the sample.
Conversely, inferential statistics attempts to reach conclusions about the population that extend
beyond the data available.
Descriptive statistics:
The strength:
It makes it easier for data users to analyze, understand and study the variables, so that defects in
the data can be addressed and the data improved in the future.
The weakness:
It is limited: it only allows you to make summaries about the people or things that you have actually
measured, not about anything beyond them.
Inferential statistics:
The strength:
It allows the researcher to make generalizations from the data set to the wider population.
The weakness:
Inferential statistics provides data about a population that has not been fully measured, and
therefore you cannot be completely sure that the values / statistics you compute are
correct.
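The uncertainty described in this weakness can be made explicit with a confidence interval. The sketch below is only an illustration: it assumes the 15 performance scores from this report are a random sample from a larger population, and it uses the normal approximation rather than a t critical value.

```python
import math
import statistics

# Assumption: these scores are a random sample from a larger population.
sample = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]

def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses a z critical value; for a small sample a t critical value
    would give a slightly wider interval.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error

low, high = mean_confidence_interval(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"95% interval for the population mean: {low:.2f} to {high:.2f}")
```

The interval, rather than a single number, is the honest inferential statement: the population mean is estimated, not known.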
SCENARIO 2:
4- 60% of the employees have salaries above the average and 40% have salaries below it, so I should
change this to 60% below the average and 40% above the average by reducing the salaries of at least
two employees. Example:
This results in a difference of 140, and that difference is added to employee F's salary, changing the value
from 350 to 490.
Then we sum the salaries to show that their total, which is 7500, has not changed, and therefore that
the average salary has not changed.
1) Linear Regression: It is used when we want to predict the value of a variable based on the
value of another variable. The variable we want to predict is called the dependent variable
(or sometimes, the outcome variable).
Advantages:
1. Linear regression is simple to implement and its output coefficients are easy to interpret.
2. When you know that the relationship between the independent and dependent variables is
linear, this algorithm is the best to use because it is less complex than other
algorithms.
3. In addition, it works in most cases. Even when it doesn't fit the data exactly, we can use it
to find the nature of the relationship between the two variables.
Disadvantages:
1. Outliers can have a huge effect on the regression.
2. Boundaries are linear in this technique.
3. Linear regression only looks at the relationship between the mean of the dependent
variable and the independent variables. Just as the mean is not a complete description of a
single variable, linear regression is not a complete description of relationships among
variables.
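The sensitivity to outliers can be seen in a small experiment. The data below is hypothetical (chosen to lie exactly on y = 2x + 1), and `fit_line` is an illustrative helper implementing ordinary least squares; it is a sketch, not the method used elsewhere in this report.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 13]
clean_slope, clean_intercept = fit_line(xs, ys)

# Same data with the last observation replaced by a single outlier.
ys_outlier = ys[:-1] + [40]
out_slope, out_intercept = fit_line(xs, ys_outlier)

print("without outlier:", clean_slope, clean_intercept)
print("with outlier:   ", out_slope, out_intercept)
```

One changed observation out of six is enough to move the fitted slope and intercept far from the line the other five points follow.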
2) Moving Average: It smooths a series by averaging the values of a fixed number of consecutive
periods.
Advantages:
1. Less prone to whipsawing up and down in response to slight, temporary price swings back
and forth.
2. Moving averages can be used for measuring the trend of any series. This method is
applicable to linear as well as non-linear trends.
Disadvantages:
1. The trend obtained by moving averages is generally neither a straight line nor a standard
curve; for this reason, the trend cannot be extended for forecasting future values. Trend
values are not available for some periods at the start and some periods at the end of the time
series. This method is not applicable to short time series.
2. Some of the data used to compute the moving average might be old or stale.
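A short sketch of the calculation follows. It uses a trailing window (each value averages the current and preceding periods), which is one common variant; the sales figures are hypothetical and the function name is illustrative.

```python
def moving_average(series, window):
    """Trailing moving average; None where a full window is not yet available."""
    result = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)          # no trend value for the first periods
        else:
            chunk = series[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

# Hypothetical monthly sales with small up-and-down swings.
sales = [100, 104, 98, 106, 102, 110, 105]
smoothed = moving_average(sales, window=3)
print(smoothed)
```

The leading `None` entries show why trend values are missing for some periods, and why the method is of little use on a very short series.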
3) Naïve: A forecasting technique in which the last period's actuals are used as this period's
forecast, without adjusting them or attempting to establish causal factors. It is used only for
comparison with the forecasts generated by more sophisticated techniques.
Advantages:
1. You’ll gain valuable insight
2. Efficiency and accuracy have also led to the widespread proliferation
3. It can decrease costs
Disadvantages:
1. It does not take emergency conditions into account
2. Forecasts are never 100% accurate
3. It can be time-consuming and resource-intensive
4) Correlation: It is used to describe the linear relationship between two continuous variables
(e.g., height and weight). In general, correlation tends to be used when there is no identified
response variable. It measures the strength and direction of the linear relationship between
two or more variables.
Advantages:
1. Can show the strength of the relationship between two variables
2. Can be used to study behavior that cannot be studied experimentally
3. Gives quantitative data that can be easily analyzed
Disadvantages:
1. Cannot show cause and effect (which variable controls which)
2. No control over a third variable that might affect the correlation
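As an illustration of strength and direction, the Pearson correlation coefficient can be computed directly from its definition. The height and weight figures below are hypothetical, and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) for six people.
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 75, 80]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

The sign of `r` gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength; neither says anything about which variable causes the other.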
Scenario 1:
Naïve: 10500
Moving average: (10500+11000+12000)/3 = 11166.67
Linear regression: Y = -785.71*2018 + 1,595,285.71 =
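The three Scenario 1 forecasts can be reproduced as plain arithmetic. This sketch assumes the intercept printed above is 1,595,285.71 and uses the rounded coefficients exactly as they appear, so the regression value may differ slightly from a spreadsheet result computed with full-precision coefficients.

```python
# Values as they appear in Scenario 1 (coefficients are the rounded ones shown).
recent_actuals = [10500, 11000, 12000]   # figures used in the 3-period average
last_actual = 10500                      # last period's actual value
slope, intercept, year = -785.71, 1595285.71, 2018

naive_forecast = last_actual                               # naive: repeat last actual
moving_average_forecast = sum(recent_actuals) / len(recent_actuals)
regression_forecast = slope * year + intercept             # Y = a*X + b

print(f"naive:          {naive_forecast}")
print(f"moving average: {moving_average_forecast:.2f}")
print(f"regression:     {regression_forecast:.2f}")
```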
Scenario 2:
Year    Production volume (*1000)    Total quantity of inventory (*1000)
1       100                          20
2       120                          27
3       150                          36
4       200                          50
        250                          65.2267
=CORREL
Linear Regression (Production = 250): 65.2267
Y = 0.2974*250 - 9.1233 = 65.2267
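The regression coefficients and the correlation for this scenario can be checked from the four data rows above. The sketch assumes the columns are production volume (x) and total quantity of inventory (y), both in thousands; a forecast computed with unrounded coefficients may differ in the second decimal place from 65.2267, which uses the rounded ones.

```python
import math

production = [100, 120, 150, 200]   # production volume (*1000), assumed x
inventory = [20, 27, 36, 50]        # total quantity of inventory (*1000), assumed y

n = len(production)
mean_x = sum(production) / n
mean_y = sum(inventory) / n
sxx = sum((x - mean_x) ** 2 for x in production)
syy = sum((y - mean_y) ** 2 for y in inventory)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(production, inventory))

slope = sxy / sxx                      # least-squares slope
intercept = mean_y - slope * mean_x    # least-squares intercept
r = sxy / math.sqrt(sxx * syy)         # same quantity a spreadsheet CORREL returns
forecast = slope * 250 + intercept     # inventory forecast at production = 250

print(f"Y = {slope:.4f}*X + ({intercept:.4f})")
print(f"r = {r:.4f}, forecast at 250 = {forecast:.4f}")
```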
Scenario 3:
n=20000
M=5
σ =0.1
LO4
Identify different types of charts / tables available to communicate different categories of variables.
1. Summary table: The summary table is a visualization that summarizes statistical information about
data in table form. Like other visualizations, it can be set up to display only the data limited by
one or more markings in other visualizations (details visualizations). It is also possible to restrict
the summary table to one or more filters.
2. Frequency Distribution table: A frequency distribution is a representation that shows the number of
observations within a given interval, either in a graphical or tabular format. The size of the
interval depends on the data being evaluated and the analyst's objectives. The intervals must be
mutually exclusive and exhaustive. Frequency distributions are typically used in a statistical
context, and in general a frequency distribution can be associated with the charting of a normal
distribution.
3. Contingency table: A data table in which data is tabulated by row entries according to one variable
and tabulated by column entries according to another variable, and which is used in particular in the
analysis of the association between variables.
4. Ordered array: The elements of an ordered array are arranged in ascending or descending order.
Generally speaking, an ordered array may contain duplicate elements.
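Three of these tables can be built in a few lines. This is a sketch under stated assumptions: the scores are the performance values listed later in this report, the class width of 10 is an arbitrary choice, and the survey responses used for the contingency table are hypothetical.

```python
from collections import Counter

# Performance scores from this report (employees A to O).
scores = [70, 80, 95, 98, 53, 78, 35, 23, 90, 80, 85, 70, 65, 60, 63]

# Ordered array: the same values in ascending order (duplicates are kept).
ordered = sorted(scores)

def frequency_table(values, width):
    """Count values in mutually exclusive, exhaustive classes of equal width."""
    counts = Counter((v // width) * width for v in values)
    start, stop = min(counts), max(counts)
    return [(low, low + width - 1, counts.get(low, 0))
            for low in range(start, stop + width, width)]

freq = frequency_table(scores, width=10)

# Contingency table for two categorical variables (hypothetical responses:
# major field of study, and whether the student plans further study).
responses = [("Accounting", "Yes"), ("Accounting", "No"), ("Marketing", "Yes"),
             ("Marketing", "Yes"), ("Economics", "No"), ("Accounting", "Yes")]
cells = Counter(responses)
rows = sorted({major for major, _ in responses})
cols = sorted({answer for _, answer in responses})
contingency = {row: {col: cells[(row, col)] for col in cols} for row in rows}

print(ordered)
for low, high, count in freq:
    print(f"{low}-{high}: {count}")
for row in rows:
    print(row, contingency[row])
```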
After organizing data, you must visualize it, so here are some ways of visualizing data:
1) Pie chart: A pie chart (or circle chart) is a circular statistical graphic that is divided into
slices to show numerical proportions. The arc length of each slice (and thus its central angle and
area) is proportional to the quantity it represents.
2) Stem and leaf: A stem and leaf plot is a table used for displaying data. On the left is the 'stem',
which shows the first digit or digits. On the right is the 'leaf', which shows the last digit.
3) Bar chart: A bar chart or bar graph is a chart that presents categorical data with rectangular bars
whose heights or lengths are proportional to the values they represent. The bars can be plotted
vertically or horizontally. A bar graph shows comparisons among groups.
4) Scatter plot: A scatter plot is a set of points plotted on a horizontal and a vertical axis. Scatter
plots are important in statistics because they can show the degree of association, if any, between the
values of observed quantities or phenomena.
5) Histogram: A frequency distribution shows how often each different value in a data set occurs. A
histogram is the most commonly used graph for showing frequency distributions. It looks very much like
a bar chart, but there are important differences between them.
Use the appropriate tables/charts in order to present and communicate the following variables:
Survey 1:
-One variable: (major field of study)
For one categorical variable, a summary table is the simplest and easiest way to organize it:
For two categorical variables, the best way to organize them is a contingency table:
H 23
G 35
E 53
N 60
O 63
M 65
A 70
L 70
F 78
B 80
J 80
K 85
I 90
C 95
D 98
Performance
Stem | Leaf
2    | 3
3    | 5
4    |
5    | 3
6    | 0 3 5
7    | 0 0 8
8    | 0 0 5
9    | 0 5 8
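The stem and leaf display above can be generated from the performance scores. A minimal sketch, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict

# Performance scores from this report.
scores = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include stems with no leaves so gaps in the data stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}

display = stem_and_leaf(scores)
for stem, leaf_list in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaf_list)}")
```

Keeping the empty stem 4 in the output preserves the shape of the distribution, which is the main point of the display.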
With two numerical variables, we use an ordinary table to organize the data, because it is the easiest
to read once the data is organized.
Employee    Performance    Salary in £
A           23             350
B           35             500
C           53             600
D           60             500
E           63             650
F           65             1200
G           70             1000
H           70             1200
I           78             1000
J           80             1200
K           80             1400
L           85             1350
M           90             1500
N           95             1400
O           98             1200
[Figure: scatter plot "Salary Vs Performance". X-axis: Performance; y-axis: Salary in £. Linear trendline: f(x) = 15.85 x − 100.91, R² = 0.76.]
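The trendline reported for the salary and performance figures can be checked with a least-squares fit over the fifteen pairs in the table. This is a sketch that assumes performance is the independent variable and salary the dependent one, as the chart title suggests.

```python
performance = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]
salary = [350, 500, 600, 500, 650, 1200, 1000, 1200, 1000, 1200,
          1400, 1350, 1500, 1400, 1200]

n = len(performance)
mean_x = sum(performance) / n
mean_y = sum(salary) / n
sxx = sum((x - mean_x) ** 2 for x in performance)
syy = sum((y - mean_y) ** 2 for y in salary)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(performance, salary))

slope = sxy / sxx                         # change in salary per performance point
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)        # share of salary variation explained

print(f"f(x) = {slope:.2f} x + ({intercept:.2f}), R² = {r_squared:.2f}")
```

An R² of roughly three quarters means that performance accounts for most, but not all, of the variation in salary among these employees.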