Unit 1 - Statistics - English
Data?
Data, the raw material of information,
plays a pivotal role in various fields, from
research and business to government
policymaking. In the modern world, data forms
the backbone of research, decision-making,
and innovation. Understanding the nature of
data and its origins is crucial for researchers,
analysts, and practitioners across various
disciplines. This unit explores two fundamental
categories of data, primary and secondary,
elaborating on their distinctions and
discussing methods of data collection relevant
to each. This understanding not only sharpens
analytical approaches but also ensures the
validity and reliability of research outcomes.
Primary Data
Primary data is data collected first-hand by the researcher for a particular research
purpose or study. It is original and obtained directly from the source through
experiments, surveys, interviews, or observations. This data is tailored to meet
the specific needs of a study and has not been subjected to prior analysis or
interpretation.
gyanSHiLA – Siddharth Sir
Blue Book of Statistics → UNIT 1→ Statistics
Secondary Data
Secondary data, on the other hand, refers to data that has already been collected,
analyzed, and possibly interpreted by someone else for a purpose other than the
current research. It is often found in reports, books, online databases, and previous
research studies. Researchers access this data to support their analysis, saving time
and resources compared to gathering original data.
Differences
1. Source of Data:
Primary data is gathered directly by the researcher
from the original source, while secondary data is
sourced from pre-existing records or research
conducted by others.
2. Purpose:
Primary data is collected with a specific research question in mind, tailored to
the needs of the study. Secondary data was originally collected for a different
purpose, and researchers adapt it to suit their own needs.
5. Data Availability:
Primary data requires the researcher to initiate and execute data collection,
often constrained by logistical or ethical limitations. Secondary data, however,
is already available, although its availability may depend on access
restrictions such as subscription costs or confidentiality issues.
Similarities
1. Utility in Research:
Both primary and secondary data are fundamental to the research process,
providing the necessary inputs for analysis, hypothesis testing, and
conclusions.
2. Need for Validation:
Regardless of whether the data is primary or secondary, researchers must
critically assess its validity, reliability, and appropriateness for their research
context.
3. Influence on Results:
The quality of both primary and secondary data directly influences the
outcome of the research. Poor data collection or inappropriate use of
secondary sources can skew results and lead to inaccurate conclusions.
Qualitative Data:
→ Descriptive data that cannot be measured numerically.
Represents qualities or characteristics.
Examples: Color, gender, brand, taste, smell, opinion.
Subcategories:
Nominal Data: No inherent order or rank.
Examples: Gender (male, female), eye color (blue, brown, green), religion.
Ordinal Data: Has a natural order or ranking, but the differences between values are
not necessarily equal or measurable.
Examples: Educational level (elementary, high school, college), satisfaction rating
(very unsatisfied, unsatisfied, neutral, satisfied, very satisfied).
Qualitative Data: The table includes non-numerical, categorical data such as "Favorite
Subject," "Preferred Learning Style," "Extracurricular Activity," and "Personality Type."
Variables: These qualitative variables describe characteristics and preferences of students
without numerical values.
Quantitative Data:
→ Numerical data that can be measured or counted.
Represents quantities or amounts.
Examples: Age, height, weight, income, temperature, sales figures.
Subcategories:
Discrete Data: Can only take on specific, separate values.
Examples: Number of children, number of cars owned, number of votes.
Continuous Data: Can take on any value within a range, including decimals.
Examples: Height, weight, time, temperature, distance.
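A minimal sketch of how the four subcategories behave differently in practice. The sample values and the SATISFACTION_ORDER list are illustrative assumptions, not data from this unit.

```python
# Illustrative values only; none of these come from a real dataset.
eye_colors = ["brown", "blue", "green", "brown"]      # nominal
ratings = ["satisfied", "neutral", "very satisfied"]  # ordinal
children = [0, 2, 1, 3]                               # discrete
heights_cm = [162.5, 171.2, 158.0]                    # continuous

# Nominal data: categories can be counted, but not ranked.
color_counts = {c: eye_colors.count(c) for c in set(eye_colors)}

# Ordinal data: categories have a rank, so they can be sorted,
# but the gaps between ranks are not measurable.
SATISFACTION_ORDER = ["very unsatisfied", "unsatisfied", "neutral",
                      "satisfied", "very satisfied"]
sorted_ratings = sorted(ratings, key=SATISFACTION_ORDER.index)

# Discrete data: counts take separate whole-number values.
total_children = sum(children)

# Continuous data: measurements can take any value within a range.
mean_height = sum(heights_cm) / len(heights_cm)

print(color_counts)
print(sorted_ratings)
print(total_children, round(mean_height, 2))
```

Averaging makes sense for the continuous heights but not for the nominal eye colors, which is why the type of data should be identified before choosing a statistical method.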
2. Interviews:
Interviews involve direct interaction
between the researcher and respondents,
either in person or virtually. They can be
structured (with predefined questions) or
unstructured (more conversational and
flexible).
Applications: Qualitative research,
exploring complex topics such as
motivations or beliefs, case studies.
3. Observations:
In observational studies, researchers gather data by watching subjects in a
natural or controlled environment. Observations can be participant-based,
where the researcher becomes part of the group being studied, or non-
participant, where the researcher observes without involvement.
Applications: Behavioral studies, sociological research, ethnography.
2. Online Databases:
Researchers often access secondary data from online databases such as
JSTOR, PubMed, or government census sites. These databases provide a
wealth of pre-analyzed information, ranging from academic studies to
statistical datasets.
Applications: Academic research, business trend analysis, health research.
3. Published Statistical Data:
Statistical data published by government agencies, international
1. Problem Definition
Objective: Identify the problem or phenomenon to be investigated. This involves
clearly defining the research question, hypothesis, or goal.
Steps:
Clarify the purpose of the investigation.
Determine the scope and nature of the problem.
Identify variables or factors to be studied.
Formulate a hypothesis or define key research questions.
3. Data Collection
Objective: Gather data based on the research design.
Steps:
Implement the chosen methods of data collection.
Ensure data is collected systematically and accurately.
Use tools like questionnaires, interviews, sensors, or administrative records.
Record and document data carefully to maintain integrity.
5. Data Analysis
Objective: Apply statistical techniques to extract insights, patterns, and relationships
from the data.
Steps:
Use descriptive statistics (e.g., mean, median, mode, standard deviation) to
summarize the data.
Apply inferential statistics (e.g., regression, hypothesis testing, ANOVA) to
test hypotheses and make predictions.
Conduct exploratory data analysis to find trends and correlations.
6. Interpretation of Results
Objective: Interpret the findings in relation to the original research questions or
hypotheses.
Steps:
Compare results with the hypothesis or expectations.
Assess the reliability and validity of the conclusions.
Consider the limitations and possible sources of bias or error.
Relate the findings to real-world implications or applications.
These stages form a cyclical process, where findings from one enquiry may
lead to new questions or further research.
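The descriptive-statistics step of the Data Analysis stage can be sketched with Python's standard statistics module. The scores are the twenty math-test scores listed in this unit's arrangement-of-data example; using them here, and using the sample (n - 1) standard deviation, are assumptions made for illustration.

```python
import statistics

# Twenty math-test scores from the worked example in this unit.
scores = [45, 48, 50, 52, 55, 56, 60, 61, 63, 66,
          67, 68, 70, 72, 73, 77, 82, 84, 89, 90]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
modes = statistics.multimode(scores)      # every value tied for most frequent
std_dev = statistics.stdev(scores)        # sample standard deviation (n - 1)

print(f"mean={mean_score}, median={median_score}, stdev={std_dev:.2f}")

# Every score occurs exactly once, so multimode returns all twenty values:
# this dataset has no single meaningful mode.
print(len(modes))
```

The mean and median are close to each other here, which suggests the scores are not strongly skewed.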
categorized based on the nature of the data (qualitative or quantitative), the context
in which the data is collected, and the tools used. Below are the major types of data
collection methods:
2. Interviews
Description: A conversational method to collect in-depth qualitative data.
Interviews can be structured (with set questions), semi-structured (with a
flexible guide), or unstructured (open discussions).
Example: One-on-one interviews with employees, expert interviews for
research.
Types:
o Structured: Pre-determined questions, consistent across interviews.
o Semi-structured: A flexible guide of questions, adapted during the interview.
o Unstructured: Open-ended, exploratory conversation.
3. Observations
Description: The researcher observes subjects in a natural or controlled
environment and records their behavior, usually without intervening in it.
This method is often used for qualitative data but can also yield quantitative
measures (e.g., counting occurrences of a behavior).
Example: Observing customer behavior in a store, watching wildlife in their
habitat.
Types:
o Participant Observation: The researcher interacts with the group
while observing.
o Non-participant Observation: The researcher observes without
engaging.
4. Experiments
Description: Controlled testing or manipulation of variables to observe
effects, often used to establish cause-and-effect relationships. This is a
primary quantitative method, used in both natural and social sciences.
Example: A/B testing in marketing, lab experiments to test a drug’s
effectiveness.
Types:
o Laboratory Experiments: Conducted in a controlled environment.
o Field Experiments: Conducted in a real-world setting.
5. Focus Groups
Description: A small group discussion led by a moderator, aimed at gaining
in-depth insights and opinions. It is primarily used for qualitative data
collection and is popular in market research.
Example: A focus group discussing a new product design.
Types:
Coding of Data
Coding of data refers to the process of assigning labels, symbols, numbers, or
other identifiers to raw data to simplify analysis, improve organization, and allow for
easier interpretation.
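A minimal sketch of coding, assuming a hypothetical five-point satisfaction survey. The responses and the codebook values are invented for illustration.

```python
# Hypothetical survey responses and codebook, for illustration only.
responses = ["satisfied", "neutral", "very satisfied",
             "unsatisfied", "satisfied"]

# The codebook assigns a numeric code to each label.
codebook = {
    "very unsatisfied": 1,
    "unsatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

# Coding: replace each label with its code.
coded = [codebook[r] for r in responses]

# Reverse lookup, to decode the numbers back into labels when reporting.
decode = {code: label for label, code in codebook.items()}
decoded = [decode[c] for c in coded]

print(coded)
```

Keeping the codebook alongside the coded data means the numbers can always be translated back into their original labels.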
Arrangement of data:
Example:
A teacher has collected raw data on the scores of 20 students in a math test. The
raw scores are:
1. Tabulation of Data:
The first step is to organize the raw data into a simple table.
2. Sorting of Data:
Sorting the scores in ascending order:
Sorted Data:
45, 48, 50, 52, 55, 56, 60, 61, 63, 66, 67, 68, 70, 72, 73, 77, 82, 84, 89, 90
This makes it easier to see the range of scores and identify the highest and lowest
scores.
3. Frequency Distribution:
Next, group the scores into class intervals and count how many students fall into
each interval.
Score Range Frequency
40-49 2
50-59 4
60-69 6
70-79 4
80-89 3
90-99 1
This frequency distribution helps in identifying how the scores are spread across
different intervals.
This classification gives insight into the overall performance of the group.
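The sorting and frequency-distribution steps can be sketched as follows. The twenty values are those in the sorted list above; the unsorted order in raw_scores is an assumption, since the original raw order is not shown in the text.

```python
from collections import Counter

# Same twenty scores as the sorted list above, in an assumed raw order.
raw_scores = [72, 45, 89, 60, 55, 90, 48, 67, 73, 50,
              82, 61, 56, 77, 63, 84, 52, 68, 66, 70]

# Sorting in ascending order makes the lowest and highest scores obvious.
sorted_scores = sorted(raw_scores)
score_range = sorted_scores[-1] - sorted_scores[0]

# Group into class intervals of width 10: 40-49, 50-59, and so on.
WIDTH = 10
counts = Counter((s // WIDTH) * WIDTH for s in sorted_scores)

print("Range:", score_range)
for lower in sorted(counts):
    print(f"{lower}-{lower + WIDTH - 1}: {counts[lower]}")
```

Counting directly from the twenty scores gives frequencies of 2, 4, 6, 4, 3 and 1 for the six intervals, which sum to 20.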
Frequency Distribution
A frequency distribution is a table or a graphical representation that shows
how often each value or range of values occurs in a dataset. It provides a simple
way to summarize and analyze large datasets by organizing data into categories
(often called classes) and then displaying the number of occurrences (frequency) of
each category.
Frequency distribution is a key concept in statistics because it allows us to
observe patterns, trends, or distributions in data, making it easier to interpret and
visualize.
Grade Frequency
A 5
B 8
C 6
D 2
F 1
o Example: For the interval "40-49", the lower limit is 40, and the upper
limit is 49.
4. Cumulative Frequency:
o The running total of frequencies as you move from the first interval to
the last. It helps to determine how many data points fall below a
particular value.
o Example: If you want to know how many students scored below 60,
you sum the frequencies for all intervals below 60.
Example of a Grouped Frequency Distribution with Cumulative Frequency:
The relative frequency helps in understanding the percentage of the total population
that falls into each category.
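Cumulative and relative frequencies can be computed as in the sketch below. The class intervals and frequencies are counted from the twenty sorted math-test scores in this unit's arrangement-of-data example; the column layout is an arbitrary choice.

```python
from itertools import accumulate

# Class intervals and frequencies counted from the twenty sorted scores.
intervals = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 4, 6, 4, 3, 1]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))   # running totals
relative = [f / total for f in frequencies]  # share of the whole

print(f"{'Interval':<10}{'Freq':>6}{'Cum.':>6}{'Rel. %':>8}")
for label, f, cf, rf in zip(intervals, frequencies, cumulative, relative):
    print(f"{label:<10}{f:>6}{cf:>6}{rf * 100:>8.1f}")
```

Reading the cumulative column at the 50-59 row shows that 6 students scored below 60, and the relative frequencies always sum to 100%.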
1. Histogram:
o A bar graph where each bar represents a class interval, and the height
of the bar represents the frequency of the interval. Useful for visualizing
grouped frequency distributions.
o Example: A histogram for the score distribution where each bar
represents a range of scores and its height represents how many
students fall into that range.
2. Frequency Polygon:
o A line graph that is created by plotting the frequency of each class
interval and connecting the points with straight lines.
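As a rough sketch, both graphs can be derived from the same frequency table. The frequencies are counted from the twenty sorted scores used in this unit; drawing bars with text characters and plotting the polygon at class midpoints are illustrative choices, not a prescribed method.

```python
# Frequencies counted from the twenty sorted scores used in this unit.
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [2, 4, 6, 4, 3, 1]

# Histogram: one bar per class interval, bar length = frequency.
for (lower, upper), freq in zip(class_limits, frequencies):
    print(f"{lower}-{upper} | {'#' * freq} ({freq})")

# Frequency polygon: place each frequency at the class midpoint,
# then join consecutive points with straight lines.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```

The polygon_points list holds the (x, y) pairs that a plotting tool would connect with line segments.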
Tabulation of data
Tabulation of data is the process of organizing raw data into a structured,
systematic, and summarized format, usually in the form of rows and columns in a
table. This method helps in simplifying complex datasets, making it easier to
analyze, interpret, and compare different aspects of the data.
3. Types of Tabulation:
Simple Tabulation: Presents data for one characteristic or variable (e.g.,
number of students who passed an exam).
Complex Tabulation (Cross-tabulation): Presents data for two or more
variables simultaneously (e.g., number of students who passed the exam by
gender).
Example:
Age Group    Number of People
18-25        50
26-35        70
36-45        40
46-55        30
56+          20
This table organizes the data by age group and number of people, summarizing and
categorizing raw data for easy understanding.
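The difference between simple tabulation and cross-tabulation can be sketched as follows. The records below are hypothetical (gender, exam result) pairs invented for illustration, not data from this unit.

```python
from collections import Counter

# Hypothetical records of (gender, result); not data from this unit.
records = [
    ("Female", "Pass"), ("Male", "Pass"), ("Female", "Fail"),
    ("Male", "Pass"), ("Female", "Pass"), ("Male", "Fail"),
    ("Female", "Pass"),
]

# Simple tabulation: one variable (exam result).
simple = Counter(result for _, result in records)

# Cross-tabulation: two variables at once (gender by result).
cross = Counter(records)

genders = sorted({g for g, _ in records})
results = sorted({r for _, r in records})

# Print the cross-tabulation as rows (gender) and columns (result).
print(f"{'':<8}" + "".join(f"{r:>6}" for r in results))
for g in genders:
    print(f"{g:<8}" + "".join(f"{cross[(g, r)]:>6}" for r in results))
```

The simple table answers "how many passed?", while the cross-tabulation answers "how many of each gender passed?".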
Parts of Table:
A table consists of several essential parts that help organize and present data
in a clear and structured way. The main parts of a table are:
1. Title
Definition: A brief, descriptive heading that indicates the content and purpose of the
table.
Function: Provides context for the data presented.
Example: "Distribution of Student Test Scores by Subject."
4. Body
Definition: The main part of the table, where the actual data is presented.
Function: Contains the data values organized into rows and columns for easy
interpretation.
Example: The numerical values or categories corresponding to each row and
column, such as "85" or "Male/Female."
6. Headnote
Definition: A brief note or explanation just below the title but above the table,
offering clarification on the scope or specifics of the table.
Function: Adds extra details about how the data should be read or interpreted.
7. Rules (Lines)
Definition: Horizontal and vertical lines that separate rows and columns in a table.
Function: Improve readability and structure by clearly delineating sections of the
table.
Example: The lines separating different student names from their scores.
1. Bar Chart
Description: Uses rectangular bars to represent the values of different
categories.
Best For: Comparing quantities across categories (e.g., sales figures by
month).
Example: Sales revenue for different product categories.
2. Column Chart
Description: Similar to a bar chart but with vertical bars.
Best For: Comparing data points across categories or over time.
Example: Monthly temperature changes throughout the year.
3. Pie Chart
Description: Divides a circle into segments to show the proportion of each
category relative to the whole.
Best For: Showing percentage or proportional data.
Example: Market share of different companies in an industry.
4. Line Chart
Description: Uses points connected by lines to show trends over time.
Best For: Tracking changes over time or continuous data.
Example: Stock price movements over several years.
5. Scatter Plot
Description: Displays points based on two variables to show relationships or
correlations.
Best For: Identifying relationships or correlations between variables.
Example: Relationship between advertising spend and sales revenue.
6. Histogram
Description: Shows the distribution of numerical data by grouping data points
into bins or intervals.
Best For: Understanding the distribution of a dataset.
Example: Distribution of test scores in a class.
8. Area Chart
Description: Similar to a line chart, but the area under the line is filled in.
Best For: Showing cumulative totals over time.
Example: Total sales volume over several quarters.
1. Histogram
Histogram Diagram:
In a histogram, each bar touches the next, indicating the continuous nature of the
data. The bars' width represents the class interval, while the height corresponds to
the frequency of the class interval.
2. Frequency Polygon
3. Frequency Curve
Ogive
An ogive is a type of graph used in statistics to represent the cumulative
frequency or cumulative relative frequency of a dataset. It visually shows how the
data accumulates over intervals, helping to understand the distribution of the dataset
up to a certain point.
Characteristics of an Ogive:
The curve is always non-decreasing because the cumulative frequency either
increases or stays the same as you move along the x-axis.
The shape of the ogive provides a good visual representation of how the data
accumulates, showing trends like how quickly or slowly data values rise.
It can help in identifying percentiles, medians, and quartiles in a dataset.
Example:
If we have a dataset of student test scores grouped into intervals, the ogive would
help visualize the number of students scoring below a certain threshold, such as:
How many students scored below 50?
What percentage of students scored below 70?
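Both questions can be answered from the points of a "less than" ogive, as in this sketch. The frequencies are counted from the twenty sorted math-test scores in this unit; treating each class as ending just below the next multiple of ten (50, 60, ...) is an assumption, and scored_below is a hypothetical helper name.

```python
from itertools import accumulate

# Upper class boundaries and frequencies for the twenty scores in this unit.
# Treating each interval as "less than" its upper boundary is an assumption.
upper_bounds = [50, 60, 70, 80, 90, 100]
frequencies = [2, 4, 6, 4, 3, 1]

cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Points of a "less than" ogive: (upper boundary, cumulative frequency).
ogive_points = list(zip(upper_bounds, cumulative))

def scored_below(threshold):
    """Cumulative frequency at a class boundary on the ogive."""
    return dict(ogive_points)[threshold]

below_50 = scored_below(50)
pct_below_70 = 100 * scored_below(70) / total

print(ogive_points)
print(below_50, pct_below_70)
```

For these scores, 2 students scored below 50 and 60% of students scored below 70. Thresholds that fall between class boundaries would need interpolation along the curve, which this sketch does not attempt.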
10. What is one key similarity between Primary and Secondary Data?
a) Both are collected for the same research purposes.
b) Both require validation to ensure accuracy and reliability.
c) Both are cost-effective and easy to obtain.
d) Both are collected through the same methods.
15. Document analysis is a method most commonly used for which type
of data?
a) Primary data
b) Secondary data
c) Experimental data
d) Observational data
17. What is the main objective of the "Planning the Study" stage in
statistical enquiry?
a) To interpret the data
b) To collect the data
c) To design a strategy for data collection and analysis
d) To present the data graphically
18. Which of the following methods is commonly used during the "Data
Collection" stage?
a) Creating histograms
b) Applying inferential statistics
c) Conducting surveys or experiments
d) Writing research reports
19. In which stage of statistical enquiry would you most likely use bar
charts or pie charts?
a) Problem Definition
b) Data Organization and Presentation
c) Data Analysis
d) Reporting and Conclusions
20. What is the main purpose of the "Data Analysis" stage in statistical
enquiry?
a) To gather data
b) To apply statistical techniques and extract insights
c) To present findings in a report
d) To define the research problem
22. At which stage would you evaluate the limitations and possible
biases of the study?
a) Problem Definition
b) Data Collection
c) Interpretation of Results
d) Data Organization
Answer Keys: