ITE Elective Lecture Materials: Data Collection and Descriptive Statistics
Data collection is the process of collecting and evaluating information or data from multiple sources to find answers to
research problems, answer questions, evaluate outcomes, and forecast trends and probabilities.
What is Data?
• Data is a set of values of qualitative or quantitative variables.
• It's the raw information from which statistics are derived, and it's the basis for all scientific
conclusions.
• Not all data is created equal.
Observational Tools
• Include video and audio recording devices for capturing behaviors or events
• Software for tracking online behavior and conducting structured observations (checklists or rating scales)
• Can be used in a classroom or online setting
Summary
• Data is crucial for decision-making and understanding the world
• Data collection is vital in research, business, and decision-making
• Primary data is collected directly from the source
• Secondary data is drawn from existing sources that someone else has already collected
• Questionnaires and software aid in the data collection process
• Privacy, consent, confidentiality, and accuracy are essential ethical considerations
• Follow best practices to ensure high-quality, reliable data
• Effective data collection is a cornerstone of informed decision-making
Descriptive Statistics
Descriptive statistics is a branch of statistics that focuses on summarizing, organizing, and presenting data in
a meaningful way to provide insights into the characteristics of a dataset. It is the initial and fundamental step
in data analysis, enabling researchers, analysts, and decision-makers to understand and interpret the
information within the data.
Measures of Dispersion:
• Range: The difference between the maximum and minimum values in a dataset.
• Variance: The average of the squared deviations of the data points from the mean, quantifying the
spread of the data.
• Standard Deviation: The square root of the variance, indicating the typical distance between a
data point and the mean, expressed in the same units as the data.
• Interquartile Range (IQR): The range between the first quartile (25th percentile) and the third quartile
(75th percentile), showing the spread of the middle 50% of the data.
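The four measures above can be computed with Python's standard `statistics` module. This is a minimal sketch, not a prescribed method: the `scores` list is made-up illustrative data, the population formulas (dividing by n) are an assumption, and the "inclusive" quartile method is only one of several conventions in common use.

```python
import statistics

# Hypothetical sample: eight test scores (illustrative values only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
data_range = max(scores) - min(scores)

# Population variance and standard deviation (divide by n, an assumption here;
# use statistics.variance / statistics.stdev for the sample versions).
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
print("IQR:", iqr)
```

For this data the mean is 5, so the squared deviations sum to 32 and the population variance is 32 / 8 = 4, giving a standard deviation of 2.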
Frequency Distribution:
• A table or chart displaying the frequency or count of each unique value in a dataset, helping to identify
patterns and common values.
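A frequency table can be built in a few lines with `collections.Counter`. The sketch below assumes a small, invented list of survey responses; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical survey responses (illustrative values only).
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Count how often each unique value occurs.
freq = Counter(responses)

# Print the table from most to least common, with relative frequency.
total = len(responses)
for value, count in freq.most_common():
    print(f"{value:<10}{count:>3}  {count / total:.0%}")
```

The most common value appears first, which makes the dominant category easy to spot.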
Graphical Representation:
• Graphs and charts like histograms, bar charts, box plots, and scatter plots are used to visually
summarize and display data.
Percentiles:
• Percentiles are the 99 cut points that divide an ordered dataset into 100 equal parts. For example, the median is the 50th percentile, and
the quartiles (25th, 50th, and 75th percentiles) divide the data into four equal parts.
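The relationship between percentiles, the median, and the quartiles can be checked with `statistics.quantiles`. This sketch assumes the values 1 through 101 as hypothetical data and the "inclusive" interpolation method; other methods can give slightly different cut points on real data.

```python
import statistics

# Hypothetical data: the integers 1 to 101 (illustrative values only).
data = list(range(1, 102))

# n=100 returns the 99 percentile cut points; index p - 1 holds the p-th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

# n=4 returns the three quartiles, which should match the 25th, 50th, and 75th percentiles.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

print("number of cut points:", len(percentiles))
print("50th percentile equals median:", p50 == statistics.median(data))
print("quartiles match percentiles:", quartiles == [p25, p50, p75])
```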
Summary Statistics:
• Summary tables provide an overview of key statistics like minimum, maximum, mean, median, and
standard deviation.
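A summary table like the one described can be assembled with a small helper. This is a sketch under stated assumptions: the function name `summarize` and the visitor counts are hypothetical, and the sample standard deviation (dividing by n - 1) is used here as one reasonable choice.

```python
import statistics

def summarize(values):
    """Return key descriptive statistics for a list of numbers (hypothetical helper)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }

# Hypothetical daily visitor counts (illustrative values only).
visitors = [12, 15, 11, 18, 14]
summary = summarize(visitors)

for name, value in summary.items():
    print(f"{name:<8}{value:>8.2f}")
```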
Observation:
• A single unit or measurement in a dataset. Each row in a dataset represents an observation, while
each column represents a variable.
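The row and column view can be made concrete with a list of dictionaries, where each dictionary is one observation and each key is a variable. The student records below are invented for illustration.

```python
# Hypothetical dataset: each dictionary is one observation (row),
# each key is one variable (column).
dataset = [
    {"student": "A", "age": 19, "score": 85},
    {"student": "B", "age": 21, "score": 90},
    {"student": "C", "age": 20, "score": 78},
]

# One observation: every variable measured for a single unit.
first_observation = dataset[0]

# One variable: the same measurement taken across all observations.
ages = [row["age"] for row in dataset]

print("observations:", len(dataset))
print("variables:", list(first_observation.keys()))
print("ages:", ages)
```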
Variation:
• Refers to the differences or variability in data values or observations within a dataset, which helps
identify patterns and uncertainty.
Random Variables:
• Variables whose values are determined by chance. They can be:
o Discrete: Take distinct, countable values (e.g., the number of heads in ten coin flips).
o Continuous: Can take any value within a range (e.g., height).
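The difference between the two kinds of random variable can be seen in a short simulation. This sketch uses Python's `random` module; the function names, the seed, and the height distribution (mean 170 cm, standard deviation 10 cm) are assumptions chosen purely for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def heads_in_flips(n_flips):
    """Discrete: count heads in n_flips fair coin flips (an integer from 0 to n_flips)."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

def simulated_height_cm():
    """Continuous: draw a height from a normal distribution (hypothetical mean and spread)."""
    return rng.gauss(170.0, 10.0)

heads = heads_in_flips(10)
height = simulated_height_cm()

print("heads in 10 flips:", heads)
print("simulated height (cm):", round(height, 1))
```

The discrete variable can only land on whole numbers from 0 to 10, while the continuous one can take any real value in its range.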
Uncertain Variables:
• A broader term encompassing both random variables and other variables affected by uncertainty due
to incomplete information or variability.