2a. Sources of Data
Main topics
Data categorisations:
Primary vs secondary
Internal vs external
Numerical vs categorical
Discrete vs continuous
Nominal vs ordinal
Primary vs secondary
Primary:
Data obtained from first-hand sources; the origin of the data
Surveys, observation, experimentation, interviews etc.
Original research on the specific topic
Secondary:
Data already obtained by third parties
Libraries, newspapers, journals, magazines, governments, banks, internet etc.
Research conducted on the specific topic or another topic
Considerations: accuracy, reliability, suitability etc.
Internal vs external data
Internal sources:
Accounting system (records): invoices, timesheets, journal entries, budgets etc.
Department-wise data: payroll, marketing, production etc.
Strategic planning system
External sources:
Libraries, newspapers, journals, magazines, governments, banks, internet, customers etc.
Numerical vs categorical
Numerical:
Expressed in numbers
It can be discrete or continuous
Examples: Marks, attendance, height
Categorical:
Descriptive data rather than numeric
It can be nominal or ordinal
Examples: Eye colour, educational qualification, feedback
Discrete vs continuous
Discrete:
Non-continuous data
Includes only countable values
Examples: Class attendance, goals scored by a team
Continuous:
Unbroken data
Can take any value within a range, so the values cannot be counted
Measured using appropriate units
Examples: Distance, height, weight
Nominal vs ordinal
Nominal:
Cannot be measured
No set order or scale
Examples: Name, eye colour
Ordinal:
Has a set order or scale
Examples: Feedback measured through ratings
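The difference between nominal and ordinal data can be sketched in Python. This is only an illustration: the rating scale, the sample values and the helper name sort_ordinal are assumptions made for the example, not part of the course material. It shows that ordinal values can be ranked against their scale, while nominal values can only be counted per category.

```python
from collections import Counter

# Hypothetical rating scale: the order of the labels carries meaning.
RATING_SCALE = ["Poor", "Fair", "Good", "Excellent"]


def sort_ordinal(values, scale):
    """Sort ordinal values by their position on the given scale."""
    rank = {label: position for position, label in enumerate(scale)}
    return sorted(values, key=lambda value: rank[value])


feedback = ["Good", "Poor", "Excellent", "Fair", "Good"]  # ordinal data
eye_colours = ["brown", "blue", "green", "brown"]         # nominal data

# Ordinal data can be ranked from lowest to highest.
sorted_feedback = sort_ordinal(feedback, RATING_SCALE)

# Nominal data has no ranking; counting per category is still meaningful.
colour_counts = Counter(eye_colours)
```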
Impact of general economic environment:
Discussion points:
GDP
Economic trends
Inflation
Interest rates
Exchange rates
Tax levels
Government policy
Economic cycles: boom, bust or recession
Big data analysis
Data analytics:
The process of collecting and examining data in order to extract meaningful business
insights, which can be used to inform decision making and improve performance.
5 Vs of big data:
Volume: large quantity of data
Velocity: speed of generating data
Variety: different types of data
Veracity: reliability of data
Value: whether the benefit of the data outweighs its cost (cost-benefit analysis)
Variety
Structured:
Well organized
Example: Government statistics, a student mark list in an Excel spreadsheet
Unstructured:
Unorganized data
Example: Social media
Semi-structured:
Mix of structured & unstructured; not totally unorganized
Example: Sorted mailbox
Uses of big data
Direct access to customers
Cheaper marketing
Quicker decision making
Better costing and pricing
New product design & features
Big data analysis
Pros:
Faster
Cheaper
Direct access to customers
Cons:
Unreliable data (veracity)
Lack of technical know-how
Difficulty in understanding
Sample vs population
What is sampling?
When do we use sampling?
Key terms:
Population
Sample
Census
Sampling frame
If all members of a population are examined, the survey is called a census.
If it is not possible to survey the entire population, a sample is selected.
The results from the sample are used to estimate the results of the whole.
A sampling frame is a numbered list of all items in a population.
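The idea of using a sample to estimate the result for the whole population can be sketched as follows. The population here is made up (10,000 simulated heights), and the seed and sample size of 200 are arbitrary choices for illustration. The sketch compares the mean from a census with the mean from a sample.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm of 10,000 people (simulated).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A census examines every member of the population.
census_mean = statistics.mean(population)

# A sample examines only some members and uses them to estimate the whole.
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"Census mean (all {len(population)} members): {census_mean:.1f}")
print(f"Sample mean ({len(sample)} members):        {sample_mean:.1f}")
```

The two means will normally be close but not identical; the gap between them is the sampling error.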
A sampling frame should have the following characteristics:
Completeness
Accuracy
Adequacy
Currency (up to date)
Non-duplication
Convenience
Probability sampling vs non-probability sampling
Probability sampling: sampling method in which each member of the population has a
known chance of being selected for the sample
Random
Stratified random
Systematic
Multistage
Cluster
Non-probability sampling: sampling method in which the chance of each member of the
population appearing in the sample is not known
Quota sampling
Random sampling
Samples selected such that every item in the population has an equal chance of
being selected
Sampling frame is required
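Random sampling from a sampling frame can be sketched in Python as below. The frame of 100 numbered items, the sample size and the function name simple_random_sample are assumptions made for illustration. The standard library call random.Random.sample draws without replacement, so every item in the frame has an equal chance of being selected.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size items from the frame, each with an equal chance."""
    rng = random.Random(seed)  # seed is optional; set it for repeatable runs
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a numbered list of all 100 items.
sampling_frame = list(range(1, 101))

chosen = simple_random_sample(sampling_frame, 10, seed=1)
print(chosen)
```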
Systematic sampling
Sampling method which selects every Nth item after a random start
Sampling frame is required
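Selecting every Nth item after a random start can be sketched as follows. The frame of 100 numbered items, the interval of 10 and the function name systematic_sample are assumptions made for illustration. The random start is drawn from the first N items, and every Nth item is taken from there.

```python
import random


def systematic_sample(sampling_frame, interval, seed=None):
    """Select every interval-th item after a random start in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start: index 0 to interval - 1
    return sampling_frame[start::interval]


# Hypothetical sampling frame: a numbered list of all 100 items.
sampling_frame = list(range(1, 101))

selected = systematic_sample(sampling_frame, 10, seed=7)
print(selected)
```

With a frame of 100 items and N = 10, the sample always contains 10 items spaced 10 apart; only the starting item changes between runs.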