Source of Data
By Leila Ospanova
Primary data is preferable to secondary data since data collected for a specific purpose is likely to be better than
data acquired for some other purpose.
Which of the following are primary data
and which are secondary data?
(a) Information from clock cards when used for making up wages.
- This is primary data, since the data is collected to make up the wages.
(b) Data from a government publication on the toy industry used by a new toy shop to determine which items to stock.
- This is secondary data; government statisticians collate data from various sources and the data is used in a variety of ways.
(c) Expense claim forms submitted by sales representatives used to estimate the car mileage they have travelled.
- This is secondary data since the expense claim data is collected for a different reason initially.
(d) Results of an election opinion poll published in a newspaper.
- This is primary data since the data was collected specifically for the purpose. If you said secondary data you were
probably thinking that the results were being used to predict the result of the election; this is different from the
reason why it was collected.
Discrete and Continuous data
Discrete data is non-continuous data. It can only take certain values, for example the number
of students taking a course (there cannot be half a student). Discrete data is counted.
Continuous data can take on any value (within a range), for example time or
distance. Continuous data is measured.
Internal sources of information
Accounting system. The accounting system will collect data from source documents such as invoices,
timesheets and journal entries.
Payroll system. The payroll system may provide information concerning detailed labour costs.
Strategic planning system. The strategic planning system may provide information relating to the
organisation’s objectives and targets.
Benefits:
– Readily available data
– Data can easily be sorted and analysed
– Reports can easily be produced when required
– Data relates to the organisation concerned
Limitations:
– Data may need to be further analysed to be of use to management accountants
External sources of information
government sources
business contacts – customers and suppliers
trade associations and trade journals
the financial and business press and other media
the internet.
Benefits:
– Wide expanse of external sources of information
– Easily accessible, especially using the internet
– More general information available
– Can source specific information needs
Limitations:
– Data may not be accurate
– Finding relevant information can be time consuming
General economic environment
The general state of the economy will impact on businesses – is the economy in a boom or bust period?
Businesses will need to consider the general economic state and how it is forecast to change (inflation,
interest rates, exchange rates etc.) when forecasting productivity and pricing strategies.
Sampling techniques - terminology
The term population is used to mean all the items under consideration in a particular enquiry.
A sample is a group of items drawn from that population. The population may consist of items such as
metal bars, invoices, packets of tea, etc; it need not be people.
Sampling techniques
Why sampling is necessary:
1. The whole population may not be known.
2. Even if the population is known the process of testing every item can be extremely costly in time and
money, for example, gaining information about the popularity of TV programs by interviewing every
viewer.
3. The items being tested may be completely destroyed in the process, for example in order to check the
lifetime of an electric light bulb it is necessary to leave the bulb burning until it breaks and is of no further
use.
Two rules are observed:
1. The sample must be of a certain size. In general terms, the larger the sample the more reliable the
results will be.
2. The sample must be chosen in such a way that it is representative of the population.
Random Sampling
A simple random sample is defined as a sample taken in such a way that every member of the population has
an equal chance of being selected.
If a 10% sample of a population of 200 items is required, then the sample size needs to be 20 items.
Numbers from a table of random numbers can be taken and the corresponding items are extracted from the
population to form the sample e.g. in selecting a sample of invoices for an audit. Since the invoices are
already numbered, this method can be applied with the minimum of difficulty. This method has obvious
limitations when either the population is extremely large or, in fact, not known. The following methods are
more applicable in these cases.
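The 10% sample of 200 numbered invoices can be sketched in Python. This is only an illustration: the function name, the invoice numbering 1 to 200 and the use of a pseudo-random generator in place of a printed table of random numbers are all assumptions.

```python
import random


def simple_random_sample(population, fraction, seed=None):
    """Draw a simple random sample without replacement.

    Every member of `population` has the same chance of being selected.
    """
    rng = random.Random(seed)  # seeded only to make the draw repeatable
    sample_size = round(len(population) * fraction)
    return rng.sample(population, sample_size)


# Hypothetical population: invoices numbered 1 to 200.
invoices = list(range(1, 201))
sample = simple_random_sample(invoices, 0.10, seed=1)
print(len(sample))  # 20 invoices, i.e. 10% of 200
```

Because the invoices are already numbered, the drawn numbers identify the items to extract directly.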
Systematic Sampling
Systematic sampling is a technique for creating a random sample in which each piece of data is chosen at a
fixed interval for inclusion in the sample.
If the population is known to contain 50,000 items and a sample of size 500 is required, then 1 in every 100
items is selected. The first item is determined by randomly choosing a number between 1 and 100, e.g. 67;
the second item will then be the 167th, the third will be the 267th, and so on up to the 49,967th item.
There is a danger of bias if the population has a repetitive structure. For example, if a street has five types of
house arranged in the order A B C D E A B C D E... etc, an interviewer visiting every fifth home would only
visit one type of house.
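The 1-in-100 selection can be sketched as follows. The function name and parameters are assumptions made for illustration, and the sketch assumes the population size divides evenly by the sample size.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions of the items selected at a fixed interval."""
    interval = population_size // sample_size  # e.g. 50,000 // 500 = 100
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    positions = range(start, population_size + 1, interval)
    return list(positions)[:sample_size]


# Reproduce the example from the text: the random start happened to be 67.
positions = systematic_sample(50_000, 500, start=67)
print(positions[:3], positions[-1], len(positions))
# [67, 167, 267] 49967 500
```

The repetitive-structure problem shows up in the same way: with five house types in a fixed order and an interval of 5, every selected position falls on the same type.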
Stratified Sampling
A stratified sample is made up of different 'layers' or ‘groups’ of the population. The sample size for each
layer is proportional to the size of the 'layer' and is known as sampling with probability proportional to size.
Ideally the sample would be chosen at random, and would be large enough so as to be representative of the population.
Unfortunately both of these aspects introduce costs which are often unacceptably high.
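Sampling from each layer in proportion to its size can be sketched as below. The age bands, their sizes and the function name are hypothetical. Rounding each layer's share can leave the overall total one or two items away from the target when the proportions are not whole numbers.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Sample randomly within each stratum, in proportion to the stratum's size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: this stratum's share of the sample.
        share = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical population of 1,000 people split into three age bands.
strata = {
    "18-29": list(range(0, 500)),
    "30-49": list(range(500, 800)),
    "50+": list(range(800, 1000)),
}
chosen = stratified_sample(strata, 100, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'18-29': 50, '30-49': 30, '50+': 20}
```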
Warm up:
The essence of systematic sampling is that:
A each element of the population has an equal chance of being chosen
B members of various strata are selected by the interviewers up to predetermined limits
C every nth member of the population is selected
D every element of one definable subsection of the population is selected
Warm up:
A sample is taken by dividing the population into different age bands and then sampling randomly from the
bands, in proportion to their size. What is such a sample called?
A Simple random
B Stratified random
C Quota
D Cluster
Warm up:
A large company wants to survey the opinions of employees using cluster sampling. Which of the
following methods should be used?
A Staff are randomly selected from each department in proportion to departmental size
B Staff are selected from the list of employees, taking every nth name
C A sample, which is as representative as possible of the composition of the staff in terms of gender, age and
department, is taken by stopping appropriate staff in the corridors and canteen
D One department is selected and all the staff in that department are surveyed
Example of use:
A supermarket is able to take data from your past buying patterns, its own inventory information, your
mobile phone location, social media and weather information to send you, for example, a voucher for
barbeque food. It will have used that data to determine if you have bought such items before which would
indicate that you own a barbeque, if the weather is nice, if you are within 3 miles of one of their stores and
that they have the barbeque food in stock.
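The decision in this example combines four conditions drawn from different data sources. A toy sketch of that rule is shown below. The class, field and function names are hypothetical, and a real system would derive each flag from far larger data sets.

```python
from dataclasses import dataclass


@dataclass
class CustomerSnapshot:
    """Hypothetical summary of the data points mentioned in the example."""
    bought_barbeque_food_before: bool  # from past buying patterns
    weather_is_nice: bool              # from weather information
    miles_to_nearest_store: float      # from mobile phone location
    barbeque_food_in_stock: bool       # from the store's inventory


def should_send_voucher(customer, max_miles=3.0):
    """Send a voucher only when all four conditions hold."""
    return (
        customer.bought_barbeque_food_before
        and customer.weather_is_nice
        and customer.miles_to_nearest_store <= max_miles
        and customer.barbeque_food_in_stock
    )


nearby = CustomerSnapshot(True, True, 2.5, True)
far_away = CustomerSnapshot(True, True, 10.0, True)
print(should_send_voucher(nearby), should_send_voucher(far_away))
# True False
```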
3 (4) V’s in Big Data
Volume: organisations now hold huge volumes of data. For example:
– A supermarket will have a data store of all purchases made, when and where they were
made, how they were paid for and the use of coupons via loyalty cards swiped at the checkout.
– An online retailer will have a data store of every product looked at and bought and every page
visited.
– Mobile phone providers will have a data store of texts, voice mails, calls made, browsing
habits and location.
– Social media companies, such as Facebook, will have a data store of all the postings an
individual makes (and where they were made), photos posted and contacts.
Variety: Big Data can include much more than simply financial information and can include other
organisational data which is operational in nature as well as other internal and external
information. This data can be both structured and unstructured in nature:
– Structured data – for example, a bank will hold a record of all receipts and payments (date,
amount and source) for a customer.
– Unstructured data – can make up 80% of business data but is more difficult to store and
analyse.
Velocity: The data must be turned into useful information quickly enough to be of use in
decision making and performance management (in real time if possible). The sheer volume and
variety of data make this task difficult, and sophisticated methods are required to process the
huge volumes of non-uniform data quickly.
A prospective purchaser of the business notices that the mean is higher than the takings in four of the six
weeks. Calculate the median for him.
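The weekly takings for this question are not reproduced here, so the figures below are purely hypothetical. They were chosen so that the mean exceeds the takings in four of the six weeks. The sketch shows how the median is found for an even number of values, as the average of the two middle values once the data is sorted.

```python
from statistics import mean, median

# Hypothetical weekly takings for six weeks (not the original data).
weekly_takings = [1000, 1100, 1200, 1300, 2000, 2400]

average = mean(weekly_takings)
middle = median(weekly_takings)  # average of the 3rd and 4th sorted values
weeks_below_mean = sum(1 for takings in weekly_takings if takings < average)

print(average, middle, weeks_below_mean)
```

With these illustrative figures the mean is 1,500 and the median is 1,250, so the two high-takings weeks pull the mean above what was taken in a typical week.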
Thank you for your attention!