DAA_Chapter 03

The document outlines the objectives and contents of a test plan focused on data analytics, detailing four main categories: descriptive, diagnostic, predictive, and prescriptive analytics. It explains various methods within these categories, such as summary statistics and clustering, and emphasizes the importance of data profiling and analysis techniques for understanding past events and predicting future outcomes. Additionally, it discusses practical applications of these analytics in fields like auditing and management accounting.


21/12/2024

CHAPTER 03
Performing the Test Plan and
Analyzing Results

Prepared by Nguyen Huu [email protected]

Objectives

• Understand the four categories of Data Analytics.
• Describe some descriptive analytics approaches, including summary statistics and data reduction.
• Explain the diagnostic approach to Data Analytics, including profiling and clustering.
• Understand predictive analytics, including regression and classification.
• Describe the use of prescriptive analytics, including machine learning and artificial intelligence.


Contents

• Four main categories of data analytics
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics


Four main categories of data analytics.

• Descriptive analytics are procedures that summarize existing data to determine what has happened in the past.
• Diagnostic analytics are procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark.
• Predictive analytics are procedures used to generate a model that can be used to determine what is likely to happen in the future.
• Prescriptive analytics are procedures that model data to enable recommendations for what should be done in the future.


Four main categories of data analytics.


Each stage takes additional effort but provides additional value.

Exhibit 3-1 Four Main Categories of Data Analytics



Descriptive analytics
Descriptive analytics help summarize what has happened in the past.
• A financial accountant would sum all the sales transactions within a period to calculate the value for Sales Revenue that appears on the income statement.
• An analyst would count the number of records in a data extract to ensure the data are complete before running a more complex analysis.
• An auditor would filter data to limit the scope to transactions that represent the highest risk.

In all these cases, basic analysis provides an understanding of what has happened in the past to help decision makers achieve good results and correct poor results.


Descriptive analytics

Descriptive analytics examples:


• Summary statistics describe a set of data in terms of their location (mean, median), spread (standard deviation, minimum, maximum), shape (quartile), and size (count).
• Data reduction or filtering is used to reduce the number of observations to focus on relevant items (that is, highest cost, highest risk, largest impact, etc.). It does this by taking a large set of data (perhaps the population) and reducing it to a smaller set that has the vast majority of the critical information of the larger set.


Descriptive analytics

Summary statistics
• Summary statistics describe the location, spread, shape, and dependence of a set of observations.

Statistic | Excel formula | Description
Sum | =SUM() | The total value of all numerical values
Mean | =AVERAGE() | The center value; sum of all observations divided by the number of observations
Median | =MEDIAN() | The middle value that divides the top half of the data from the bottom half
Minimum | =MIN() | The smallest value
Maximum | =MAX() | The largest value
Count | =COUNT() | The number of observations
Frequency | =FREQUENCY() | The number of observations in each of a series of numerical or categorical buckets
Standard deviation | =STDEV() | The variability or spread of the data from the mean; a larger standard deviation means a wider spread away from the mean
Quartile | =QUARTILE() | The value that divides a quarter of the data from the rest; indicates skewness of the data
Correlation coefficient | =CORREL() | How closely two datasets are correlated or predictive of one another

Exhibit 3-3 Description of Summary Statistics
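
As a rough illustration of the statistics in Exhibit 3-3 outside Excel, the sketch below computes several of them with Python's standard library. The `sales` values are invented for the example and are not from the chapter.

```python
import statistics

# Hypothetical sales amounts, invented for illustration only.
sales = [120.0, 80.0, 100.0, 95.0, 105.0]

summary = {
    "sum": sum(sales),                   # like =SUM()
    "mean": statistics.mean(sales),      # like =AVERAGE()
    "median": statistics.median(sales),  # like =MEDIAN()
    "min": min(sales),                   # like =MIN()
    "max": max(sales),                   # like =MAX()
    "count": len(sales),                 # like =COUNT()
    "stdev": statistics.stdev(sales),    # sample standard deviation, like =STDEV()
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that `statistics.stdev` uses the sample (n − 1) formula; `statistics.pstdev` would give the population version.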



Descriptive analytics

Summary statistics
Mean vs Median: When to use?

It’s best to use the Mean to describe the center of a dataset when the distribution is mostly symmetrical and there are no outliers.


Descriptive analytics

Summary statistics
Mean vs Median: When to use?

When a distribution is
skewed, the Median does a
better job of describing the
center of the distribution


Descriptive analytics

Summary statistics
Mean vs Median: When to use?

The Median also does a better job of capturing the central location of a distribution when there are outliers present in the data.
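
A small sketch of the mean-versus-median point: the two lists below are invented, and differ only in that the second contains one extreme value.

```python
import statistics

# Hypothetical invoice amounts; the last value in the second list is a deliberate outlier.
symmetric = [98, 99, 100, 101, 102]
with_outlier = [98, 99, 100, 101, 1000]

mean_sym = statistics.mean(symmetric)
median_sym = statistics.median(symmetric)
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

# Without the outlier the two measures agree; with it, only the mean moves.
print(mean_sym, median_sym)
print(mean_out, median_out)
```

In the second list the single outlier pulls the mean far from the bulk of the data, while the median stays where most values sit.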


Descriptive analytics

Summary statistics

Quartile


Descriptive analytics

Summary statistics

Quartile


Descriptive analytics

Data reduction involves the following steps:


• Identify the attribute you would like to reduce or focus on.
• Filter the results.
• Interpret the results.
• Follow up on results.

Exhibit 3-4 Use Filters to Reduce Data
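
The data reduction steps can be sketched in a few lines of Python. The transactions and the `THRESHOLD` cutoff are assumptions made up for the example, not values from the chapter.

```python
# Hypothetical transactions, invented for illustration.
transactions = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 12500.0},
    {"id": 3, "amount": 980.0},
    {"id": 4, "amount": 47000.0},
]

THRESHOLD = 10000.0  # assumed cutoff for "high value"; not from the source

# Steps 1 and 2: focus on the "amount" attribute and filter the results.
high_value = [t for t in transactions if t["amount"] >= THRESHOLD]

# Step 3: interpret, e.g. the share of total value covered by the reduced set.
coverage = sum(t["amount"] for t in high_value) / sum(t["amount"] for t in transactions)

print([t["id"] for t in high_value], round(coverage, 3))
```

Step 4 (follow up) remains a manual task: the reduced list is what the auditor would investigate further.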


Descriptive analytics

Data reduction
Fuzzy matching locates approximate matches.
• Useful for identifying relationships in imperfect data.

Exhibit 3-5 Fuzzy Matching Shows a Likely Match of an Employee and a Vendor
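
One possible way to sketch fuzzy matching is with `difflib.SequenceMatcher` from Python's standard library. The addresses, the `similarity` helper, and the 0.8 cutoff are all assumptions for illustration; audit software may use different matching algorithms.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after normalizing case and whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


# Hypothetical employee and vendor addresses, invented for illustration.
employee_address = "123 Main Street, Springfield"
vendor_addresses = ["123 Main St., Springfield", "9 Harbor Road, Shelbyville"]

CUTOFF = 0.8  # assumed similarity cutoff; tune for the data at hand

likely = [v for v in vendor_addresses if similarity(employee_address, v) >= CUTOFF]
print(likely)
```

A lower cutoff finds more candidate matches but also more false positives, so each flagged pair still needs manual review.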

Fuzzy matching

Related party transactions involve people who have close ties to an organization, such as board members. Assume an accounting manager decides that fuzzy matching would be a useful technique to find undisclosed related party transactions. Using the fields below, identify the pairings between the related party table and the vendor and customer tables that could independently identify a fuzzy match.

Vendor | Fuzzy Match? | RelatedParty | Fuzzy Match? | Customer
VendorState | | RelatedState | | CustomerState
VendorName | | RelatedName | | CustomerName
VendorZip | | RelatedZip | | CustomerZip
VendorAddDate | | RelatedAddDate | | CustomerAddDate
VendorAddress | | RelatedAddress | | CustomerAddress
VendorType | | RelatedPosition | | CustomerType
VendorCity | | RelatedCity | | CustomerCity

Diagnostic analytics

Diagnostic analytics provide insight into why things happened or how individual data values relate to the general population.

Two common methods of diagnostic analytics include Profiling and Clustering.

More diagnostic analytics include Similarity matching and Co-occurrence grouping.


Diagnostic analytics

Diagnostic analytics methods:


• Profiling identifies the “typical” behavior of an individual, group,
or population by compiling summary statistics about the data
(including mean, standard deviations, etc.) and comparing
individuals to the population.
• Clustering helps identify groups (or clusters) of individuals (such
as customers) that share common underlying characteristics—in
other words, identifying groups of similar data elements and the
underlying drivers of those groups.


Diagnostic analytics

Diagnostic analytics methods:


• Similarity matching is a grouping technique used to identify
similar individuals based on data known about them.
• Co-occurrence grouping discovers associations between
individuals based on common events, such as transactions they
are involved in.


Diagnostic analytics

Profiling
• Profiling involves gaining an understanding of the typical behavior
of an individual, group, or population (or sample).
• Profiling can be used to develop complex models to predict
potential fraud.
• Profiling is done primarily using structured data—data that are
stored in a database or spreadsheet and are readily searchable.


Diagnostic analytics

Data profiling typically involves the following steps:


1. Identify the objects or activity you want to profile: What data do you want to
evaluate? Sales transactions? Customer data? Credit limits?
2. Determine the types of profiling you want to perform: What is your goal? Do
you want to set a benchmark for minimum activity? Have you set a budget
that you wish to follow?
3. Set boundaries or thresholds for the activity: This is a benchmark that may be
manually set or automatically set.


Diagnostic analytics

Data profiling typically involves the following steps:


4. Interpret the results and monitor the activity and/or generate a list of exceptions: Here is where dashboards come into play.
5. Follow up on exceptions: Action should be taken to validate, correct, or identify the causes of the abnormal behavior.
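
The profiling steps can be sketched as a simple budget comparison. The departments, amounts, and the 10% `TOLERANCE` are invented assumptions; a real profile would use the organization's own benchmarks.

```python
# Step 1: the activity to profile (hypothetical monthly expenses by department).
actual = {"Sales": 10500.0, "IT": 8200.0, "HR": 4100.0}

# Step 2: the benchmark, here an assumed budget per department.
budget = {"Sales": 10000.0, "IT": 6000.0, "HR": 4000.0}

# Step 3: the threshold, assumed here to be 10% over budget.
TOLERANCE = 0.10

# Step 4: generate the list of exceptions.
exceptions = sorted(
    dept for dept, spent in actual.items()
    if spent > budget[dept] * (1 + TOLERANCE)
)

print(exceptions)
```

Step 5, following up on each exception, is where the analyst validates whether the overspend is an error, a timing issue, or something that needs correction.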


Diagnostic analytics

Profiling
Z-Scores - Standardizing Data for Comparison
$$z = \frac{x - \mu}{\sigma}$$
Where:
• z = Z-score
• x = the value being evaluated
• μ = the mean
• σ = the standard deviation
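
A minimal sketch of the Z-score formula, assuming the population standard deviation is used for σ. The `days_to_ship` values and the cutoff of 2 are invented for the example; the chapter does not prescribe a cutoff.

```python
import statistics


def z_scores(values):
    """Standardize each value as z = (x - mu) / sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in values]


# Hypothetical days-to-ship figures, invented for illustration.
days_to_ship = [2, 3, 3, 4, 3, 2, 4, 3, 3, 13]
scores = z_scores(days_to_ship)

CUTOFF = 2  # assumed cutoff for "unusual"; 3 is another common choice
outliers = [x for x, z in zip(days_to_ship, scores) if abs(z) > CUTOFF]
print(outliers)
```

Because Z-scores are unit-free, the same cutoff can be applied to shipments, invoice amounts, or any other numeric field.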

Diagnostic analytics

Profiling
Z-scores show spread and outliers.

The higher the Z-score (the farther a value is from the mean), the more likely a customer will have a delayed shipment (blue circle).

Exhibit 3-7 Z-Scores Provide an Example of Profiling That Helps Identify Outliers

Diagnostic analytics
Profiling
Box plots (box-and-whisker plots)
• Display the five-number summary of a set of data: the minimum, first quartile, median, third quartile, and maximum.
• The five-number summary divides the data into sections that each contain approximately 25% of the data in that set.
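
The five-number summary behind a box plot can be sketched with `statistics.quantiles`. The `days` data are invented, and the 1.5 × IQR fences used to flag outliers are a common convention assumed here, not something stated on the slide.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical days-to-ship values, invented for illustration.
days = [1, 2, 3, 4, 5, 6, 7, 8, 30]
low, q1, med, q3, high = five_number_summary(days)

# Interquartile range and the assumed 1.5 * IQR fences for flagging outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [d for d in days if d < lower_fence or d > upper_fence]

print((low, q1, med, q3, high), flagged)
```

Different quartile methods (inclusive versus exclusive) can give slightly different Q1 and Q3 values on small datasets.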


Diagnostic analytics
Profiling
Box plots show spread and outliers

EXHIBIT 3-8 Box Plots Provide an Example of Profiling That Helps Identify Outliers
(in This Case, Categories with Unusually High Average Days to Ship)

Diagnostic analytics
Data profiling in management accounting
Variance analysis
• Internal auditors analyze travel and entertainment expenses for violations of internal controls.
• Managers use profiling to compare variances from target ranges.

Exhibit 3-9 Variance Analysis Is an Example of Data Profiling



Diagnostic analytics
Data profiling in auditing
Benford’s Law

• In a continuous audit, an auditor may use Benford’s Law to evaluate the frequency distribution of the first digits from a large set of numerical data.

Exhibit 3-10 Benford’s Law Applied to Large Numerical Data Sets (including Employee Transactions)

Diagnostic analytics
Benford’s Law is a diagnostic analytics technique that compares actual to expected values.
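
As a sketch of that comparison: under Benford’s Law the expected share of first digit d is log10(1 + 1/d), which can be set against the observed first-digit frequencies. The `amounts` list is invented and far too small for a real test; the helper names are hypothetical.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Expected proportion of leading digit d (1..9) under Benford's Law."""
    return math.log10(1 + 1 / d)


def first_digit(amount: float) -> int:
    """Leading non-zero digit, read from scientific notation."""
    return int(f"{abs(amount):.15e}"[0])


def first_digit_frequencies(amounts):
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Hypothetical transaction amounts, invented for illustration.
amounts = [123.45, 1900, 18.2, 245, 31, 1050.75, 0.052, 47, 112, 1.5]
actual = first_digit_frequencies(amounts)

for d in range(1, 10):
    print(d, round(actual[d], 3), round(benford_expected(d), 3))
```

Digits whose actual frequency is far from the expected one are candidates for follow-up, not proof of fraud.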


Diagnostic analytics

Benford’s Law can be used to identify unusual behavior, errors, or fraudulent schemes in the following accounts:

• Fictitious sales transactions in sales records
• Potential kickback schemes in purchases records
• Expense approval circumvention in travel and entertainment expenses
• Duplicate checks in vendor payments
• Potential kickback schemes in sales returns


Diagnostic analytics
Cluster analysis shows natural groupings of data.

• Clustering is used to identify groups of similar data elements and the underlying drivers of those groups.
• Clustering algorithms calculate the distance between observations and group together the elements that are closest to one another.
Exhibit 3-11 Clustering Is Used to Find Three Natural
Groupings of Vendors Based on Purchase Activity
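
One common clustering algorithm is k-means, sketched below in plain Python. The chapter does not say which algorithm produced Exhibit 3-11, so this is only an assumed example; the vendor points and the starting centroids are invented.

```python
import math


def k_means(points, centroids, iterations=10):
    """Tiny k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical vendors as (number of purchases, average purchase amount in thousands).
vendors = [(2, 1), (3, 2), (2, 2), (20, 5), (22, 6), (21, 4), (5, 40), (6, 42), (4, 41)]

# Starting centroids are an assumption; real tools usually pick them at random.
clusters, centers = k_means(vendors, centroids=[(0, 0), (25, 5), (5, 50)])
print([len(c) for c in clusters], centers)
```

In practice the attributes are usually standardized first so that one large-scale variable does not dominate the distance calculation.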

Diagnostic analytics

Clustering in auditing
• Internal auditors can use clustering to identify groups of transactions that may indicate risk or fraud in insurance or other payments.

Exhibit 3-12 Cluster Analysis of Insurance Payments



Diagnostic analytics

Hypothesis Testing for Differences in Groups


• A two-sample t-test for equal means is used to determine if the
difference between the means of two different populations is
significant or not.
• Begin by setting the Null Hypothesis H0 (no relationship) and
the Alternative Hypothesis HA (expected relationship).


Diagnostic analytics

Hypothesis Testing for Differences in Groups


• Significance level (α)
• A measure of how strong the evidence must be before rejecting the null hypothesis and concluding that the effect is statistically significant.
• The probability of rejecting the null hypothesis when it is true.
• The p-value: the probability of observing results at least as extreme as those in the sample, assuming H0 is true. It helps decide whether to reject the null hypothesis.
• How? Compare the p-value to α: reject H0 when the p-value is less than or equal to α.
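
A rough sketch of a two-sample t-test for equal means, assuming equal variances in both groups. Python's standard library has no t-distribution, so instead of a p-value the sketch compares |t| with a critical value from a t-table, which leads to the same decision as comparing the p-value with α. The sample data are invented.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df


# Hypothetical days-to-ship samples for two product categories.
category_a = [3, 4, 3, 5, 4]
category_b = [6, 7, 5, 8, 7]

t, df = pooled_t_statistic(category_a, category_b)

# Two-tailed critical value for alpha = 0.05 and df = 8, taken from a t-table.
T_CRITICAL = 2.306
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 3), df, reject_h0)
```

Here H0 is that the two categories have the same mean shipping time; rejecting it suggests the difference is statistically significant at the chosen α.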


Diagnostic analytics
Hypothesis Testing for Differences in Groups

EXHIBIT 3-13 T-Test Assessing for Significant Differences in Average Shipping Times across Categories

Predictive analytics

Predictive analytics examples:


• Regression estimates or predicts the numerical value of a dependent variable based on the slope and intercept of a line and the value of an independent variable.
• Classification predicts a class or category for a new observation
based on the manual identification of classes from previous
observations.
• Link prediction predicts a relationship between two data items,
such as members of a social media platform.


Predictive analytics

Regression helps predict expected outcomes.

• Regression is a statistical method that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
• Regressions allow the accountant to develop models to predict expected outcomes.
Exhibit 3-14 Regression

Predictive analytics

Regression helps predict expected outcomes.

Regression analysis involves the following process:


1. Identify the variables that might predict an outcome.
2. Determine the functional form of the relationship (linear or nonlinear?).
3. Identify the parameters of the model (β, p-value).
4. Evaluate the goodness of fit (R²).
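
The process above can be sketched for the simplest case, a linear model with one independent variable fitted by ordinary least squares. The `advertising` and `sales` figures are invented, and the p-value step is left out because it needs a t-distribution.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: advertising spending (x) and sales volume (y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.5, 16.5, 19.5, 21.5]

slope, intercept, r_squared = fit_line(advertising, sales)
predicted_sales = intercept + slope * 6.0  # expected outcome for a new x value

print(round(slope, 3), round(intercept, 3), round(r_squared, 4), round(predicted_sales, 2))
```

An R² close to 1 means the line explains most of the variation in the dependent variable; a low R² suggests other variables or a nonlinear form should be considered.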


Predictive analytics

What are some examples of regression?


• In managerial accounting, regression may be used to predict sales volume:
• Sales volume = f(advertising spending, and economic indicators such as GDP or inflation)
• In auditing, regression may be used to determine the
appropriateness of allowance accounts:
• Allowance for loan losses amount = f(current aged loans, loan type,
customer loan history, collections success)

Predictive analytics
Classification predicts which class an individual
belongs to
• Identify the classes you wish to predict.
• Manually classify an existing set of records.
• Select a set of classification models.
• Divide your data into training and testing sets.
• Generate your model.
• Interpret the results and select the “best” model.


Predictive analytics
Classification predicts which class an individual
belongs to

Classification workflow (diagram):
• Generate models: training data; select models
• Test models: testing data; interpret the results; select the “best” model
• Use the model: new data; real classification results


Predictive analytics

Classification models
Logistic Regression

Support Vector Machine

Decision Trees

Predictive analytics

Predict whether to play sport?


Weather Temperature Humidity Wind Play Sports
Sunny Hot High Weak No
Sunny Hot High Strong No
Cloudy Hot High Weak Yes
Rainy Mild Normal Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Cloudy Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rainy Mild Normal Weak Yes

Predictive analytics

Decision trees
Weather
• Sunny → Humidity
  • High → No
  • Normal → Yes
• Cloudy → Yes
• Rainy → Wind
  • Strong → No
  • Weak → Yes
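
The decision tree above can be written as nested rules and checked against the ten rows of the play-sports table. The function name `play_sports` is a hypothetical label for this sketch.

```python
def play_sports(weather: str, humidity: str, wind: str) -> str:
    """Follow the tree: split on Weather first, then on Humidity or Wind."""
    if weather == "Cloudy":
        return "Yes"
    if weather == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # Remaining branch: Rainy
    return "Yes" if wind == "Weak" else "No"


# The table from the slide: (Weather, Temperature, Humidity, Wind, Play Sports).
table = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Cloudy", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Cloudy", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
]

correct = sum(
    play_sports(weather, humidity, wind) == label
    for weather, _temperature, humidity, wind, label in table
)
accuracy = correct / len(table)
print(accuracy)
```

The tree never uses Temperature, and it reproduces every row of the table; accuracy on the data used to build a tree says little about accuracy on new observations.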


Predictive analytics

Classification begins with decision boundaries.


• Training data are existing data that have been manually evaluated and assigned a class.
• Test data are existing data used to evaluate the model.
• Decision trees are used to divide data into smaller groups.
• Decision boundaries mark the split between one class and another.
Exhibit 3-16 Example of Decision Trees and Decision
Boundaries
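
A minimal sketch of training data, test data, and a decision boundary, under several assumptions: the labeled records are invented, the split is 70/30, and the "model" is a single boundary placed midway between the two classes seen in training.

```python
import random

# Hypothetical labeled records: (days past due, manually assigned class).
records = [(d, "late" if d > 30 else "ok") for d in range(0, 61, 3)]

rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = records[:]
rng.shuffle(shuffled)

cut = int(len(shuffled) * 0.7)  # assumed 70/30 split
training, testing = shuffled[:cut], shuffled[cut:]

# "Train": place the boundary midway between the largest "ok" and smallest "late" value.
max_ok = max(d for d, label in training if label == "ok")
min_late = min(d for d, label in training if label == "late")
boundary = (max_ok + min_late) / 2


def predict(days: float) -> str:
    """Classify a record by which side of the decision boundary it falls on."""
    return "late" if days > boundary else "ok"


# Evaluate on records the model has not seen.
accuracy = sum(predict(d) == label for d, label in testing) / len(testing)
print(boundary, accuracy)
```

Keeping the test records out of training is what makes the accuracy figure a fair estimate of performance on new data.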

Predictive analytics

How do we evaluate classifiers?


• Try to avoid overfitting (models that fit the training data too closely) and underfitting (models that are too simple to capture the pattern). Both are poor at predicting future observations.

Exhibit 3-21 Illustration of Underfitting and Overfitting the Data with a Predictive Model


Predictive analytics

What else do you need to know about classification?


• Pruning removes branches
from a decision tree to avoid
overfitting the model.

Exhibit 3-17 Illustration of Pruning a Decision Tree



Predictive analytics

An auditor is trying to figure out if the inventory at an electronics store chain is obsolete. From the list below, identify whether each attribute would be useful for predicting inventory obsolescence or not.

Predictive Attributes | Predictive?
Inventory location |
Purchase date |
Manufacturer |
Identified as obsolete |
Sales revenue |
Inventory color |
Inventory cost |
Inventory description |
Inventory size |
Inventory type |
Days since last purchase |


Prescriptive analytics

Once other diagnostic and predictive analyses have been performed, the decision process can be aided by rules-based decision support systems, machine learning models, or added to an existing artificial intelligence model to improve future predictions.

Machine learning and Artificial intelligence are two forms of the Prescriptive approach to Data Analytics work.


Prescriptive analytics

Prescriptive analytics examples:


• Decision support systems are rule-based systems that gather
data and recommend actions based on the input.
• Machine learning and artificial intelligence are learning
models or intelligent agents that adapt to new external data to
recommend a course of action.


Prescriptive analytics
DSS use rules to guide the accountant.

• The rules are derived from past behavior to help guide the accountant through a process.
• For example, the classification of leases is based on evaluating several rules.

Exhibit 3-23 Lease Classification Flowchart
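
A rule-based decision support system can be sketched as a list of yes/no tests. The lease rules and the 75% and 90% cutoffs below are illustrative assumptions only; they are not taken from Exhibit 3-23, and the applicable accounting standard should be consulted for the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    transfers_ownership: bool      # ownership passes to the lessee at the end
    purchase_option_likely: bool   # lessee is likely to exercise a purchase option
    lease_term_years: float
    asset_life_years: float
    pv_payments: float             # present value of lease payments
    fair_value: float              # fair value of the leased asset


def classify_lease(lease: Lease, term_cutoff: float = 0.75, value_cutoff: float = 0.90) -> str:
    """Recommend 'finance' if any rule is met, otherwise 'operating' (assumed rules)."""
    rules = [
        lease.transfers_ownership,
        lease.purchase_option_likely,
        lease.lease_term_years / lease.asset_life_years >= term_cutoff,
        lease.pv_payments / lease.fair_value >= value_cutoff,
    ]
    return "finance" if any(rules) else "operating"


# Hypothetical leases, invented for illustration.
short_rental = Lease(False, False, 2, 10, 15000.0, 60000.0)
long_rental = Lease(False, False, 8, 10, 40000.0, 60000.0)

print(classify_lease(short_rental), classify_lease(long_rental))
```

The system only recommends a classification; the accountant still reviews the inputs and the result.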



Prescriptive analytics
Machine learning learns from past data to predict
better outcomes.
• Machine learning approaches share the use of algorithms and statistical models to generate a previously unknown model that relies on patterns and inferences.
• For most applications of artificial intelligence models, companies will outsource the underlying system to companies like Microsoft, Amazon, or Google rather than develop it themselves.
• These companies have large datasets to create more accurate prediction and recommendation engines.

Summary

• In this chapter, we addressed the third and fourth steps of the IMPACT
cycle model: the “P” for “performing test plan” and “A” for “address and
refine results.” That is, how are we going to test or analyze the data to
address a problem we are facing?
• We identified descriptive analytics that help describe what happened with
the data, including summary statistics, data reduction, and filtering.
• We provided examples of diagnostic analytics that help users identify relationships in the data that uncover why certain events happen through profiling, clustering, similarity matching, and co-occurrence grouping.


Summary

• We introduced some specific models and terminology related to these tools, including Benford’s law, test and training data, decision trees and boundaries, linear classifiers, and support vector machines. We identified cases where models that overfit existing data are not very accurate at predicting the future.

• We explained examples of predictive analytics and introduced some data mining concepts related to regression, classification, and link prediction that can help predict future events or values. We discussed prescriptive analytics, including decision support systems and artificial intelligence, and provided some examples of how these systems can make recommendations for future actions.

Key words

Alternative hypothesis (131) Giả thuyết đối lập


Benford’s law (128) Luật Benford
Causal modeling (133) Mô hình nhân quả
Decision boundaries (138) Ranh giới quyết định
Decision support system (141) Hệ thống hỗ trợ ra quyết định
Decision tree (138) Cây quyết định
Descriptive analytics (116) Phân tích mô tả
Diagnostic analytics (116) Phân tích chẩn đoán
Digital dashboard (125) Bảng điều khiển kỹ thuật số
Dummy variables (135) Biến giả
Effect size (141) Mức độ ảnh hưởng
Interquartile range (IQR) (124) Khoảng tứ phân vị


Key words

Null hypothesis (131) Giả thuyết không


Overfitting (140) Quá khớp
Predictive analytics (116) Phân tích dự báo
Prescriptive analytics (117) Phân tích đề xuất
Summary statistics (119) Thống kê tổng hợp
Supervised approach/method (133) Cách tiếp cận/phương pháp có giám sát
Test data (138) Dữ liệu kiểm tra
Time series analysis (137) Phân tích theo chuỗi thời gian
Training data (138) Dữ liệu huấn luyện
Underfitting (140) Chưa khớp
Unsupervised approach/method (129) Cách tiếp cận/phương pháp không có giám sát
XBRL (eXtensible Business Reporting Language) (122) Ngôn ngữ báo cáo kinh doanh mở rộng
