Data Analytics for Accounting, 3rd Edition
Vernon J. Richardson
University of Arkansas, Baruch College
Ryan A. Teeter
University of Pittsburgh
Katie L. Terrell
University of Arkansas
Published by McGraw Hill LLC, 1325 Avenue of the Americas, New York, NY 10019. Copyright ©2023 by McGraw Hill LLC. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of McGraw Hill LLC, including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.
Some ancillaries, including electronic and print components, may not be available to customers outside the
United States.
1 2 3 4 5 6 7 8 9 LWI 27 26 25 24 23 22
ISBN 978-1-265-09445-4
MHID 1-265-09445-4
All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.
The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website does not indicate an endorsement by the authors or McGraw Hill LLC, and McGraw Hill LLC does not guarantee the accuracy of the information presented at these sites.
mheducation.com/highered
Preface
Data Analytics is changing the business world—data simply surround us! So many data are
available to businesses about each of us—how we shop, what we read, what we buy, what
music we listen to, where we travel, whom we trust, where we invest our time and money,
and so on. Accountants create value by addressing fundamental business and accounting
questions using Data Analytics.
All accountants must develop data analytic skills to address the needs of the profession
in the future—it is increasingly required of new hires and old hands. Data Analytics for
Accounting, 3e recognizes that accountants don’t need to become data scientists—they may
never need to build a data repository or do the real hardcore Data Analytics or learn how to
program a computer to do machine learning. However, there are seven skills that analytic-
minded accountants must have to be prepared for a data-filled world, including:
1. Developed analytics mindset—know when and how Data Analytics can address
business questions.
2. Data scrubbing and data preparation—comprehend the process needed to clean and
prepare the data before analysis.
3. Data quality—recognize what is meant by data quality, be it completeness, reliability,
or validity.
4. Descriptive data analysis—perform basic analysis to understand the quality of the
underlying data and their ability to address the business question.
5. Data analysis through data manipulation—demonstrate ability to sort, rearrange,
merge, and reconfigure data in a manner that allows enhanced analysis. This may
include diagnostic, predictive, or prescriptive analytics to appropriately analyze the
data.
6. Statistical data analysis competency—identify and implement an approach that will
use statistical data analysis to draw conclusions and make recommendations on a
timely basis.
7. Data visualization and data reporting—report the results of the analysis in a way that is accessible to each decision maker and addresses his or her specific needs.
Consistent with these skills, it’s important to recognize that Data Analytics is an iterative
process. The process begins by identifying business questions that can be addressed with
data, extracting and testing the data, refining our testing, and finally, communicating those
findings to management. Data Analytics for Accounting, 3e describes this process by relying
on an established Data Analytics model called the IMPACT cycle:1
1. Identify the questions.
2. Master the data.
3. Perform test plan.
4. Address and refine results.
5. Communicate insights.
6. Track outcomes.
1 Jean Paul Isson and Jesse S. Harriott, Win with Advanced Business Analytics: Creating Business Value from Your Data (Hoboken, NJ: Wiley, 2013).
The IMPACT cycle is described in the first four chapters, and then the process is
illustrated in auditing, managerial accounting, financial accounting, and taxes in Chapters
5 through 9. In response to instructor feedback, Data Analytics for Accounting, 3e now also
includes two new project chapters, giving students a chance to practice the full IMPACT
model with multiple labs that build on one another.
Data Analytics for Accounting, 3e emphasizes hands-on practice with real-world
data. Students are provided with hands-on instruction (e.g., click-by-click instructions,
screenshots, etc.) on datasets within the chapter; within the end-of-chapter materials; and in
the labs at the end of each chapter. Throughout the text, students identify questions, extract
and download data, perform testing, and then communicate the results of that testing.
The use of real-world data is highlighted by using data from Avalara, LendingClub,
College Scorecard, Dillard’s, the State of Oklahoma, as well as other data from our labs. In
particular, we emphasize the rich data from Dillard’s sales transactions that we use in more
than 15 of the labs throughout the text (including Chapter 11).
Data Analytics for Accounting, 3e also emphasizes the various data analysis tools students
will use throughout the rest of their career around two tracks—the Microsoft track (Excel,
Power BI) and a Tableau track (Tableau Prep and Tableau Desktop—available with free
student license). Using multiple tools allows students to learn which tool is best suited for
the necessary data analysis, data visualization, and communication of the insights gained—
for example, which tool is easiest for internal controls testing, which is best for analyzing or querying (using SQL) big datasets, which is best for data visualizations, and so on.
About the Authors
Vernon J. Richardson is a Distinguished Professor of Accounting and the G. William
Glezen Chair in the Sam M. Walton College of Business at the University of Arkansas and a
Visiting Professor at Baruch College. He received his BS, Master of Accountancy, and MBA
from Brigham Young University and a PhD in accounting from the University of Illinois at
Urbana–Champaign. He has taught students at the University of Arkansas, Baruch College,
University of Illinois, Brigham Young University, Aarhus University, and University of
Kansas, and internationally at the China Europe International Business School (Shanghai),
Xi’an Jiaotong Liverpool University, Chinese University of Hong Kong–Shenzhen, and the
University of Technology Sydney.
Dr. Richardson is a member of the American Accounting Association. He has served as
president of the American Accounting Association Information Systems section. He previously
served as an editor of The Accounting Review and is currently an editor at Accounting Horizons.
He has published articles in The Accounting Review, Journal of Information Systems, Journal of
Accounting and Economics, Contemporary Accounting Research, MIS Quarterly, International
Journal of Accounting Information Systems, Journal of Management Information Systems, Journal of
Operations Management, and Journal of Marketing. Dr. Richardson is also an author of McGraw
Hill’s Accounting Information Systems and Introduction to Data Analytics for Accounting textbooks.
Katie L. Terrell is an instructor in the Sam M. Walton College of Business at the University
of Arkansas. She received her BA degrees in English literature and in the Spanish language
from the University of Central Arkansas and her MBA from the University of Arkansas. She
expects a doctoral degree by 2021. She has taught students at the University of Arkansas;
Soochow University (Suzhou, China); the University College Dublin (Ireland); and Duoc
UC, a branch of the Catholic University of Chile (Vina del Mar, Chile).
She is a member of the American Accounting Association and has published a Statement
on Management Accounting for the Institute of Management Accountants on managing
organizational change in operational change initiatives. Terrell was named the 2019
Business Professional of the Year (Education) by the national Beta Alpha Psi organization.
She has recently been recognized for her innovative teaching as the recipient of the Mark Chain/FSA Teaching Award for innovative graduate-level accounting teaching
practices in 2016. She has worked with Tyson Foods, where she held various information
system roles, focusing on business analysis, project management for ERP implementations
and upgrades, and organizational change management. Terrell is also an author of McGraw
Hill’s Introduction to Data Analytics for Accounting textbook.
Acknowledgments
Our sincere thanks to all who helped us on this project.
Our biggest thanks to the awesome team at McGraw Hill, including Steve Schuetz, Tim
Vertovec, Rebecca Olson, Claire McLemore, Michael McCormick, Christine Vaughan,
Kevin Moran, Angela Norris, and Lori Hancock.
Our thanks also to each of the following:
The Walton College Enterprise Team (Paul Cronan, Ron Freeze, Michael Gibbs,
Michael Martz, Tanya Russell) for their work helping us get access to the Dillard’s data.
Shane Lunceford from LendingClub for helping gain access to LendingClub data.
Joy Caracciolo, Will Cocker, and Tommy Morgan from Avalara for their help in granting permission to use the Avalara data.
Bonnie Klamm, North Dakota State University, and Ryan Baxter, Boise State University,
for their accuracy check review of the manuscript and Connect content.
In addition, we thank the following reviewers and classroom testers, who provided ideas and insights for this edition. We appreciate their contributions.
Key Features
• NEW! Color Coded Multi-Track Labs: Instructors have the flexibility to guide
students through labs using the Green Track: Microsoft tools (including Excel, Power
Query, and Power BI); Blue Track: Tableau tools (including Tableau Prep Builder
and Tableau Desktop); or both. Each track is clearly identified and supported with
additional resources.
• NEW! Lab Example Outputs: Each lab begins with an example of what students are
expected to create. This provides a clear reference and guide for student deliverables.
• NEW! Auto-Graded Problems: The quantity and variety of auto-graded problems
that are assignable in McGraw Hill Connect have been expanded.
• NEW! Discussion and Analysis: Now available as manually graded assignments in
McGraw Hill Connect.
• Emphasis on Skills: Working through the IMPACT cycle framework, students
will learn problem assessment, data preparation, data analysis, data visualization,
internal controls testing, and more.
• Emphasis on Hands-On Practice: Students will be provided hands-on learning (click-
by-click instructions with screenshots) on datasets within each chapter, within the
end-of-chapter materials, and in the labs and comprehensive cases.
• Emphasis on Datasets: To illustrate data analysis techniques and skills, multiple
practice datasets (audit, financial, and managerial data) will be used in every chapter.
Students gain real-world experience working with data from Avalara, LendingClub,
Dillard’s, College Scorecard, the State of Oklahoma, as well as financial statement
data (via XBRL) from S&P100 companies.
• Emphasis on Tools: Students will learn how to conduct data analysis using Microsoft
and Tableau tools. Students will compare and contrast the different tools to
determine which are best suited for basic data analysis and data visualization, which
are easiest for internal controls testing, which are best for SQL queries, and so on.
Main Text Features
Chapter Maps
These maps provide a guide of what we're going to cover in the chapter as well as a guide of what we've just learned and what's coming next.

Chapter-Opening Vignettes
Because companies are facing new and exciting opportunities with their use of Data Analytics to help with accounting and business decisions, we detail what they're doing and why in our chapter-opening vignettes.

Chapter 2: Mastering the Data

A Look at This Chapter
This chapter provides an overview of the types of data that are used in the accounting cycle and common data that are stored in a relational database. The second step of the IMPACT cycle is "mastering the data," which is sometimes called ETL for extracting, transforming, and loading the data. We will describe how data are requested and extracted to answer business questions and how to transform data for use via data preparation, validation, and cleaning. We conclude with an explanation of how to load data into the appropriate tool in preparation for analyzing data to make decisions.

A Look Ahead
Chapter 3 describes how to go from defining business problems to analyzing data, answering questions, and addressing business problems. We identify four types of Data Analytics (descriptive, diagnostic, predictive, and prescriptive analytics) and describe various approaches and techniques that are most relevant to analyzing accounting data.

We are lucky to live in a world in which data are abundant. However, even with rich sources of data, when it comes to being able to analyze data and turn them into useful information and insights, very rarely can an analyst hop right into a dataset and begin analyzing. Datasets almost always need to be cleaned and validated before they can be used. Not knowing how to clean and validate data can, at best, lead to frustration and poor insights and, at worst, lead to horrible security violations. While this text takes advantage of open source datasets, these datasets have all been scrubbed not only for accuracy, but also to protect the security and privacy of any individual or company whose details were in the original dataset.

In 2015, a pair of researchers named Emil Kirkegaard and Julius Daugbejerg Bjerrekaer scraped data from OkCupid, a free dating website, and provided the data onto the "Open Science Framework," a platform researchers use to obtain and share raw data. While the aim of the Open Science Framework is to increase transparency, the researchers in this instance took that a step too far—and a step into illegal territory. Kirkegaard and Bjerrekaer did not obtain permission from OkCupid or from the 70,000 OkCupid users whose identities, ages, genders, religions, personality traits, and other personal details maintained by the dating site were provided to the public without any work being done to anonymize or sanitize the data. If the researchers had taken the time to not just validate that the data were complete, but also to sanitize them to protect the individuals' identities, this would not have been a threat or a news story. On May 13, 2015, the Open Science Framework removed the OkCupid data from the platform, but the damage of the privacy breach had already been done.1

A 2020 report suggested that "Any consumer with an average number of apps on their phone—anywhere between 40 and 80 apps—will have their data shared with hundreds or perhaps thousands of actors online," said Finn Myrstad, the digital policy director for the Norwegian Consumer Council, commenting specifically about dating apps.2

All told, data privacy and ethics will continue to be an issue for data providers and data users. In this chapter, we look at the ethical considerations of data collection and data use as part of mastering the data.

Learning Objectives
We feature learning objectives at the beginning of each chapter. Having these learning objectives provides students with an overview of the concepts to be taught in the chapter and the labs.

LO 2-1 Understand available internal and external data sources and how data are organized in an accounting information system.
LO 2-2 Understand how data are stored in a relational database.
LO 2-3 Explain and apply extraction, transformation, and loading (ETL) techniques to prepare the data for analysis.
LO 2-4 Describe the ethical considerations of data collection and data use.

1 B. Resnick, "Researchers Just Released Profile Data on 70,000 OkCupid Users without Permission," Vox, 2016, http://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release (accessed October 31, 2016).
2 N. Singer and A. Krolik, "Grindr and OkCupid Spread Personal Details, Study Says," New York Times, January 13, 2020, https://www.nytimes.com/2020/01/13/technology/grindr-apps-dating-data-tracking.html (accessed December 2020).
Progress Checks
Periodic progress check questions are posed to students throughout each chapter. These checks provoke the student to stop and consider the concepts presented.

PROGRESS CHECK
1. Referring to Exhibit 2-2, locate the relationship between the Supplier and Purchase Order tables. What is the unique identifier of each table? (The unique identifier attribute is called the primary key—more on how it's determined in the next learning objective.) Which table contains the attribute that creates the relationship? (This attribute is called the foreign key—more on how it's determined in the next learning objective.)
2. Referring to Exhibit 2-2, review the attributes in the Purchase Order table. There are two foreign keys listed in this table that do not relate to any of the tables in the diagram. Which tables do you think they are? What type of data would be stored in those two tables?
3. Refer to the two tables that you identified in Progress Check 2 that would relate to the Purchase Order table, but are not pictured in this diagram. Draw a sketch of what the UML Class Diagram would look like if those tables were pictured, including the relationships that connect the tables.
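The primary key/foreign key relationship these questions describe can also be sketched in SQL, which the text introduces in Appendices D and E. The sketch below is illustrative only; the table and column names are hypothetical stand-ins inspired by the exhibit, not the book's exact schema.

    -- Hypothetical sketch of the Supplier / Purchase Order relationship.
    CREATE TABLE Supplier (
        SupplierID   INTEGER PRIMARY KEY,   -- unique identifier (primary key)
        SupplierName VARCHAR(100)           -- descriptive attribute
    );

    CREATE TABLE PurchaseOrder (
        PONumber   INTEGER PRIMARY KEY,     -- unique identifier (primary key)
        PODate     DATE,                    -- descriptive attribute
        SupplierID INTEGER REFERENCES Supplier (SupplierID)   -- foreign key
    );

The REFERENCES constraint is also how a relational database can enforce a business rule such as "don't process purchase orders from suppliers who don't exist": a row inserted into PurchaseOrder with an unknown SupplierID is rejected.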
End-of-Chapter Materials

Key Words
…the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.
primary key (57) An attribute that is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table.
relational database (56) A means of storing data in order to ensure that the data are complete, not redundant, and to help enforce business rules. Relational databases also aid in communication and integration of business processes across an organization.
supply chain management (SCM) information system (54) An information system that helps manage all the company's interactions with suppliers.

Answers to Progress Checks
The answers allow students to evaluate if they are on track with their understanding of the materials presented in the chapter.

1. The unique identifier of the Supplier table is [Supplier ID], and the unique identifier of the Purchase Order table is [PO Number]. The Purchase Order table contains the foreign key.
2. The foreign key attributes in the Purchase Order table that do not relate to any tables in the view are EmployeeID and CashDisbursementID. These attributes probably relate to the Employee table (so that we can tell which employee was responsible for each Purchase Order) and the Cash Disbursement table (so that we can tell if the Purchase Orders have been paid for yet, and if so, on which check). The Employee table would be a complete listing of each employee, as well as containing the details about each employee (for example, phone number, address, etc.). The Cash Disbursement table would be a main listing of the payments the company has made.
4. The purpose of the primary key is to uniquely identify each record in a table. The purpose of a foreign key is to create a relationship between two tables. The purpose of a descriptive attribute is to provide meaningful information about each record in a table. Descriptive attributes aren't required for a database to run, but they are necessary for people to gain business information about the data stored in their databases.
5. Data dictionaries provide descriptions of the function (e.g., primary key or foreign key when applicable), data type, and field names associated with each column (attribute) of a database. Data dictionaries are especially important when databases contain several different tables and many different attributes in order to help analysts identify the information they need to perform their analysis.
6. Depending on the level of security afforded to a business analyst, she can either obtain data directly from the database herself or she can request the data. When obtaining data herself, the analyst must have access to the raw data in the database and a firm knowledge of SQL and data extraction techniques. When requesting the data, the analyst doesn't need the same level of extraction skills, but she still needs to be familiar enough with the data to identify which tables and attributes contain the information she requires.
7. Four common issues that must be fixed are removing headings or subtotals, cleaning leading zeroes or nonprintable characters, formatting negative numbers, and correcting inconsistencies across the data.
8. Firms can ask to see the terms and conditions of their third-party data supplier, and ask questions to come to an understanding regarding if and how privacy practices are maintained. They also can evaluate what preventive controls on data access are in place and assess whether they are followed. Generally, an audit does not need to be performed, but requesting a questionnaire be filled out would be appropriate.
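To make answer 5 concrete, a data dictionary can itself be stored as a small table in which each row documents one column of one database table. The sketch below is hypothetical; the field names and descriptions are invented for illustration and are not taken from the data dictionaries in Appendix J.

    -- Hypothetical data dictionary stored as a table; one row per attribute.
    CREATE TABLE DataDictionary (
        FieldName   VARCHAR(50),    -- column being described
        TableName   VARCHAR(50),    -- table the column belongs to
        DataType    VARCHAR(20),
        KeyFunction VARCHAR(20),    -- e.g., 'primary key' or 'foreign key'
        Description VARCHAR(200)
    );

    INSERT INTO DataDictionary VALUES
        ('SupplierID', 'Supplier',      'Integer', 'primary key', 'Unique identifier for each supplier'),
        ('SupplierID', 'PurchaseOrder', 'Integer', 'foreign key', 'Links each purchase order to a supplier'),
        ('PODate',     'PurchaseOrder', 'Date',    NULL,          'Date the purchase order was issued');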
Multiple Choice Questions
The multiple choice questions quickly assess students' knowledge of chapter content. Now assignable in Connect!

1. (LO 2-3) Mastering the data can also be described via the ETL process. The ETL process stands for:
a. extract, total, and load data.
b. enter, transform, and load data.
c. extract, transform, and load data.
d. enter, total, and load data.

10. (LO 2-4) Which of the following questions are not suggested by the Institute of Business Ethics to allow a business to create value from data use and analysis, and still protect the privacy of stakeholders?
a. How does the company use data, and to what extent are they integrated into firm strategy?
b. Does the company send a privacy notice to individuals when their personal data are collected?
c. Does the data used by the company include personally identifiable information?
d. Does the company have the appropriate tools to mitigate the risks of data misuse?

11. (LO 2-4) What is the theme of each of the six questions proposed by the Institute of Business Ethics? Which one addresses the primary purpose of the data? Which addresses how the risks associated with data use and collection are mitigated? How can two specific objectives be achieved at the same time?

Discussion and Analysis
This feature provides questions for group discussion and analysis. Now available as manually graded assignments in McGraw Hill Connect!

1. (LO 2-2) The advantages of a relational database include limiting the amount of redundant data that are stored in a database. Why is this an important advantage? What can go wrong when redundant data are stored?
2. (LO 2-2) The advantages of a relational database include integrating business processes. Why is it preferable to integrate business processes in one information system, rather than store different business process data in separate, isolated databases?
3. (LO 2-2) Even though it is preferable to store data in a relational database, storing data across separate tables can make data analysis cumbersome. Describe three reasons it is worth the trouble to store data in a relational database.
4. (LO 2-2) Among the advantages of using a relational database is enforcing business rules. Based on your understanding of how the structure of a relational database helps prevent data redundancy and other advantages, how does the primary key/foreign key relationship structure help enforce a business rule that indicates that a company shouldn't process any purchase orders from suppliers who don't exist in the database?
5. (LO 2-2) What is the purpose of a data dictionary? Identify four different attributes that could be stored in a data dictionary, and describe the purpose of each.

Problems
The problems challenge the student's ability to see relationships in the learning objectives with analysis options that employ higher-level thinking and analytical skills. The quantity of auto-graded problems has been expanded. The manually graded analysis problems are also now assignable in McGraw Hill Connect.

1. (LO 2-2) Match the relational database function to the appropriate relational database term:
• Composite primary key
• Descriptive attribute
• Foreign key
• Primary key
• Relational database

Relational Database Function
1. Serves as a unique identifier in a database table.
2. Creates a relationship between two tables.
3. Two foreign keys from the tables that it is linking combine to …
4. …
5. …

6. (LO 2-3) In the ETL process, the first step is extracting the data. When you are obtaining the data yourself, what are the steps to identifying the data that you need to extract?
7. (LO 2-3) In the ETL process, if the analyst does not have the security permissions to access the data directly, then he or she will need to fill out a data request form. While this doesn't necessarily require the analyst to know extraction techniques, why does the analyst still need to understand the raw data very well in order to complete the data request?
8. (LO 2-3) In the ETL process, when an analyst is completing the data request form, there are a number of fields that the analyst is required to complete. Why do you think it is important for the analyst to indicate the frequency of the report? How do you think that would affect what the database administrator does in the extraction?
9. (LO 2-3) Regarding the data request form, why do you think it is important to the database administrator to know the purpose of the request? What would be the importance of the "To be used in" and "Intended audience" fields?
10. (LO 2-3) In the ETL process, one important step when transforming the data is to work with null, n/a, and zero values in the dataset. If you have a field of quantitative data (e.g., number of years each individual in the table has held a full-time job), what would be the effect of the following?
a. Transforming null and n/a values into blanks
b. …
c. … (Hint: Think about the impact on AVERAGE.)
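The hint in problem 10 is easiest to see with numbers: AVERAGE-style aggregates skip blank (null) values but count zeroes, so the transformation chosen for missing values changes the statistic. A small SQL illustration, using a hypothetical table and made-up values:

    -- Hypothetical data: years of full-time employment, one value missing (n/a).
    CREATE TABLE Applicant (
        ApplicantID   INTEGER PRIMARY KEY,
        YearsEmployed INTEGER              -- NULL where the source said 'n/a'
    );

    INSERT INTO Applicant VALUES (1, 10), (2, 2), (3, NULL);

    -- AVG skips NULLs: (10 + 2) / 2 = 6
    SELECT AVG(YearsEmployed) FROM Applicant;

    -- Recoding the NULL as zero pulls the average down: (10 + 2 + 0) / 3 = 4
    SELECT AVG(COALESCE(YearsEmployed, 0)) FROM Applicant;

Excel's AVERAGE behaves the same way: blank cells are ignored while zeroes are included, so the same underlying data can report different averages depending on how null and n/a values were transformed.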
NEW! Color Coded Multi-Track Labs
The labs give students hands-on experience working with different types of data and the tools used to analyze them. Students complete labs using the instructor-led track and answer common questions. Clear step-by-step directions help model the expected output of each lab exercise.

…where applicable, can be found in the eBook and lab walkthrough videos in Connect.

Case Summary: Sláinte has branched out in recent years to offer custom-branded microbrews for local bars that want to offer a distinctive tap and corporate events where a custom brew can be used to celebrate a big milestone. Each of these custom orders falls into one of two categories: Standard jobs include the brew and basic package design, and Premium jobs include customizable brew and enhanced package design with custom designs and upgraded foil labels. Sláinte's management has begun tracking the costs associated with each of these custom orders with job cost sheets that include materials, labor, and overhead allocation. You have been tasked with helping evaluate the costs for these custom jobs.

Data: Lab 7-1 Slainte Job Costs.zip - 26KB Zip / 30KB Excel

Lab 7-1 Example Output
By the end of this lab, you will create a dashboard that will let you explore job cost variance. While your results will include different data values, your work should look similar to this:

LAB 7-1T Example Job Cost Dashboard in Tableau Desktop (Tableau Software, Inc. All rights reserved.)
Data Analytics for Accounting, 3e Content Updates

Chapter 1
• Added new opening vignette regarding a recent IMA survey of finance and
accounting professionals and their use of Big Data and Data Analytics.
• Added discussion on how analytics are used in auditing, tax, and management accounting.
• Included introduction to the variety of analytics tools available and explanation of
dual tracks for labs including Microsoft Track and Tableau Track.
• Added “Data Analytics at Work” box feature: What Does an Analyst Do at a Big
Four Accounting Firm.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new tool connections in Lab 1-5.
• Revised Labs 1-0 to 1-4.
Chapter 2
• Edited opening vignette to include current examples regarding data privacy and ethics.
• Added a discussion on ethical considerations related to data collection and use.
• Added exhibit with potential external data sources to address accounting questions.
• Expanded the data extraction section to first include data identification, including
the use of unstructured data.
• Added “Data Analytics at Work” box feature: Jump Start Your Accounting Career
with Data Analytics Knowledge.
• Added six new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 2-1 to 2-8.
Chapter 3
• Refined the discussion on diagnostic analytics.
• Improved the discussion on the differences between qualitative and quantitative
data and the discussion of the normal distribution.
• Refined the discussion on the use of regression as an analytics tool.
• Added examples of time series analysis in the predictive analytics section.
• Added “Data Analytics at Work” box feature: Big Four Invest Billions in Tech,
Reshaping Their Identities as Professional Services Firm with a Technology Core.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new cluster analysis in Lab 3-2.
• Revised Labs 3-1, 3-3 to 3-6.
Chapter 4
• Added discussion of statistics versus visualizations using Anscombe’s quartet.
• Updated explanations of box plots and Z-scores.
• Added “Data Analytics at Work” box feature: Data Visualization: Why a Picture Can
Be Worth a Thousand Clicks.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new dashboard in Lab 4-3.
• Revised Labs 4-1, 4-2, 4-4, 4-5.
Chapter 5
• Improved and clarified content to match the focus on descriptive, diagnostic, predictive,
and prescriptive analytics.
• Added “Data Analytics at Work” box feature: Citi’s $900 Million Internal Control
Mistake: Would Continuous Monitoring Help?
• Added six new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 5-1 to 5-5.
Chapter 6
• Clarified chapter content to match the focus on descriptive, diagnostic, predictive,
and prescriptive analytics.
• Added "Data Analytics at Work" box feature: Do Auditors Need to Be Programmers?
• Added six new Connect-ready problems.
• Implemented lab changes:
• Major revisions to Labs 6-1 to 6-5.
Chapter 7
• Added new exhibit and discussion that maps managerial accounting questions to
data approaches.
• Added "Data Analytics at Work" box feature: Maximizing Profits Using Data Analytics.
• Added five new Connect-ready problems.
• Implemented lab changes:
• All-new job cost, balanced scorecard, and time series dashboards in Labs 7-1, 7-2, and 7-3.
• Revised Labs 7-4 and 7-5.
Chapter 8
• Added new exhibit and discussion that maps financial statement analysis questions
to data approaches.
• Added four new Connect-ready problems.
• Implemented lab changes:
• All-new sentiment analysis in Lab 8-4.
• Revised Labs 8-1 to 8-3.
Chapter 9
• Added new exhibit and discussion that maps tax questions to data approaches.
• Added four new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 9-1 to 9-5.
Chapter 10
• Updated project chapter that evaluates different business processes, including the
order-to-cash and procure-to-pay cycles, from different user perspectives with a
choice to use the Microsoft track, the Tableau track, or both.
• Added extensive, all-new set of objective and analysis questions to assess analysis
and learning.
Chapter 11
• Updated project chapter, estimating sales returns at Dillard’s with three question sets
highlighting descriptive and exploratory analysis, hypothesis testing, and predictive
analytics with a choice to use the Microsoft track, the Tableau track, or both.
• Added extensive, all-new set of objective and analysis questions to assess analysis
and learning.
Connect for Data Analytics for Accounting
With McGraw Hill Connect for Data Analytics for Accounting, your students receive proven
study tools and hands-on assignment materials, as well as an adaptive eBook. Here are some
of the features and assets available with Connect.
Proctorio: New remote proctoring and browser-locking capabilities, hosted by Proctorio within
Connect, provide control of the assessment environment by enabling security options and
verifying the identity of the student. Seamlessly integrated within Connect, these services allow
instructors to control students’ assessment experience by restricting browser activity, recording
students’ activity, and verifying students are doing their own work. Instant and detailed reporting
gives instructors an at-a-glance view of potential academic integrity concerns, thereby avoiding
personal bias and supporting evidence-based claims.
SmartBook 2.0: A personalized and adaptive learning tool used to maximize the learning experience by helping students study more efficiently and effectively. SmartBook 2.0 highlights where in the chapter to focus, asks review questions on the materials covered, and tracks the most challenging content for later review and recharge. SmartBook 2.0 is available both online and offline.
Orientation Videos: Video-based tutorial assignments are designed to train students via
an overview video followed by a quiz for each of the assignment types they will find in
McGraw Hill Connect.
Multiple Choice Questions: The multiple choice questions from the end-of-chapter materials
are assignable and auto-gradable in McGraw Hill Connect, with the option to provide stu-
dents with instant feedback on their answers and performance.
Discussion and Analysis Questions: We have added the Discussion and Analysis questions
into McGraw Hill Connect as manually graded assignments for convenience of assignment
organization. These can be utilized for small group or in-class discussion.
Problems: Select problems from the text are auto-graded in McGraw Hill Connect.
Manually graded analysis problems are also now available to ensure students are building
an analytical skill set.
Color Coded Multi-Track Labs: Labs are assignable in McGraw Hill Connect as the green
Microsoft Track (including Excel, Power Query, and Power BI) and blue Tableau Track
(including Tableau Prep Builder and Tableau Desktop).
Students complete their lab work outside of Connect in the lab track selected by their
professor. Students answer assigned lab questions designed to ensure they understood the
key skills and outcomes from their lab work. Both auto-graded lab objective questions and
manually graded lab analysis questions are assignable in Connect.
Comprehensive Cases: Comprehensive case labs are assignable in McGraw Hill Connect.
Students work outside of Connect to complete the lab using the Dillard’s real-world Big
Data set. Once students complete the comprehensive lab, they will go back into Connect
to answer questions designed to ensure they completed the lab and understood the key
skills and outcomes from their lab work.
Lab Walkthrough Videos: These author-led lab videos in McGraw Hill Connect explain
how to access and use the tools needed to complete the processes essential to the labs. Lab
videos improve student success and minimize student questions!
Author Lecture Videos: Lecture Videos assignable in McGraw Hill Connect teach each
chapter’s core learning objectives and concepts through an author-developed, hands-on
presentation, bringing the text content to life. The videos have the touch and feel of a live
lecture, rather than a canned presentation, so you can learn at your own pace.
Writing Assignment: The Writing Assignment tool delivers a learning experience to help
students improve their written communication skills and conceptual understanding. As an
instructor you can assign, monitor, grade, and provide feedback on writing more efficiently
and effectively in McGraw Hill Connect.
Test Bank: The test bank includes auto-graded multiple choice and true/false assessment
questions. The test bank can be assigned directly within McGraw Hill Connect or exported
from Test Builder.
Instructors: Student Success Starts with You
Tools to enhance your unique voice
Want to build your own course? No problem. Prefer to use an OLC-aligned, prebuilt course? Easy. Want to make changes throughout the semester? Sure. And you'll save time with Connect's auto-grading too.

65% Less Time Grading
Students: Get Learning that Fits You
Effective tools for efficient studying
Connect is designed to help you be more productive with simple, flexible, intuitive tools that maximize
your study time and meet your individual learning needs. Get learning that works for you with Connect.
Brief Table of Contents
Preface iv
About the Authors vi
Acknowledgments vii
Key Features viii
Main Text Features ix
End-of-Chapter Materials x
Data Analytics for Accounting, 3e Content Updates xii
Connect for Data Analytics for Accounting xv
Chapter 1 Data Analytics for Accounting and Identifying the Questions 2
Chapter 2 Mastering the Data 52
Chapter 3 Performing the Test Plan and Analyzing the Results 114
Chapter 4 Communicating Results and Visualizations 180
Chapter 5 The Modern Accounting Environment 244
Chapter 6 Audit Data Analytics 282
Chapter 7 Managerial Analytics 334
Chapter 8 Financial Statement Analytics 404
Chapter 9 Tax Analytics 454
Chapter 10 Project Chapter (Basic) 498
Chapter 11 Project Chapter (Advanced): Analyzing Dillard’s Data to Predict Sales Returns 512
Appendix A Basic Statistics Tutorial 528
Appendix B Excel (Formatting, Sorting, Filtering, and PivotTables) 534
Appendix C Accessing the Excel Data Analysis Toolpak 544
Appendix D SQL Part 1 546
Appendix E SQL Part 2 560
Appendix F Power Query in Excel and Power BI 564
Appendix G Power BI Desktop 572
Appendix H Tableau Prep Builder 578
Appendix I Tableau Desktop 582
Appendix J Data Dictionaries 586
GLOSSARY 588
INDEX 593
Detailed TOC
Chapter 1
Data Analytics for Accounting and Identifying the Questions 2
Data Analytics 4
How Data Analytics Affects Business 4
How Data Analytics Affects Accounting 5
Auditing 6
Management Accounting 7
Financial Reporting and Financial Statement Analysis 7
Tax 8
The Data Analytics Process Using the IMPACT Cycle 9
Step 1: Identify the Questions (Chapter 1) 9
Step 2: Master the Data (Chapter 2) 10
Step 3: Perform Test Plan (Chapter 3) 10
Step 4: Address and Refine Results (Chapter 3) 13
Steps 5 and 6: Communicate Insights and Track Outcomes (Chapter 4 and each chapter thereafter) 13
Back to Step 1 13
Data Analytic Skills and Tools Needed by Analytic-Minded Accountants 13
Choose the Right Data Analytics Tools 14
Hands-On Example of the IMPACT Model 17
Identify the Questions 17
Master the Data 17
Perform Test Plan 20
Address and Refine Results 23
Communicate Insights 24
Track Outcomes 24
Summary 25
Key Words 26
Answers to Progress Checks 26
Multiple Choice Questions 28
Discussion and Analysis 30
Problems 30
Lab 1-0 How to Complete Labs 36
Lab 1-1 Data Analytics Questions in Financial Accounting 39
Lab 1-2 Data Analytics Questions in Managerial Accounting 41
Lab 1-3 Data Analytics Questions in Auditing 42
Lab 1-4 Comprehensive Case: Questions about Dillard's Store Data 44
Lab 1-5 Comprehensive Case: Connect to Dillard's Store Data 47

Chapter 2
Mastering the Data 52
How Data Are Used and Stored in the Accounting Cycle 54
Internal and External Data Sources 54
Accounting Data and Accounting Information Systems 56
Data and Relationships in a Relational Database 56
Columns in a Table: Primary Keys, Foreign Keys, and Descriptive Attributes 57
Data Dictionaries 59
Extract, Transform, and Load (ETL) the Data 60
Extract 61
Transform 64
Load 67
Ethical Considerations of Data Collection and Use 68
Summary 69
Key Words 70
Answers to Progress Checks 70
Multiple Choice Questions 71
Discussion and Analysis 73
Problems 74
Lab 2-1 Request Data from IT—Sláinte 77
Lab 2-2 Prepare Data for Analysis—Sláinte 79
Lab 2-3 Resolve Common Data Problems—LendingClub 84
Lab 2-4 Generate Summary Statistics—LendingClub 91
Lab 2-5 Validate and Transform Data—College Scorecard 95
Lab 2-6 Comprehensive Case: Build Relationships among Database Tables—Dillard's 98
Lab 2-7 Comprehensive Case: Preview Data from Tables—Dillard's 103
Lab 2-8 Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard's 108

Chapter 3
Performing the Test Plan and Analyzing the Results 114
Performing the Test Plan 116
Descriptive Analytics 119
Summary Statistics 119
Data Reduction 120
Diagnostic Analytics 122
Standardizing Data for Comparison (Z-score) 123
Profiling 123
Cluster Analysis 128
Hypothesis Testing for Differences in Groups 131
Predictive Analytics 133
Regression 134
Classification 137
Chapter 6
Audit Data Analytics 282
When to Use Audit Data Analytics 284
Identify the Questions 284
Master the Data 284
Perform Test Plan 286
Address and Refine Results 288
Communicate Insights 288
Track Outcomes 288
Descriptive Analytics 288
Aging of Accounts Receivable 289
Sorting 289
Summary Statistics 289
Sampling 289
Diagnostic Analytics 290
Box Plots and Quartiles 290
Z-Score 290
t-Tests 290
Benford's Law 292
Drill-Down 293
Exact and Fuzzy Matching 293
Sequence Check 294
Stratification and Clustering 294
Advanced Predictive and Prescriptive Analytics in Auditing 294
Regression 295
Classification 295
Probability 295
Sentiment Analysis 295
Applied Statistics 296
Artificial Intelligence 296
Additional Analyses 296
Summary 297
Key Words 297
Answers to Progress Checks 298
Multiple Choice Questions 298
Discussion and Analysis 300
Problems 300
Lab 6-1 Evaluate Trends and Outliers—Oklahoma 304
Lab 6-2 Diagnostic Analytics Using Benford's Law—Oklahoma 311
Lab 6-3 Finding Duplicate Payments—Sláinte 317
Lab 6-4 Comprehensive Case: Sampling—Dillard's 321
Lab 6-5 Comprehensive Case: Outlier Detection—Dillard's 325

Chapter 7
Managerial Analytics 334
Application of the IMPACT Model to Management Accounting Questions 336
Identify the Questions 336
Master the Data 337
Perform Test Plan 337
Address and Refine Results 338
Communicate Insights and Track Outcomes 339
Identifying Management Accounting Questions 339
Relevant Costs 339
Key Performance Indicators and Variance Analysis 339
Cost Behavior 340
Balanced Scorecard and Key Performance Indicators 341
Master the Data and Perform the Test Plan 345
Address and Refine Results 347
Summary 348
Key Words 348
Answers to Progress Checks 349
Multiple Choice Questions 349
Discussion and Analysis 351
Problems 351
Lab 7-1 Evaluate Job Costs—Sláinte 355
Lab 7-2 Create a Balanced Scorecard Dashboard—Sláinte 367
Lab 7-3 Comprehensive Case: Analyze Time Series Data—Dillard's 377
Lab 7-4 Comprehensive Case: Comparing Results to a Prior Period—Dillard's 389
Lab 7-5 Comprehensive Case: Advanced Performance Models—Dillard's 398

Chapter 8
Financial Statement Analytics 404
Financial Statement Analysis 406
Descriptive Financial Analytics 407
Vertical and Horizontal Analysis 407
Ratio Analysis 408
Diagnostic Financial Analytics 410
Predictive Financial Analytics 410
Prescriptive Financial Analytics 412
Visualizing Financial Data 413
Showing Trends 413
Relative Size of Accounts Using Heat Maps 414
Visualizing Hierarchy 414
Text Mining and Sentiment Analysis 415
XBRL and Financial Data Quality 417
XBRL Data Quality 419
XBRL, XBRL-GL, and Real-Time Financial Reporting 420
Examples of Financial Statement Analytics Using XBRL 422
Summary 422
Key Words 423
Answers to Progress Checks 423
Multiple Choice Questions 424
Discussion and Analysis 425
Problems 426
Lab 8-1 Create a Horizontal and Vertical Analysis Using XBRL Data—S&P100 430
Lab 8-2 Create Dynamic Common Size Financial Statements—S&P100 437
Lab 8-3 Analyze Financial Statement Ratios—S&P100 441
Lab 8-4 Analyze Financial Sentiment—S&P100 444

Chapter 9
Tax Analytics 454
Tax Analytics 456
Identify the Questions 456
Master the Data 456
Perform Test Plan 456
Address and Refine Results 458
Communicate Insights and Track Outcomes 458
Mastering the Data through Tax Data Management 458
Tax Data in the Tax Department 458
Tax Data at Accounting Firms 460
Tax Data at the IRS 461
Tax Data Analytics Visualizations 461
Tax Data Analytics Visualizations and Tax Compliance 461
Evaluating Sales Tax Liability 462
Evaluating Income Tax Liability 462
Tax Data Analytics for Tax Planning 464
What-If Scenarios 464
What-If Scenarios for Potential Legislation, Deductions, and Credits 465
Summary 467
Key Words 467
Answers to Progress Checks 467
Multiple Choice Questions 468
Discussion and Analysis 469
Problems 470
Lab 9-1 Descriptive Analytics: State Sales Tax Rates 472
Lab 9-2 Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard's 475
Lab 9-3 Comprehensive Case: Calculate Total Sales Tax Paid—Dillard's 479
Lab 9-4 Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard's and Avalara 486
Lab 9-5 Comprehensive Case: Online Sales Taxes Analysis—Dillard's and Avalara 492

Chapter 10
Project Chapter (Basic) 498
Evaluating Business Processes 500
Question Set 1: Order-to-Cash 500
QS1 Part 1 Financial: What Is the Total Revenue and Balance in Accounts Receivable? 500
QS1 Part 2 Managerial: How Efficiently Is the Company Collecting Cash? 503
QS1 Part 3 Audit: Is the Delivery Process Following the Expected Procedure? 504
QS1 Part 4 What Else Can You Determine about the O2C Process? 505
Question Set 2: Procure-to-Pay 506
QS2 Part 1 Financial: Is the Company Missing Out on Discounts by Paying Late? 506
QS2 Part 2 Managerial: How Long Is the Company Taking to Pay Invoices? 509
QS2 Part 3 Audit: Are There Any Erroneous Payments? 510
QS2 Part 4 What Else Can You Determine about the P2P Process? 511

Chapter 11
Project Chapter (Advanced): Analyzing Dillard's Data to Predict Sales Returns 512
Estimating Sales Returns 514
Question Set 1: Descriptive and Exploratory Analysis 514
QS1 Part 1 Compare the Percentage of Returned Sales across Months, States, and Online versus In-Person Transactions 514
QS1 Part 2 What Else Can You Determine about the Percentage of Returned Sales through Descriptive Analysis? 518
Question Set 2: Diagnostic Analytics—Hypothesis Testing 519
QS2 Part 1 Is the Percentage of Sales Returned Significantly Higher in January after the Holiday Season? 519
QS2 Part 2 How Do the Percentages of Returned Sales for Holiday/Non-Holiday Differ for Online Transactions and across Different States? 521
Appendix A Basic Statistics Tutorial 528
Appendix B Excel (Formatting, Sorting, Filtering, and PivotTables) 534
Appendix C Accessing the Excel Data Analysis Toolpak 544
Appendix I Tableau Desktop 582
Appendix J Data Dictionaries 586
GLOSSARY 588
INDEX 593
Data Analytics for Accounting
Chapter 1
Data Analytics for Accounting and
Identifying the Questions
A Look Ahead
Chapter 2 provides a description of how data are prepared and scrubbed to be ready for analysis to address account-
ing questions. We explain how to extract, transform, and load data and then how to validate and normalize the
data. In addition, we explain how data standards are used to facilitate the exchange of data between data sender and
receiver. We finalize the chapter by emphasizing the need for ethical data collection and data use to maintain data
privacy.
As the access to accounting data proliferates and tools and
accountant skills advance, accountants are relying more on Big
Data to address accounting questions. Whether those questions
relate to audit, tax, or other accounting areas, value will increasingly be created by performing Data Analytics. In this chapter, we
introduce you to the need for Data Analytics in accounting, and
how accounting professionals are increasingly asked to develop
an analytics mindset for any and all accounting roles.
Technology such as Data Analytics, artificial intelligence,
machine learning, blockchain, and robotic process automation
will be playing a greater role in the accounting profession this
year, according to a recent report from the Institute of Management Accountants.
The report indicates that finance and accounting professionals are increasingly implementing Big Data in
their business processes, and the pattern is likely to continue in the future. The IMA surveyed its members for
the report and received 170 responses from CFOs and other management accountants. Many of the CFOs are
predicting big changes for 2020 in their businesses.
Source: M. Cohn, "Accountants to Rely More on Big Data in 2020," Accounting Today, January 4, 2020, https://www.accountingtoday.com/news/accountants-to-rely-more-on-big-data-in-2020 (accessed December 2020).
OBJECTIVES
After reading this chapter, you should be able to:
PROGRESS CHECK
1. How does having more data around us translate into value for a company? What
must we do with those data to extract value?
2. Banks know a lot about us, but they have traditionally used externally generated
credit scores to assess creditworthiness when deciding whether to extend a
loan. How would you suggest a bank use Data Analytics to get a more complete
view of its customers’ creditworthiness? Assume the bank has access to a cus-
tomer’s loan history, credit card transactions, deposit history, and direct deposit
registration. How could it assess whether a loan might be repaid?
a high value on Data Analytics. In fact, per PwC’s 6th Annual Digital IQ survey of more
than 1,400 leaders from digital businesses, the area of investment that tops CEOs’ list of
priorities is business analytics.5
A recent study from McKinsey Global Institute estimates that Data Analytics and tech-
nology could generate up to $2 trillion in value per year in just a subset of the total pos-
sible industries affected.6 Data Analytics could very much transform the manner in which
companies run their businesses in the near future because the real value of data comes
from Data Analytics. With a wealth of data on their hands, companies use Data Analytics
to discover the various buying patterns of their customers, investigate anomalies that were
not anticipated, forecast future possibilities, and so on. For example, with insight provided
through Data Analytics, companies could execute more directed marketing campaigns
based on patterns observed in their data, giving them a competitive advantage over compa-
nies that do not use this information to improve their marketing strategies. By pairing struc-
tured data with unstructured data, patterns could be discovered that create new meaning, generating value and competitive advantage. In addition to producing more value externally,
studies show that Data Analytics affects internal processes, improving productivity, utiliza-
tion, and growth.7
And increasingly, data analytic tools are available as self-service analytics, allowing users to analyze data by aggregating, filtering, enriching, sorting, visualizing, and dashboarding for data-driven decision making on demand.
PwC notes that while data has always been important, executives are more frequently
being asked to make data-driven decisions in high-stress and high-change environments,
making the reliance on Data Analytics even greater these days!8
PROGRESS CHECK
3. Let’s assume a brand manager at Procter and Gamble identifies that an older
demographic might be concerned with the use of Tide Pods to do their laundry.
How might Procter and Gamble use Data Analytics to assess if this is a problem?
4. How might Data Analytics assess the decision to either grant overtime to current
employees or hire additional employees? Specifically, consider how Data Ana-
lytics might be helpful in reducing a company’s overtime direct labor costs in a
manufacturing setting.
5 "Data Driven: What Students Need to Succeed in a Rapidly Changing Business World," PwC, https://www.pwc.com/us/en/faculty-resource/assets/pwc-data-driven-paper-feb2015.pdf, February 2015 (accessed March 20, 2019).
6 "The Trillion-Dollar Opportunity for the Industrial Sector: How to Extract Full Value from Technology," McKinsey Global Institute, https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/the-trillion-dollar-opportunity-for-the-industrial-sector#, November 2018 (accessed December 2018).
7 Joseph Kennedy, "Big Data's Economic Impact," https://www.ced.org/blog/entry/big-datas-economic-impact, December 3, 2014 (accessed January 9, 2016).
8 "What's Next for Tech for Finance? Data-Driven Decision Making," PwC, https://www.pwc.com/us/en/cfodirect/accounting-podcast/data-driven-decision-making.html, October 2020 (accessed December 2020).
Auditing
Data Analytics plays an increasingly critical role in the future of audit. In a recent Forbes
Insights/KPMG report, “Audit 2020: A Focus on Change,” the vast majority of survey
respondents believe both that:
1. Audits must better embrace technology.
2. Technology will enhance the quality, transparency, and accuracy of the audit.
Indeed, “As the business landscape for most organizations becomes increasingly com-
plex and fast-paced, there is a movement toward leveraging advanced business analytic
techniques to refine the focus on risk and derive deeper insights into an organization.”9
Many auditors believe that audit data analytics will, in fact, lead to deeper insights that
will enhance audit quality. This sentiment of the impact of Data Analytics on the audit has
been growing for several years now and has given many public accounting firms incentives
to invest in technology and personnel to capture, organize, and analyze financial statement
data to provide enhanced audits, expanded services, and added value to their clients. As a
result, Data Analytics is the next innovation in the evolution of the audit and professional
accounting industry.
Given the fact that operational data abound and are easier to collect and manage, com-
bined with CEOs’ desires to utilize these data, the accounting firms may now approach
their engagements with a different mindset. No longer will they be simply checking for
errors, material misstatements, fraud, and risk in financial statements or merely be report-
ing their findings at the end of the engagement. Instead, audit professionals will now be
collecting and analyzing the company’s data similar to the way a business analyst would to
help management make better business decisions. This means that, in many cases, external
auditors will stay engaged with clients beyond the audit. This is a significant paradigm shift.
The audit process is changing from a traditional process toward a more automated one,
which will allow audit professionals to focus more on the logic and rationale behind data
queries and less on the gathering of the actual data.10 As a result, audits will not only yield
important findings from a financial perspective, but also information that can help compa-
nies refine processes, improve efficiency, and anticipate future problems.
“It’s a massive leap to go from traditional audit approaches to one that fully integrates
big data and analytics in a seamless manner.”11
Data Analytics also expands auditors’ capabilities in services like testing for fraudulent
transactions and automating compliance-monitoring activities (like filing financial reports
to the U.S. Securities and Exchange Commission [SEC] or to the Internal Revenue Service
[IRS]). This is possible because Data Analytics enables auditors to analyze the complete
dataset, rather than the sampling of the financial data done in a traditional audit. Data Ana-
lytics enables auditors to improve their risk assessment in both substantive and detailed testing.
9 Deloitte, “Adding Insight to Audit: Transforming Internal Audit through Data Analytics,” http://www2.deloitte.com/content/dam/Deloitte/ca/Documents/audit/ca-en-audit-adding-insight-to-audit.pdf (accessed January 10, 2016).
10 PwC, “Data Driven: What Students Need to Succeed in a Rapidly Changing Business World,” http://www.pwc.com/us/en/faculty-resource/assets/PwC-Data-driven-paper-Feb2015.pdf, February 2015 (accessed January 9, 2016).
11 EY, “How Big Data and Analytics Are Transforming the Audit,” https://eyo-iis-pd.ey.com/ARC/documents/EY-reporting-ssue-9.pdf, posted April 2015 (accessed January 27, 2016).
Lab Connection
Lab 1-3 has you explore questions auditors would answer with Data Analytics.
Management Accounting
Of all the fields of accounting, it would seem that the aims of Data Analytics are most akin
to management accounting. Management accountants (1) are asked questions by manage-
ment, (2) find data to address those questions, (3) analyze the data, and (4) report the
results to management to aid in their decision making. The description of the management
accountant’s task and that of the data analyst appear to be quite similar, if not identical in
many respects.
Whether it be understanding costs via job order costing, understanding the activity-based
costing drivers, forecasting future sales on which to base budgets, or deciding whether to sell or process further and whether to make or outsource production, analyzing data is critical to management accountants.
As information providers for the firm, it is imperative for management accountants to
understand the capabilities of data and Data Analytics to address management questions.
We address management accounting questions and Data Analytics in Chapter 7.
Lab Connection
Lab 1-2 and Lab 1-4 have you explore questions managers would answer with
Data Analytics.
both its optimal response to the situation and appropriate adjustment to its financial
reporting.
It may be possible to use Data Analytics to scan the environment—that is, scan Google
searches and social media (such as Instagram and Facebook) to identify potential risks to
and opportunities for the firm. For example, in a data analytic sense, it may allow a firm to monitor its competitors and its customers to better understand the opportunities and threats around it. Are its competitors, customers, or suppliers facing financial difficulty that might affect the company’s interactions with them and/or open up new opportunities it otherwise wouldn’t have considered?
We address financial reporting and financial statement analysis questions and Data Ana-
lytics in Chapter 8.
Lab Connection
Lab 1-1 has you explore questions financial accountants would answer with
Data Analytics.
Tax
Traditionally, tax work dealt with compliance issues based on data from transactions that
have already taken place. Now, however, tax executives must develop sophisticated tax
planning capabilities that help the company minimize its taxes in a way that avoids, or prepares it for, a potential audit. This shift in focus makes tax data analytics valuable for its ability to help tax staffs predict what will happen rather than react to what just did happen. Arguably, one of the things that Data Analytics does best is predictive analytics—predicting the future! In tax planning, one of the tax function’s most value-adding tasks, tax data analytics might be used to predict the tax consequences of a proposed international transaction, R&D investment, or merger or acquisition.
One of the issues of performing predictive Data Analytics is the efficient organization
and use of data stored across multiple systems on varying platforms that were not originally
designed for use in the tax department. Organizing tax data into a data warehouse to be
able to consistently model and query the data is an important step toward developing the
capability to perform tax data analytics. This issue is exemplified by the 29 percent of tax
departments that find the biggest challenge in executing an analytics strategy is integrating
the strategy with the IT department and gaining access to available technology tools.12
We address tax questions and Data Analytics in Chapter 9.
PROGRESS CHECK
5. Why are management accounting and Data Analytics considered similar in many
respects?
6. How specifically will Data Analytics change the way a tax staff does its taxes?
12 Deloitte, “The Power of Tax Data Analytics,” http://www2.deloitte.com/us/en/pages/tax/articles/top-ten-things-about-tax-data-analytics.html (accessed October 12, 2016).
EXHIBIT 1-1 The IMPACT Cycle
Source: Isson, J. P., and J. S. Harriott. Win with Advanced Business Analytics: Creating Business Value from Your Data. Hoboken, NJ: Wiley, 2013.
13 We also note our use of the terms IMPACT cycle and IMPACT model interchangeably throughout the book.
14 M. Lebied, “Your Data Won’t Speak Unless You Ask It the Right Data Analysis Questions,” Datapine, June 21, 2017, https://www.datapine.com/blog/data-analysis-questions/ (accessed December 2020).
15 “One-Third of BI Pros Spend up to 90% of Time Cleaning Data,” http://www.eweek.com/database/one-third-of-bi-pros-spend-up-to-90-of-time-cleaning-data.html, posted June 2015 (accessed March 15, 2016).
EXHIBIT 1-2 Example of Co-occurrence Grouping on Amazon.com
Amazon Inc.
• Link prediction—An attempt to predict connections between two data items. This might
be used in social media. For example, because an individual might have 22 mutual
Facebook friends with me and we both attended Brigham Young University in the
same year, is there a chance we would like to be Facebook friends as well? Exhibit 1-3
provides an example of this on Facebook. Link prediction in an accounting setting might use social media to look for relationships between related parties that are not otherwise disclosed; a minimal sketch follows Exhibit 1-3.
EXHIBIT 1-3 Example of Link Prediction on Facebook
Michael DeLeon/Getty Images; Sam Edwards/Glow Images; Daniel Ernst/Getty Images; Exactostock/SuperStock; McGraw Hill
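One simple way to score a candidate link is to count mutual connections, as in the Facebook example above. A minimal sketch, using an invented friendship graph (the names and connections are made up):

```python
# Invented graph: each person maps to the set of people they are connected to.
friends = {
    "ana":  {"ben", "cara", "dev"},
    "ben":  {"ana", "cara", "dev", "eli"},
    "cara": {"ana", "ben", "eli"},
    "dev":  {"ana", "ben"},
    "eli":  {"ben", "cara"},
}

def common_neighbors(a: str, b: str) -> int:
    """Score a candidate link by counting mutual connections."""
    return len(friends[a] & friends[b])

# ana and eli are not yet connected; two mutual friends suggest a likely link.
print(common_neighbors("ana", "eli"))  # 2 (ben and cara)
```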
• Data reduction—A data approach that attempts to reduce the amount of information that
needs to be considered to focus on the most critical items (e.g., highest cost, highest
risk, largest impact, etc.). It does this by taking a large set of data (perhaps the popula-
tion) and reducing it with a smaller set that has the vast majority of the critical informa-
tion of the larger set. An example might include the potential to use these techniques in
auditing. While auditing has employed various random and stratified sampling methods over the
years, Data Analytics suggests new ways to highlight which transactions do not need the
same level of additional vetting (such as substantive testing) as other transactions.
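A minimal sketch of the idea, assuming an invented transaction population: score each transaction for risk, then keep only the top slice for detailed vetting.

```python
import pandas as pd

# Invented population; in practice this would be the full transaction file.
txns = pd.DataFrame({
    "txn_id": range(1, 9),
    "amount": [120, 98000, 450, 15000, 60, 72000, 880, 33000],
    "manual_entry": [0, 1, 0, 0, 0, 1, 0, 1],
})

# Simple composite risk score: relative size plus a penalty for manual entries.
txns["risk_score"] = txns["amount"].rank(pct=True) + txns["manual_entry"]

# Data reduction: keep only the riskiest 25 percent for additional vetting.
high_risk = txns.nlargest(int(len(txns) * 0.25), "risk_score")
print(high_risk[["txn_id", "amount", "risk_score"]])
```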
Back to Step 1
Since the IMPACT cycle is iterative, once insights are gained and outcomes are tracked,
new more refined questions emerge that may use the same or different data sources with
potentially different analyses and thus, the IMPACT cycle begins anew.
PROGRESS CHECK
7. Let’s say we are trying to predict how much money college students spend on
fast food each week. What would be the response, or dependent, variable? What
would be examples of independent variables?
8. How might a data reduction approach be used in auditing to allow the auditor to
spend more time and effort on the most important (e.g., most risky, largest dollar
volume, etc.) items?
• Draw appropriate conclusions to the business problem based on the data and make rec-
ommendations on a timely basis.
• Present their results to individual members of management (CEOs, audit managers, etc.) in a manner accessible to each member.
Consistent with that, in this text we emphasize skills that analytic-minded accountants
should have in the following seven areas:
1. Developed analytics mindset—know when and how Data Analytics can address business
questions.
2. Data scrubbing and data preparation—comprehend the process needed to clean and
prepare the data before analysis.
3. Data quality—recognize what is meant by data quality, be it completeness, reliability, or
validity.
4. Descriptive data analysis—perform basic analysis to understand the quality of the under-
lying data and its ability to address the business question.
5. Data analysis through data manipulation—demonstrate ability to sort, rearrange, merge,
and reconfigure data in a manner that allows enhanced analysis. This may include diag-
nostic, predictive, or prescriptive analytics to appropriately analyze the data.
6. Statistical data analysis competency—identify and implement an approach that will use
statistical data analysis to draw conclusions and make recommendations on a timely basis.
7. Data visualization and data reporting—report results of analysis in an accessible way to
each varied decision maker and his or her specific needs.
We address these seven skills throughout the first four chapters in the text in hopes that
the analytic-minded accountant will develop and practice these skills to be ready to address
business questions. We then demonstrate these skills in the labs and hands-on analysis
throughout the rest of the book.
Source: “Data Analyst at a Big 4—What Is It Like? My Opinion Working as a Data Analyst at a
Big Four,” https://cryptobulls.info/data-analyst-at-a-big-4-what-is-it-like-pros-cons-ernst-young-
deloitte-pwc-kpmg, posted February 29, 2020 (accessed January 2, 2021).
EXHIBIT 1-4 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms
Source: Sallam, R. L., C. Howson, C. J. Idoine, T. W. Oestreich, J. L. Richardson, and J. Tapadinhas, “Magic Quadrant for Business Intelligence and Analytics Platforms,” Gartner RAS Core Research Notes, Gartner, Stamford, CT (2020).
Based on Gartner’s magic quadrant, it is easy to see that Tableau and Microsoft provide
innovative solutions. While there are other tools that are popular in different industries,
such as Qlik and TIBCO, Tableau and Microsoft tools are the ones you will most likely
encounter because of their position as leaders in the Data Analytics space. For this reason,
each of the labs throughout this textbook will give you or your instructor the option to
choose either a Microsoft Track or a Tableau Track to help you become proficient in those
tools. The skills you learn as you work through the labs are transferable to other tools as well.
365 online service for simple collaboration and sharing, although the most complete set of
features and compatibility with advanced plug-ins is available only on Windows.
Power Query is a tool built into Excel and Power BI Desktop on Windows that lets Excel
connect to a variety of different data sources, such as tables in Excel, databases hosted on
popular platforms like SQL Server, or through open database connectivity (ODBC) con-
nections. Power Query makes it possible to connect, manipulate, clean, and join data so you
can pull them into your Excel sheet or use them in Power BI to create summary reports and
advanced visualizations. Additionally, it tracks each step you perform so you can apply the
same transformations to new data without recreating the work from scratch.
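Power Query’s recorded steps work much like a scripted pipeline. As a loose analogy only—this is pandas, not Power Query’s M language—the sketch below captures a cleaning sequence in one reusable function, borrowing attribute names from the LendingClub rejected-loans file; the function and file names are hypothetical.

```python
import pandas as pd

def clean_loans(raw: pd.DataFrame) -> pd.DataFrame:
    """Each chained step mirrors a recorded 'applied step': the same sequence
    can be re-run on next month's extract without redoing the work by hand."""
    return (
        raw
        .rename(columns=str.strip)                  # tidy header whitespace
        .dropna(subset=["Risk_Score"])              # keep rows with valid scores
        .assign(AppDate=lambda d: pd.to_datetime(d["Application Date"]))
        .query("`Amount Requested` > 0")            # drop nonsense amounts
    )

# new_data = clean_loans(pd.read_csv("next_extract.csv"))  # reapply to fresh data
```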
Power BI is an analytic platform that enables generation of simple or advanced Data
Analytics models and visualizations that can be compiled into dashboards for easy sharing
with relevant stakeholders. It builds on data from Excel or other databases and can lever-
age models created with Power Query to quickly summarize key data findings. Microsoft
provides Power BI Desktop for free only on Windows or through a web-based app, though
the online version does not have all of the features of the desktop version and is primarily
used for sharing.
Power Automate is a tool that leverages robotic process automation (RPA) to automate routine tasks and workflows, such as scraping and collecting data from nonstructured sources, including emails and other online services. These workflows can pull data from relevant sources when triggered by events, such as when an invoice is generated. Power Automate is a
web-based subscription service with a tool that works only on Windows to automate key-
strokes and mouse clicks.
EXHIBIT 1-6 Tableau Data Analytics Tools

Tool | Tableau Prep Builder | Tableau Desktop | Tableau Public
Good for | Large datasets: data summarization, data joining, data cleaning, data transformation | Large datasets: advanced visualization, dashboards, presentation | Analyze and share public datasets
Platform | Windows/Mac/Online | Windows/Mac/Online | Windows/Mac/Online
Tableau Prep is primarily used for data combination, cleaning, manipulation, and
insights. It enables users to interact with data and quickly identify data quality issues with
a clear map of steps performed so others can review the cleaning process. It is available on
Windows, Mac, and Tableau Online.
Tableau Desktop can be used to generate basic to advanced Data Analytics models and
visualizations with an easy-to-use drag-and-drop interface.
Tableau Public is a free limited edition of Tableau Desktop that is specifically tailored
to sharing and analyzing public datasets. It has some significant limitations for broader
analysis.
PROGRESS CHECK
9. Given the “magic quadrant” in Exhibit 1-4, why are the software tools repre-
sented by the Microsoft and Tableau tracks considered innovative?
10. Why is having the Tableau software tools fully available on both Windows and
Mac computers an advantage for Tableau over Microsoft?
17 https://www.lendingclub.com/ (accessed September 29, 2016).
[Chart: Total loans issued ($) by year, 2015–2020.]
Borrowers borrow money for a variety of reasons, including refinancing other debt and
paying off credit cards, as well as borrowing for other purposes (Exhibit 1-8).
LendingClub provides datasets on the loans it approved and funded as well as data for
the loans that were declined. To address the question posed, “What are some characteristics
of rejected loans?,” we’ll use the dataset of rejected loans.
The rejected loan datasets and related data dictionary are available from your instructor
or from Connect (in Additional Student Resources).
As we learn about the data, it is important to know what is available to us. To that end,
there is a data dictionary that provides descriptions for all of the data attributes of the data-
set. A cut-out of the data dictionary for the rejected stats file (i.e., the statistics about those
loans rejected) is shown in Exhibit 1-9.
EXHIBIT 1-9 2007–2012 LendingClub Data Dictionary for Declined Loan Data
Source: LendingClub
We could also take a look at the data files available for the funded loan data. However,
for our analysis in the rest of this chapter, we use the Excel file “DAA Chapter 1-1 Data”
that has rejected loan statistics from LendingClub for the time period of 2007 to 2012. It is
a cleaned-up, transformed file ready for analysis. We’ll learn more about data scrubbing and
preparation of the data in Chapter 2.
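Before building PivotTables, it helps to profile the file. A minimal sketch, assuming the workbook has been exported to CSV; the file name and exact attribute spellings are assumptions based on the declined-loan data dictionary.

```python
import pandas as pd

rejected = pd.read_csv("DAA Chapter 1-1 Data.csv")  # hypothetical CSV export

print(rejected.shape)                     # how many applications and attributes?
print(rejected.columns.tolist())          # confirm fields against the data dictionary
print(rejected["Risk_Score"].describe())  # quick look at credit (risk) scores
```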
Exhibit 1-10 provides a cut-out of the 2007–2012 “Declined Loan” dataset provided.
EXHIBIT 1-10 2007–2012 Declined Loan Applications (DAA Chapter 1-1 Data) Dataset
Microsoft Excel, 2016
EXHIBIT 1-11 LendingClub Declined Loan Applications by DTI (Debt-to-Income). DTI bucket includes high (debt > 20 percent of income), medium (“mid”) (debt between 10 and 20 percent of income), and low (debt < 10 percent of income).
Microsoft Excel, 2016
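The same bucketing can be scripted rather than keyed in by hand (in Excel, a nested IF formula accomplishes the same thing). A minimal sketch using invented ratios:

```python
import pandas as pd

# Invented debt-to-income ratios, expressed as percentages.
rejected = pd.DataFrame({"Debt-To-Income Ratio": [4.5, 12.0, 18.3, 25.1, 9.9, 31.0]})

# High (> 20%), mid (10–20%), and low (< 10%) buckets, as in Exhibit 1-11.
rejected["DTI bucket"] = pd.cut(
    rejected["Debt-To-Income Ratio"],
    bins=[float("-inf"), 10, 20, float("inf")],
    labels=["low", "mid", "high"],
)
print(rejected["DTI bucket"].value_counts())
```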
EXHIBIT 1-12 LendingClub Declined Loan Applications by Employment Length (Years of Experience)
Microsoft Excel, 2016
EXHIBIT 1-13 Breakdown of Customer Credit Scores (or Risk Scores)
Source: Cafecredit.com

Excellent: 800–850
Very Good: 750–799
Good: 700–749
Fair: 650–699
Poor: 600–649
Very Bad: 300–599

Those with excellent and very good credit scores are likely to qualify for almost all loans and receive the lowest interest rates. Those with good and fair credit scores are likely to qualify for most loans and receive good interest rates. Those with poor and very bad credit scores are likely to qualify for loans only if they have sufficient collateral.
EXHIBIT 1-14 The Count of LendingClub Rejected Loan Applications by Credit or Risk Score Classification Using PivotTable Analysis (PivotTable shown here required manually sorting rows to get in proper order.)
Microsoft Excel, 2016
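The manual sorting noted in Exhibit 1-14 can be avoided in script by declaring the bucket order explicitly. A small sketch with invented score classifications:

```python
import pandas as pd

# Declaring the rank order once keeps every later summary sorted correctly.
order = ["Excellent", "Very Good", "Good", "Fair", "Poor", "Very Bad"]
scores = pd.Series(["Good", "Poor", "Excellent", "Good", "Very Bad", "Fair"])

bucketed = pd.Categorical(scores, categories=order, ordered=True)
print(pd.Series(bucketed).value_counts().sort_index())  # counts in rank order
```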
EXHIBIT 1-15 The Count of LendingClub Declined Loan Applications by Credit (or Risk Score), Debt-to-Income (DTI Bucket), and Employment Length Using PivotTable Analysis (Highlighting Added)
Microsoft Excel, 2016
EXHIBIT 1-16 The Average Debt-to-Income Ratio (Shown as a Percentage) by Credit (Risk) Score for LendingClub Declined Loan Applications Using PivotTable Analysis
Microsoft Excel, 2016
given the debt they already had as compared to any of the others, suggesting a reason even
those potential borrowers with excellent credit were rejected.
Communicate Insights
Certainly further and more sophisticated analysis could be performed, but at this point we
have a pretty good idea of what LendingClub uses to decide whether to extend or reject a
loan to a potential borrower. We can communicate these insights either by showing the PivotTables or by simply stating what three of the determinants are. Which is the most effective communication—showing the PivotTables themselves, graphing the results, or simply naming these three determinants for the decision makers? Knowing the decision makers and how they like to receive information will help the analyst determine how to communicate the insights.
Track Outcomes
There are a wide variety of outcomes that could be tracked. But in this case, it might be best
to see if we could predict future outcomes. For example, the data we analyzed were from
2007 to 2012. We could make our predictions for subsequent years based on what we had
found in the past and then test to see how accurate we are with those predictions. We could
also change our prediction model when we learn new insights and additional data become
available.
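A minimal sketch of this kind of outcome tracking, with invented numbers: derive a simple decision rule from the historical period, then measure how often it matches what actually happened later.

```python
import pandas as pd

# Invented data: DTI percentages and whether each application was rejected.
history = pd.DataFrame({"dti": [25, 8, 31, 12], "rejected": [1, 0, 1, 0]})
later   = pd.DataFrame({"dti": [22, 9, 15], "rejected": [1, 0, 1]})

threshold = 20  # rule suggested by the historical analysis: high DTI gets rejected
predicted = (later["dti"] > threshold).astype(int)
accuracy = (predicted == later["rejected"]).mean()
print(f"Prediction accuracy on later years: {accuracy:.0%}")  # 67% here
```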
PROGRESS CHECK
11. Lenders often use the data item of whether a potential borrower rents or owns
their house. Beyond the three characteristics of rejected loans analyzed in this
section, do you believe this data item would be an important determinant of
rejected loans? Defend your answer.
12. Performing your own analysis, download the rejected loans dataset titled “DAA
Chapter 1-1 Data” and perform an Excel PivotTable analysis by state (including
the District of Columbia) and figure out the number of rejected applications for
the state of California. That is, count the loans by state and see what percentage
of the rejected loans came from California. How close is that to the relative pro-
portion of the population of California as compared to that of the United States?
13. Performing your own analysis, download the rejected loans dataset titled “DAA
Chapter 1-1 Data” and run an Excel PivotTable by risk (or credit) score classifica-
tion and DTI bucket to determine the number of (or percentage of) rejected loans
requested by those rated as having an excellent credit score.
Summary
In this chapter, we discussed how businesses and accountants derive value from Data Ana-
lytics. We gave some specific examples of how Data Analytics is used in business, auditing,
managerial accounting, financial accounting, and tax accounting.
We introduced the IMPACT model and explained how it is used to address accounting
questions. And then we talked specifically about the importance of identifying the question.
We walked through the first few steps of the IMPACT model and introduced eight data
approaches that might be used to address different accounting questions. We also discussed
the data analytic skills needed by analytic-minded accountants.
We followed this up with a hands-on example of the IMPACT model, namely examining the characteristics of rejected loans at LendingClub. We performed this analysis using various filtering and PivotTable tasks.
■ With data all around us, businesses and accountants are looking at Data Analytics to
extract the value that the data might possess. (LO 1-1, 1-2, 1-3)
■ Data Analytics is changing the audit and the way that accountants look for risk. Now,
auditors can consider 100 percent of the transactions in their audit testing. It is also help-
ful in finding anomalous or unusual transactions. Data Analytics is also changing the way
financial accounting, managerial accounting, and taxes are done at a company. (LO 1-3)
■ The IMPACT cycle is a means of performing Data Analytics that goes all the way from iden-
tifying the question, to mastering the data, to performing data analyses and communicating
and tracking results. It is recursive in nature, suggesting that as questions are addressed,
new, more refined questions may emerge that can be addressed in a similar way. (LO 1-4)
■ Eight data approaches address different ways of testing the data: classification, regres-
sion, similarity matching, clustering, co-occurrence grouping, profiling, link prediction,
and data reduction. These are explained in more detail in Chapter 3. (LO 1-4)
■ Data analytic skills needed by analytic-minded accountants are specified and are consis-
tent with the IMPACT cycle, including the following: (LO 1-5)
◦ Developed analytics mindset.
◦ Data scrubbing and data preparation.
◦ Data quality.
◦ Descriptive data analysis.
◦ Data analysis through data manipulation.
◦ Statistical data analysis competency.
◦ Data visualization and data reporting.
■ We showed an example of the IMPACT cycle using LendingClub data regarding rejected
loans to illustrate the steps of the IMPACT cycle. (LO 1-6)
Key Words
Big Data (4) Datasets that are too large and complex for businesses’ existing systems to capture, store, manage, and analyze using their traditional capabilities.
classification (11) A data approach that attempts to assign each unit in a population into a few catego-
ries potentially to help with predictions.
clustering (11) A data approach that attempts to divide individuals (like customers) into groups (or
clusters) in a useful or meaningful way.
co-occurrence grouping (11) A data approach that attempts to discover associations between indi-
viduals based on transactions involving them.
Data Analytics (4) The process of evaluating data with the purpose of drawing conclusions to address
business questions. Indeed, effective Data Analytics provides a way to search through large structured and
unstructured data to identify unknown patterns or relationships.
data dictionary (19) Centralized repository of descriptions for all of the data attributes of the dataset.
data reduction (12) A data approach that attempts to reduce the amount of information that needs to
be considered to focus on the most critical items (i.e., highest cost, highest risk, largest impact, etc.).
link prediction (12) A data approach that attempts to predict a relationship between two data items.
predictor (or independent or explanatory) variable (11) A variable that predicts or explains the value of another variable.
profiling (11) A data approach that attempts to characterize the “typical” behavior of an individual,
group, or population by generating summary statistics about the data (including mean, standard
deviations, etc.).
regression (11) A data approach that attempts to estimate or predict, for each unit, the numerical
value of some variable using some type of statistical model.
response (or dependent) variable (10) A variable that responds to, or is dependent on, another.
similarity matching (11) A data approach that attempts to identify similar individuals based on data
known about them.
structured data (4) Data that are organized and reside in a fixed field with a record or a file. Such
data are generally contained in a relational database or spreadsheet and are readily searchable by search
algorithms.
unstructured data (4) Data that do not adhere to a predefined data model in a tabular format.
2. Banks frequently use credit scores from outside sources like Experian, TransUnion, and Equifax to evaluate the creditworthiness of their customers. However, because they have access to all of their customers’ banking information, Data Analytics would allow banks to evaluate that creditworthiness themselves. Banks would know how much money their customers have
and how they spend it. Banks would know if they had prior loans and if they were paid
in a timely manner. Banks would know where they work and the size and stability of
monthly income via the direct deposits. All of these combined, in addition to a credit
score, might be used to assess creditworthiness if customers desire a loan. It might also
give banks needed information for a marketing campaign to target potential creditwor-
thy customers.
3. The brand manager at Procter and Gamble might use Data Analytics to see what is
being said about Procter and Gamble’s Tide Pods product on social media websites
(e.g., Snapchat, Twitter, Instagram, and Facebook), particularly those that attract an older
demographic. This will help the manager assess if there is a problem with the perceptions
of its laundry detergent products.
4. Data Analytics might be used to collect information on the amount of overtime. Who
worked overtime? What were they working on? Do we actually need more full-time
employees to reduce the level of overtime (and its related costs to the company and to
the employees)? Would it be cost-effective to just hire full-time employees instead of pay-
ing overtime? How much will costs increase just to pay for fringe benefits (health care,
retirement, etc.) for new employees versus just paying existing employees for their overtime? All of these questions could be addressed by analyzing recent records explaining
the use of overtime.
5. Management accounting and Data Analytics both (1) address questions asked by man-
agement, (2) find data to address those questions, (3) analyze the data, and (4) report the
results to management. In all material respects, management accounting and Data Analyt-
ics are similar, if not identical.
6. The tax staff would become much more adept at efficiently organizing data from multiple
systems across an organization and performing Data Analytics to help with tax planning to
structure transactions in a way that might minimize taxes.
7. The dependent variable could be the amount of money spent on fast food. Indepen-
dent variables could be proximity of the fast food, ability to cook own food, discretionary
income, socioeconomic status, and so on.
8. The data reduction approach might help auditors spend more time and effort on the most
risky transactions or on those that might be anomalous in nature. This will help them more
efficiently spend their time on items that may well be of highest importance.
9. According to the “magic quadrant,” the software tools represented by the Microsoft and
Tableau Tracks are considered innovative because they lead the market in the “ability to
execute” and “completeness of vision” dimensions.
10. Having Tableau software tools available on both the Mac and Windows computers gives
the analyst needed flexibility that is not available with the Microsoft Track, whose tools are fully available only on Windows computers.
11. The use of the data item whether a potential borrower owns or rents their house would
be expected to complement the risk score, debt levels (DTI bucket), and length of employ-
ment, since it can give a potential lender additional data on the financial position and
financial obligations (mortgage or rent payments) of the borrower.
12. An analysis of the rejected loans suggests that 85,793 of the total 645,414 rejected loans
were from the state of California. That represents 13.29 percent of the total rejected
loans. This is greater than the relative population of California to the United States as of
the 2010 census, of 12.1 percent (37,253,956/308,745,538).
13. A PivotTable analysis of the rejected loans suggests that more than 30.6 percent
(762/2,494) of those in the excellent risk credit score range asked for a loan with a debt-
to-income ratio of more than 20 percent.
3. (LO 1-4) Which data approach attempts to identify similar individuals based on data
known about them?
a. Classification
b. Regression
c. Similarity matching
d. Data reduction
4. (LO 1-4) Which data approach attempts to predict connections between two data items?
a. Profiling
b. Classification
c. Link prediction
d. Regression
5. (LO 1-6) Which of these terms is defined as being a central repository of descriptions
for all of the data attributes of the dataset?
a. Big Data
b. Data warehouse
c. Data dictionary
d. Data Analytics
6. (LO 1-5) Which of the following skills was not emphasized as one that analytic-minded accountants should have?
a. Developed an analytics mindset
b. Data scrubbing and data preparation
c. Classification of test approaches
d. Statistical data analysis competency
7. (LO 1-5) In which areas were skills not emphasized for analytic-minded accountants?
a. Data quality
b. Descriptive data analysis
c. Data visualization and data reporting
d. Data and systems analysis and design
8. (LO 1-4) The IMPACT cycle includes all except the following steps:
a. perform test plan.
b. visualize the data.
c. master the data.
d. track outcomes.
9. (LO 1-4) The IMPACT cycle specifically includes all except the following steps:
a. data preparation.
b. communicate insights.
c. address and refine results.
d. perform test plan.
10. (LO 1-1) By the year 2024, the volume of data created, captured, copied, and consumed worldwide will be 149 _____.
a. zettabytes
b. petabytes
c. exabytes
d. yottabytes
Discussion and Analysis
1. (LO 1-1) The opening article “Accountants to Rely More on Big Data in 2020” suggested that accountants would increasingly be implementing Big Data in their business processes. Why is that? How can Data Analytics help accountants do their jobs?
2. (LO 1-1) Define Data Analytics and explain how a university might use its techniques to
recruit and attract potential students.
3. (LO 1-2) Give a specific example of how Data Analytics creates value for businesses.
4. (LO 1-3) Give a specific example of how Data Analytics creates value for auditing.
5. (LO 1-3) How might Data Analytics be used in financial reporting? And how might it be
used in doing tax planning?
6. (LO 1-3) How is the role of management accounting similar to the role of the data
analyst?
7. (LO 1-4) Describe the IMPACT cycle. Why do the order of its processes and its recursive nature make sense?
8. (LO 1-4) Why is identifying the question such a critical first step in the IMPACT process
cycle?
9. (LO 1-4) What is included in mastering the data as part of the IMPACT cycle described
in the chapter?
10. (LO 1-4) What data approach mentioned in the chapter might be used by Facebook to
find friends?
11. (LO 1-4) Auditors will frequently use the data reduction approach when considering
potentially risky transactions. Provide an example of why focusing on a portion of the
total number of transactions might be important for auditors to assess risk.
12. (LO 1-4) Which data approach might be used to assess the appropriate level of the
allowance for doubtful accounts?
13. (LO 1-6) Why might the debt-to-income attribute included in the declined loans dataset
considered in the chapter be a predictor of declined loans? How about the credit (risk)
score?
14. (LO 1-6) To address the question “Will I receive a loan from LendingClub?” we had
available data to assess the relationship among (1) the debt-to-income ratios and num-
ber of rejected loans, (2) the length of employment and number of rejected loans, and
(3) the credit (or risk) score and number of rejected loans. What additional data would
you recommend to further assess whether a loan would be offered? Why would they be
helpful?
Problems
1. (LO 1-4) Match each specific Data Analytics test to a specific test approach, as part of
performing a test plan:
• Classification
• Regression
• Similarity Matching
• Clustering
• Co-occurrence Grouping
• Profiling
• Link Prediction
• Data Reduction
Specific Data Analytics Test | Test Approach
1. Predict which firms will go bankrupt and which firms will not go
bankrupt.
2. Use stratified sampling to focus audit effort on transactions with
greatest risk.
3. Work to understand normal behavior, to then be able to identify
abnormal behavior (such as fraud).
4. Look for relationships between related parties that are not
otherwise disclosed.
5. Predict which new customers resemble the company’s best
customers.
6. Predict the relationship between an investment in advertising
expenditures and subsequent operating income.
7. Segment all of the company’s customers into groups that will allow
further specific analysis.
8. The customers who buy product X will be most likely to be also
interested in product Y.
2. (LO 1-4) Match each of the specific Data Analytics tasks to the stage of the IMPACT
cycle:
• Identify the Questions
• Master the Data
• Perform Test Plan
• Address and Refine Results
• Communicate Insights
• Track Outcomes
Specific Data Analytics Test | Stage of IMPACT Cycle
1. Should we use company-specific data or macro-economic data to ad-
dress the accounting question?
2. What are appropriate cost drivers for activity-based costing purposes?
3. Should we consider using regression analysis or clustering analysis to
evaluate the data?
4. Should we use tables or graphs to show management what we’ve
found?
5. Now that we’ve evaluated the data one way, should we perform an-
other analysis to gain additional insights?
6. What type of dashboard should we use to get the latest, up-to-date
results?
3. (LO 1-5) Match the specific analysis need/characteristic to the appropriate Microsoft
Track software tool:
• Excel
• Power Query
• Power BI
• Power Automate
Specific Analysis Need/Characteristic Microsoft Track Tool
1. Basic visualization
2. Robotic process automation
3. Data joining
4. Advanced visualization
5. Works on Windows/Mac/Online platforms
6. Dashboards
7. Collect data from multiple sources
8. Data cleaning
4. (LO 1-5) Match the specific analysis need/characteristic to the appropriate Tableau
Track software tool:
• Tableau Prep Builder
• Tableau Desktop
• Tableau Public
5. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under Chapter 1
Data Files, download and consider the LendingClub data dictionary file “LCDataDiction-
ary,” specifically the LoanStats tab. This represents the data dictionary for the loans that
were funded. Choose among these attributes in the data dictionary and indicate which are likely to predict whether loans will go delinquent or ultimately be fully repaid, and which are not predictive.
Predictive Attributes Predictive? (Yes/No)
1. date (Date when the borrower accepted the offer)
2. desc (Loan description provided by borrower)
3. dti (A ratio of debt owed to income earned)
4. grade (LC assigned loan grade)
5. home_ownership (Values include Rent, Own, Mortgage, Other)
6. loanAmnt (Amount of the loan)
7. next_pymnt_d (Next scheduled payment date)
8. term (Number of payments on the loan)
9. tot_cur_bal (Total current balance of all accounts)
6. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under
Chapter 1 Data Files, download and consider the rejected loans dataset of Lending-
Club data titled “DAA Chapter 1-1 Data.” Choose among these attributes in the data
dictionary, and indicate which are likely to be predictive of loan rejection, and which
are not.
Predictive Attributes Predictive? (Yes/No)
1. Amount Requested
2. Zip Code
3. Loan Title
4. Debt-To-Income Ratio
5. Application Date
6. Risk_Score
7. Employment Length
7. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under Chapter 1
Data Files, download and consider the rejected loans dataset of LendingClub data titled
“DAA Chapter 1-1 Data” from the Connect website and perform an Excel PivotTable by
state; then figure out the number of rejected applications for the state of Arkansas. That
is, count the loans by state and compute the percentage of the total rejected loans
in the United States that came from Arkansas. How close is that to the relative pro-
portion of the population of Arkansas as compared to the overall U.S. population (per
2010 census)? Use your browser to find the population of Arkansas and the United
States and calculate the relative percentage and answer the following questions.
7A. Multiple Choice: What is the percentage of total loans rejected in the United
States that came from Arkansas?
a. Less than 1%.
b. Between 1% and 2%.
c. More than 2%.
7B. Multiple Choice: Is this loan rejection percentage greater than the percentage of
the U.S. population that lives in Arkansas (per 2010 census)?
a. Loan rejection percentage is greater than the population.
b. Loan rejection percentage is less than the population.
8. (LO 1-6) Download the rejected loans dataset of LendingClub data titled “DAA
Chapter 1-1 Data” from Connect Additional Student Resources and do an Excel
PivotTable by state; then figure out the number of rejected applications for each state.
8A. Put the following states in order of their loan rejection percentage based on the
count of rejected loans (from high [1] to low [11]) of the total rejected loans. Does
each state’s loan rejection percentage roughly correspond to its relative propor-
tion of the U.S. population?
8B. What is the state with the highest percentage of rejected loans?
8C. What is the state with the lowest percentage of rejected loans?
8D. Analysis: Does each state’s loan rejection percentage roughly correspond
to its relative proportion of the U.S. population (by 2010 U.S. census at https://
en.wikipedia.org/wiki/2010_United_States_census)?
For Problems 9, 10, and 11, we will be cleaning a data file in preparation for subse-
quent analysis.
The analysis performed on LendingClub data in the chapter (“DAA Chapter 1-1 Data”)
was for the years 2007–2012. For this and subsequent problems, please download the
rejected loans table for 2013 from Connect Additional Student Resources titled “DAA Chapter 1-2 Data.”
9. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data” from Connect Additional Student Resources. Browse the file in Excel
to ensure there are no missing data. Because our analysis requires risk scores, debt-to-
income data, and employment length, we need to make sure each of them has valid
data. There should be 669,993 observations.
a. Assign each risk score to a risk score bucket similar to the chapter. That is, classify
the sample according to this breakdown into excellent, very good, good, fair, poor,
and very bad credit according to their credit score noted in Exhibit 1-13. Classify
those with a score greater than 850 as “Excellent.” Consider using nested if–then
statements to complete this. Or sort by risk score and manually input into appropri-
ate risk score buckets.
b. Run a PivotTable analysis that shows the number of loans in each risk score bucket.
Which risk score bucket had the most rejected loans (most observations)? Which
risk score bucket had the least rejected loans (least observations)? Is it similar to
Exhibit 1-14 performed on years 2007–2012?
10. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data.” Browse the file in Excel to ensure there are no missing data. Because
our analysis requires risk scores, debt-to-income data, and employment length, we need
to make sure each of them has valid data. There should be 669,993 observations.
a. Assign each valid debt-to-income ratio into three buckets (labeled DTI bucket) by classi-
fying each debt-to-income ratio into high (>20.0 percent), medium (10.0–20.0 percent),
and low (<10.0 percent) buckets. Consider using nested if–then statements to com-
plete this. Or sort the row and manually input.
b. Run a PivotTable analysis that shows the number of loans in each DTI bucket.
Which DTI bucket had the highest and lowest grouping for this rejected Loans dataset?
Any interpretation of why these loans were rejected based on debt-to-income ratios?
11. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data.” Browse the file in Excel to ensure there are no missing data.
Because our analysis requires risk scores, debt-to-income data, and employment
length, we need to make sure each of them has valid data. There should be 669,993
observations.
a. Assign each risk score to a risk score bucket similar to the chapter. That is, classify
the sample according to this breakdown into excellent, very good, good, fair, poor,
and very bad credit according to their credit score noted in Chapter 1. Classify those
with a score greater than 850 as “Excellent.” Consider using nested if-then state-
ments to complete this. Or sort by risk score and manually input into appropriate risk
score buckets (similar to Problem 9).
b. Assign each debt-to-income ratio into three buckets (labeled DTI bucket) by classify-
ing each debt-to-income ratio into high (>20.0 percent), medium (10.0–20.0 percent),
and low (<10.0 percent) buckets. Consider using nested if-then statements to com-
plete this. Or sort the row and manually classify into the appropriate bucket.
c. Run a PivotTable analysis to show the number of excellent risk scores but high DTI
bucket loans in each employment year bucket.
Which employment length group had the most observations to go along with excel-
lent risk scores but high debt-to-income? Which employment year group had the least
observations to go along with excellent risk scores but high debt-to-income? Analysis:
Any interpretation of why these loans were rejected?
LABS
Microsoft Track: Lab instructions for Microsoft tools, including Excel and Power BI, will appear in a box like this:
Microsoft | Excel
Tableau Track: Lab instructions for Tableau tools, including Tableau Prep and Tableau
Desktop, will appear in a blue box like this:
Tableau | Desktop
Throughout the lab you will be asked to answer questions about the process and the
results. Add your screenshots to your screenshot lab document. All objective and analysis
questions should be answered in Connect or included in your screenshot lab document,
depending on your instructor’s preferences.
Lab 1-0 Part 1 Analysis Questions (LO 1-1, 1-5)
AQ1. What is the purpose of taking screenshots of your progress through the labs?
(Answer this in Connect or write your response in your lab document.)
1. If you haven’t already, download and install the latest version of Excel and
Power BI Desktop on your Windows computer or log on to the remote
desktop.
a. To install Excel, if your university provides Microsoft Office, go to portal.office.com and click Install Office.
b. To install Power BI Desktop, search for Power BI Desktop in the Micro-
soft Store and click Install.
c. To access both Excel and Power BI Desktop on the remote desktop, go to
waltonlab.uark.edu and log in with the username and password provided
by your instructor.
2. Open Excel and create a new blank workbook.
3. From the ribbon, click Data > Get Data > Launch Power Query Editor. A
blank window will appear.
4. Take a screenshot (label it 1-0MA) of the Power Query Editor window and
paste it into your lab document.
a. To take a screenshot in Windows:
1. Open the Start menu and search for “Snipping Tool” or “Snip &
Sketch”.
2. Click New (Rectangular Snip) and draw a rectangle across your
screen that includes your entire window. A preview window with your
screenshot will appear.
3. Press Ctrl + C to copy your screenshot.
4. Go to your lab document and press Ctrl + V to paste the screenshot
into your document.
b. To take a screenshot on a Mac:
1. Press Cmd + Shift + 4 and draw a rectangle across your screen that
includes your entire window. Your screenshot will be saved in your
Desktop folder.
2. Navigate to your Desktop folder and drag the screenshot file into
your lab document.
5. Close the Power Query Editor and close your Excel workbook.
6. Open Power BI Desktop and close the welcome screen.
7. Take a screenshot (label it 1-0MB) of the Power BI Desktop workspace
and paste it into your lab document.
8. Close Power BI Desktop.
1. If you haven’t already, download and install the latest version of Tableau Prep
and Tableau Desktop on your computer or log on to the remote desktop.
a. To install Tableau Prep and Tableau Desktop, go to tableau.com
/academic/students and click Get Tableau for Free. Complete the form,
then download and run the installers for both applications. Be sure to
register using your school email address (ending in .edu)—this will help
ensure that your application for a student license will be approved.
b. To access both Tableau Prep and Tableau Desktop on a remote desktop,
go to waltonlab.uark.edu and log in with the username and password
provided by your instructor.
2. Open Tableau Prep and open a sample flow.
3. Take a screenshot (label it 1-0TA) of the blank Tableau Prep window and
paste it into your lab document.
a. To take a screenshot in Windows:
1. Open the Start menu and search for “Snipping Tool” or “Snip &
Sketch”.
2. Click New (Rectangular Snip) and draw a rectangle across your
screen that includes your entire window. A preview window with your
screenshot will appear.
3. Press Ctrl + C to copy your screenshot.
4. Go to your lab document and press Ctrl + V to paste the screenshot
into your document.
b. To take a screenshot on a Mac:
1. Press Cmd + Shift + 4 and draw a rectangle across your screen that
includes your entire window. Your screenshot will be saved in your
Desktop folder.
2. Navigate to your Desktop folder and drag the screenshot file into
your lab document.
4. Close Tableau Prep.
5. Open Tableau Desktop and create a new workbook.
6. Choose a sample workbook from the selection screen or press the Esc key.
7. Take a screenshot (label it 1-0TB) of the blank Tableau Desktop workbook
and paste it into your lab document.
8. Close Tableau Desktop.
Lab 1-0 Part 2 Objective Questions (LO 1-1, 1-5)
OQ1. Where did you go to complete this lab activity? (Answer this in Connect or write
your response in your lab document.)
OQ2. What type of computer operating system do you normally use? (Answer this in
Connect or write your response in your lab document.)
expected answer is “Apple Inc’s gross margin has increased slightly in the past 3
years,” this tells you what attributes (or fields) to look for: company name, gross
margin (sales revenues – cost of goods sold), year.
Lab 1-2 Data Analytics Questions in Managerial Accounting
Case Summary: Each day as you work in your company’s credit department, you must evaluate the creditworthiness of new and existing customers. As you observe the credit application process, you wonder if there might be an opportunity to look at data from consumer
lending to see if you can help improve your company’s process. You are asked to evaluate
LendingClub, a U.S.-based, peer-to-peer lending company, headquartered in San Francisco,
California. LendingClub facilitates both borrowing and lending by providing a platform for
unsecured personal loans between $1,000 and $35,000. The loan period is for either 3 or 5
years. You should begin by identifying appropriate questions and developing a hypothesis
for each question. Then, using publicly available data, you should identify data fields and
values that could help answer your questions.
LAB TABLE 1-2A Names and Descriptions of Selected Data Attributes Collected by LendingClub

Attribute | Description
id | Loan identification number
member_id | Membership identification number
loan_amnt | Requested loan amount
emp_length | Employment length
issue_d | Date of loan issue
loan_status | Fully paid or charged off
pymnt_plan | Payment plan: yes or no
purpose | Loan purpose: e.g., wedding, medical, debt_consolidation, car
zip_code | Zip code
addr_state | State
dti | Debt-to-income ratio
delinq_2y | Late payments within the past 2 years
earliest_cr_line | Oldest credit account
inq_last_6mnths | Credit inquiries in the past 6 months
open_acc | Number of open credit accounts
revol_bal | Total balance of all credit accounts
revol_util | Percentage of available credit in use
total_acc | Total number of credit accounts
application_type | Individual or joint application
Lab 1-3 Part 1 Identify the Questions
Your audit team has been tasked with identifying potential internal control weaknesses
within the order-to-cash process. You have been asked to consider what the risk of internal
control weakness might look like and how the data might help identify it.
Before you begin the lab, you should create a new blank Word document where you will
record your screenshot and save it as Lab 1-3 [Your name] [Your email address].docx.
Lab 1-3 Submit Your Screenshot Lab Document
Verify that you have captured your required screenshot and have answered any questions
your instructor has assigned, then upload your screenshot lab document to Connect or the
location indicated by your instructor.
Lab 1-4 Comprehensive Case: Questions about Dillard’s Store Data
Case Summary: Dillard’s is a department store with approximately 330 stores in 29 states in
the United States. Its headquarters is located in Little Rock, Arkansas. You can learn more
about Dillard’s by looking at finance.yahoo.com (ticker symbol = DDS) and the Wikipedia
site for DDS. You’ll quickly note that William T. Dillard II is an accounting grad of the
University of Arkansas and the Walton College of Business, which may be why he shared
transaction data with us to make available for this lab and labs throughout this text. In this
lab, you will identify appropriate questions for a retailer. Then, translate questions into tar-
get tables, fields, and values in the Dillard’s database.
The Dillard’s Department Store Database contains retail sales information gathered from
store sales transactions. The sale process begins when a customer brings items intended
for purchase (clothing, jewelry, home décor, etc.) to any store register. A Dillard’s sales
associate scans the individual items to be purchased with a barcode reader. This popu-
lates the transaction table (TRANSACT), which will later be used to generate a sales
receipt listing the item, department, and cost information (related price, sale price, etc.)
for the customer. When the customer provides payment for the items, payment details
are recorded in the transaction table, the receipt is printed, and the transaction is com-
plete. Other tables are used to store information about stores, products, and departments.
Source: http://walton.uark.edu/enterprise/dillardshome.php (accessed July 15, 2021).
This is a gifted dataset that is based on real operational data. Like any real database,
integrity problems may be noted. This can provide a unique opportunity not only to be
exposed to real data, but also to illustrate the effects of data integrity problems.
For this lab, you should rely on your creativity and prior business knowledge to answer
the following analysis questions. Answer these questions in your lab doc or in Connect and
then continue to the next part of this lab.
LAB EXHIBIT 1-4 Dillard’s Sales Transaction Tables and Attributes

CUSTOMER table:
Attribute | Description | Sample values
CUST_ID | Unique identifier representing a customer instance | 219948527, 219930818
DEPARTMENT table:
Attribute | Description | Sample values
DEPT | The Dillard’s unique identifier for a collection of merchandise within a store format | 0471, 0029
DEPT_DESC | The name for a department collection (lowest level of the category hierarchy) of merchandise within a store format | “Christian Dior”, “REBA”
DEPTDEC | The first three digits of a department code, a way to classify departments at a higher level | 047X, 002X
DEPTDEC_DESC | Descriptive name representing the decade (middle level of the category hierarchy) to which a department belongs | ‘BASICS’, ‘TREATMENT’
DEPTCENT | The first two digits of a department code, a way to classify departments at a higher level | 04XX, 00XX
DEPTCENT_DESC | The descriptive name of the century (top level of the category hierarchy) | CHILDRENS, COSMETICS
SKU table:
Attribute | Description | Sample values
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 0557578, 6383039
DEPT | The Dillard’s unique identifier for a collection of merchandise within a store format | 0134, 0343
SKU_CLASS | Three-character alpha/numeric classification code used to define the merchandise; class requirements vary by department | K51, 220
SKU_STYLE | The Dillard’s numeric identifier for a style of merchandise | 091923690, LBF41728
UPC | A number provided by vendors to identify their product to the size level | 889448437421, 44212146767
COLOR | Color of an item | BLACK, PINEBARK
SKU_SIZE | Size of an item; product sizes are not standardized and issued by vendor | 6, 085M
BRAND_NAME | The item’s brand | Stride Rite, UNKNOWN
CLASSIFICATION | Category used to sort products into logical groups | Dress Shoe
PACKSIZE | Number that describes how many of the product come in a package | 001, 002
SKU_STORE table:
Attribute | Description | Sample values
STORE | The numerical identifier for a Dillard’s store | 915, 701
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 4305296, 6137609
RETAIL | The price of an item | 11.90, 45.15
COST | The price charged by a vendor for an item | 8.51, 44.84
STORE table:
Attribute | Description | Sample values
STORE | The numerical identifier for any type of Dillard’s location | 767, 460
DIVISION | The division to which a location is assigned for operational purposes | 07, 04
CITY | The city where the store is located | IRVING, MOBILE
STATE | The state abbreviation where the store is located | MO, AL
ZIP_CODE | The 5-digit zip code of a store’s address | 70601, 35801
ZIP_SECSEG | The 4-digit code of a neighborhood within a specific zip code | 5052, 6474
TRANSACT table:
Attribute | Description | Sample values
TRANSACTION_ID | Unique numerical identifier for each scan of an item at a register | 40333797, 15129264
TRAN_DATE | Calendar date the transaction occurred in a store | 1/1/2015, 5/19/2014
STORE | The numerical identifier for any type of Dillard’s location | 716, 205
REGISTER | The numerical identifier for the register where the item was scanned | 91, 55, 12
TRAN_NUM | Sequential number of transactions scanned on a register | 184, 14
TRAN_TIME | Time of day the transaction occurred | 1839, 1536
CUST_ID | Unique identifier representing the instance of a customer | 118458688, 115935775
TRAN_LINE_NUM | Sequential number of each scan or element in a transaction | 3, 2
MIC | Manufacturer Identification Code used to uniquely identify a vendor or brand within a department | 154, 128, 217
TRAN_TYPE | An identifier for a purchase or return type of transaction or line item | P, R
ORIG_PRICE | The original unit price of an item before discounts | 20.00, 6.00
SALE_PRICE | The discounted unit price of an item | 15.00, 2.64, 6.00
TRAN_AMT | The total pre-tax dollar amount the customer paid in a transaction | 15.00, 2.64
TENDER_TYPE | The type of payment a customer used to complete the transaction | BANK, DLRD, DAMX
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 6107653, 9999999950
Lab 1-4 Part 2 Analysis Questions (LO 1-1, 1-3, 1-4)
AQ1. You’re trying to learn about where Dillard’s stores are located to identify loca-
tions for the next additional store. Consider the STORE table. What questions
could be asked about store location given data availability?
AQ2. What questions would you have regarding data fields in the SKU table that
could be used to help address the cost of shipping? What additional information
would be helpful to address this question?
The Dillard’s Department Store Database contains retail sales information gathered
from store sales transactions. The sale process begins when a customer brings items
intended for purchase (clothing, jewelry, home décor, etc.) to any store register. A
Dillard’s sales associate scans the individual items to be purchased with a barcode
reader. This populates the transaction table (TRANSACT), which will later be used
to generate a sales receipt listing the item, department, and cost information (related
price, sale price, etc.) for the customer. When the customer provides payment for the
items, payment details are recorded in the transaction table, the receipt is printed,
and the transaction is complete. Other tables are used to store information about
stores, products, and departments.
Source: http://walton.uark.edu/enterprise/dillardshome.php (accessed July 15, 2021).
This is a gifted dataset that is based on real operational data. Like any real database, integrity problems may be noted. This can provide a unique opportunity not only to be exposed to real data, but also to illustrate the effects of data integrity problems. The TRANSACT table itself contains 107,572,906 records. Analyzing the entire population would take a significant amount of computational time, especially if multiple users are querying it at the same time.
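To make the scale problem concrete, here is a hedged sketch of the kind of SQL query a tool such as Tableau or Power Query issues behind the scenes when you filter the data down to a manageable slice. The table and column names follow the data dictionary above; the date-literal syntax is an assumption that varies by database server.

    -- Pull one week of transactions instead of all 107+ million rows.
    SELECT T.TRANSACTION_ID,
           T.TRAN_DATE,
           T.STORE,
           T.TRAN_AMT,
           S.STATE
    FROM   TRANSACT AS T
           JOIN STORE AS S
             ON S.STORE = T.STORE          -- STORE is the shared key
    WHERE  T.TRAN_DATE BETWEEN '2014-01-01' AND '2014-01-07';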
In Part 1 of this lab, you will learn how to load the Dillard’s data into either Excel +
Power Query or Tableau Prep so that you can extract, transform, and load the data for later
assignments. You will also filter the data to a more manageable size. In Part 2, you will learn
how to load the Dillard’s data into either Power BI Desktop or Tableau Desktop to prepare
your data for visualization and Data Analytics models.
Tableau | Prep
Tableau | Desktop
a. TRANSACT and STORE. Note: Future labs may ask you to load different tables.
b. Verify the relationship includes the Store attribute from both tables and
close the Relationships window.
5. Click the TRANSACT table and click Update Now to preview the data.
6. Take a screenshot (label it 1-5TC).
7. In the top-right corner of the Data Source screen, click Add below Filters.
a. Click Add...
b. Choose Tran Date and click OK.
c. Choose Range of Dates and click Next.
d. Drag the sliders to limit the data from 1/1/2014 to 1/7/2014 and click
OK. Note: Future labs may ask you to load different date ranges.
e. Take a screenshot (label it 1-5TD).
f. Click OK to return to the Data Source screen.
8. Click the TRANSACT table and then click Update Now to preview the data.
9. When you are finished answering the lab questions you may close Tableau.
Save your file as Lab 1-5 Dillard’s Filter.twb.
Note: Tableau will try to query the server after each change you make, and each query can take up to a minute. After each change, click Cancel to stop the query until you're ready to prepare the final report.
Chapter 2
Mastering the Data
A Look Back
Chapter 1 defined Data Analytics and explained that the value of Data Analytics is in the insights it provides.
We described the Data Analytics Process using the IMPACT cycle model and explained how this process is used
to address both business and accounting questions. We specifically emphasized the importance of identifying
appropriate questions that Data Analytics might be able to address.
A Look Ahead
Chapter 3 describes how to go from defining business problems to analyzing data, answering questions, and addressing business problems. We identify four types of Data Analytics (descriptive, diagnostic, predictive, and prescriptive analytics) and describe various approaches and techniques that are most relevant to analyzing accounting data.
We are lucky to live in a world in which data are abundant. However, even with rich sources of data, when it comes to being able to analyze data and turn them into useful information and insights, very rarely can an analyst hop right into a dataset and begin analyzing. Datasets almost always need to be cleaned and validated before they can be used. Not knowing how to clean and validate data can, at best, lead to frustration and poor insights and, at worst, lead to horrible security violations. While this text takes advantage of open source datasets, these datasets have all been scrubbed not only for accuracy, but also to protect the security and privacy of any individual or company whose details were in the original dataset.
In 2016, a pair of researchers named Emil Kirkegaard and Julius Daugbejerg Bjerrekaer scraped data from OkCupid, a free dating website, and uploaded the data to the "Open Science Framework," a platform researchers use to obtain and share raw data. While the aim of the Open Science Framework is to increase transparency, the researchers in this instance took that a step too far—and a step into illegal territory. Kirkegaard and Bjerrekaer did not obtain permission from OkCupid or from the 70,000 OkCupid users whose identities, ages, genders, religions, personality traits, and other personal details maintained by the dating site were provided to the public without any work being done to anonymize or sanitize the data. If the researchers had taken the time to not just validate that the data were complete, but also to sanitize them to protect the individuals' identities, this would not have been a threat or a news story. On May 13, 2016, the Open Science Framework removed the OkCupid data from the platform, but the damage of the privacy breach had already been done.1
A 2020 report suggested that “Any consumer with an average number of apps on their phone—anywhere between
40 and 80 apps—will have their data shared with hundreds or perhaps thousands of actors online,” said Finn Myrstad,
the digital policy director for the Norwegian Consumer Council, commenting specifically about dating apps.2
All told, data privacy and ethics will continue to be an issue for data providers and data users. In this chapter, we
look at the ethical considerations of data collection and data use as part of mastering the data.
OBJECTIVES
After reading this chapter, you should be able to:
LO 2-1 Understand available internal and external data sources and how data
are organized in an accounting information system.
LO 2-2 Understand how data are stored in a relational database.
LO 2-3 Explain and apply extraction, transformation, and loading (ETL)
techniques to prepare the data for analysis.
LO 2-4 Describe the ethical considerations of data collection and data use.
1 B. Resnick, "Researchers Just Released Profile Data on 70,000 OkCupid Users without Permission," Vox, 2016, http://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release (accessed October 31, 2016).
2 N. Singer and A. Krolik, "Grindr and OkCupid Spread Personal Details, Study Says," New York Times, January 13, 2020, https://www.nytimes.com/2020/01/13/technology/grindr-apps-dating-data-tracking.html (accessed December 2020).
As you learned in Chapter 1, Data Analytics is a process, and we follow an established Data
Analytics model called the IMPACT cycle.3 The IMPACT cycle begins with identifying
business questions and problems that can be, at least partially, addressed with data (the “I”
in the IMPACT model). Once the opportunity or problem has been identified, the next step
is mastering the data (the “M” in the IMPACT model), which requires you to identify and
obtain the data needed for solving the problem. Mastering the data requires a firm understanding of what data are available to you and where they are stored, as well as being skilled
in the process of extracting, transforming, and loading (ETL) the data in preparation for
data analysis. While the extraction piece of the ETL process may often be completed by the
information systems team or a database administrator, it is also possible that you will have
access to raw data that you will need to extract out of the source database. Both methods
of requesting data for extraction and of extracting data yourself are covered in this chapter.
The mastering the data step can be described via the ETL process. The ETL process is
made up of the following five steps:
Step 1 Determine the purpose and scope of the data request (extract).
Step 2 Obtain the data (extract).
Step 3 Validate the data for completeness and integrity (transform).
Step 4 Clean the data (transform).
Step 5 Load the data in preparation for data analysis (load).
This chapter will provide details for each of these five steps.
Exhibit 2-1 provides an example of different categories of external data sources including economic, financial, governmental, and other sources. Each of these may be useful in addressing accounting and business questions.
EXHIBIT 2-1 Potential External Data Sources Available to Address Business and Accounting Questions
outweighs the downside of having to export, validate, and sanitize the data every time you
need to analyze the information.
Storing data in a normalized, relational database instead of a flat file ensures that data
are complete, not redundant, and that business rules and internal controls are enforced;
it also aids communication and integration across business processes. Each one of these
benefits is detailed here:
• Completeness. Ensures that all data required for a business process are included in the
dataset.
• No redundancy. Storing redundant data is to be avoided for several reasons: It takes
up unnecessary space (which is expensive), it takes up unnecessary processing to run
reports to ensure that there aren’t multiple versions of the truth, and it increases the
risk of data-entry errors. Storing data in flat files yields a great deal of redundancy, but
normalized relational databases require there to be one version of the truth and for
each element of data to be stored in only one place.
• Business rules enforcement. As will become increasingly evident as we progress
through the material in this text, relational databases can be designed to aid in the
placement and enforcement of internal controls and business rules in ways that flat
files cannot.
• Communication and integration of business processes. Relational databases should
be designed to support business processes across the organization, which results
in improved communication across functional areas and more integrated business
processes.4
It is valuable to spend some time basking in the benefits of storing data in a relational database because it is not necessarily easier to do so when it comes to building the data model or understanding the structure. It is arguably more complex to normalize your data than it is to throw redundant data without business rules or internal controls into a spreadsheet.
4 G. C. Simsion and G. C. Witt, Data Modeling Essentials (Amsterdam: Morgan Kaufmann, 2005).
table with a lot of redundancy. While this is often ideal for analyzing data, when the data are stored in the database, each group of information is stored in a separate table. Then, the tables that are related to one another are identified (e.g., Supplier and Purchase Order are related; it's important to know which Supplier the Purchase Order is from). The relationship is created by placing a foreign key in one of the two tables that are related. The foreign key is another type of attribute, and its function is to create the relationship between two tables. Whenever two tables are related, one of those tables must contain a foreign key to create the relationship.
The other columns in a table are descriptive attributes. For example, Supplier Name is a critical piece of data when it comes to understanding the business process, but it is not necessary to build the data model. Primary and foreign keys facilitate the structure of a relational database, and the descriptive attributes provide actual business information.
Refer to Exhibit 2-2, the database schema for a typical procure-to-pay process. Each table has an attribute with the letters "PK" next to it—these are the primary keys for each table. The primary key for the Materials Table is "Item_Number," the primary key for the Purchase Order Table is "PO_Number," and so on. Several of the tables also have attributes with the letters "FK" next to them—these are the foreign keys that create the relationship between pairs of tables. For example, look at the relationship between the Supplier Table and the Purchase Order Table. The primary key in the Supplier Table is "Supplier ID." The line between the two tables links the primary key to a foreign key in the Purchase Order Table, also named "Supplier ID."
The Line Items Table in Exhibit 2-3 has so much detail in it that it requires two attributes to combine as a primary key. This is a special case of a primary key often referred to as a composite primary key, in which the two foreign keys from the tables that it is linking combine to make up a unique identifier. The theory and details that support the necessity of this linking table are beyond the scope of this text—if you can identify the primary and foreign keys, you'll be able to identify the data that you need to request. Exhibit 2-4 shows a subset of the data that are represented by the Purchase Order table. You can see that each of the attributes listed in the class diagram appears as a column, and the data for each purchase order are accounted for in the rows.
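To make the key structure concrete, here is a hedged SQL sketch of how three of the Exhibit 2-2 and 2-3 tables might be declared. The table and key names follow the exhibits (with underscores substituted for spaces); the data types and the Quantity attribute are illustrative assumptions, not the text's actual definitions.

    CREATE TABLE Supplier (
        Supplier_ID   INTEGER PRIMARY KEY,      -- PK: unique identifier
        Supplier_Name VARCHAR(30)               -- descriptive attribute
    );

    CREATE TABLE Purchase_Order (
        PO_Number   INTEGER PRIMARY KEY,        -- PK
        Supplier_ID INTEGER
            REFERENCES Supplier (Supplier_ID)   -- FK creates the relationship
    );

    CREATE TABLE Line_Items (
        PO_Number   INTEGER REFERENCES Purchase_Order (PO_Number),
        Item_Number INTEGER,                    -- FK to the Materials Table
        Quantity    INTEGER,                    -- descriptive attribute (assumed)
        PRIMARY KEY (PO_Number, Item_Number)    -- composite primary key
    );

Note the composite primary key on the Line_Items table: the two foreign keys combine to uniquely identify each row, exactly as described above.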
PROGRESS CHECK
1. Referring to Exhibit 2-2, locate the relationship between the Supplier and Purchase Order tables. What is the unique identifier of each table? (The unique identifier attribute is called the primary key—more on how it's determined in the next learning objective.) Which table contains the attribute that creates the relationship? (This attribute is called the foreign key—more on how it's determined in the next learning objective.)
2. Referring to Exhibit 2-2, review the attributes in the Purchase Order table. There
are two foreign keys listed in this table that do not relate to any of the tables in
the diagram. Which tables do you think they are? What type of data would be
stored in those two tables?
3. Refer to the two tables that you identified in Progress Check 2 that would
relate to the Purchase Order table, but are not pictured in this diagram. Draw
a sketch of what the UML Class Diagram would look like if those tables were
included. Draw the two classes to represent the two tables (i.e., rectangles), the
relationships that should exist, and identify the primary keys for the two new
tables.
DATA DICTIONARIES
In the previous section, you learned about how data are stored by focusing on the
procure-to-pay database schema. Viewing schemas and processes in isolation clarifies each
individual process, but it can also distort reality—these schemas typically do not represent
their own separate databases. Rather, each process-specific database schema is a piece of a
greater whole, all combining to form one integrated database.
As you can imagine, once these processes come together to be supported in one database, the amount of data can be massive. Understanding the processes and the basics of how data are stored is critical, but even with a sound foundation, it would be nearly impossible for an individual to remember where each piece of data is stored, or what each piece of data represents.
Creating and using a data dictionary is paramount in helping database administrators
maintain databases and analysts identify the data they need to use. In Chapter 1, you
were introduced to the data dictionary for the LendingClub data for rejected loans (DAA
Chapter 1-1 Data). The same cut-out of the LendingClub data dictionary is provided in
Exhibit 2-5 as a reminder.
Because the LendingClub data are provided in a flat file, the only information necessary to describe the data are the attribute name (e.g., Amount Requested) and a description of that attribute. The description ensures that the data in each attribute are used and analyzed in the appropriate way—it's always important to remember that technology will do exactly what you tell it to, so you must be smarter than the computer! If you run analysis on an attribute thinking it means one thing, when it actually means another, you could make some big mistakes and bad decisions even when you are working with data validated for completeness and integrity. It's critical to get to know the data through database schemas and data dictionaries thoroughly before attempting to do any data analysis.
When you are working with data stored in a relational database, you will have more
attributes to keep track of in the data dictionary. Exhibit 2-6 provides an example of a data
dictionary for a generic Supplier table:
Supplier ID: Primary key; required. Description: Unique Identifier for Each Supplier. Data Type: Number; Default Value: n/a; Field Size: 10.
Supplier Name: Not a key; not required. Description: Official Name of Supplier. Data Type: Short Text; Default Value: n/a; Field Size: 30.
Supplier Type: Foreign key; not required. Description: Type Code for Different Supplier Categories. Data Type: Number; Default Value: Null; Field Size: 10. Notes: 1 = Vendor, 2 = Misc.
PROGRESS CHECK
4. What is the purpose of the primary key? A foreign key? A nonkey (descriptive)
attribute?
5. How do data dictionaries help you understand the data from a database or flat
file?
Extract
Determine exactly what data you need in order to answer your business questions. Requesting data is often an iterative process, but the more prepared you are when requesting data, the more time you will save for yourself and the database team in the long run.
Requesting the data involves the first two steps of the ETL process. Each step has questions associated with it that you should try to answer.
Lab Connection
Lab 2-1 has you work through the process of requesting data from IT.
5 T. Singleton, "What Every IT Auditor Should Know about Data Analytics," n.d., http://www.isaca.org/Journal/archives/2013/Volume-6/Pages/What-Every-IT-Auditor-Should-Know-About-Data-Analytics.aspx#2.
In a later chapter, you will be provided a deep dive into the audit data standards (ADS) developed by the American Institute of Certified Public Accountants (AICPA).6 The aim of the ADS is to alleviate the headaches associated with data requests by serving as a guide to standardize these requests and specify the format an auditor desires from the company being audited. These include the following:
1. Order-to-Cash subledger standards
2. Procure-to-Pay subledger standards
3. Inventory subledger standards
4. General Ledger standards
While the ADS provide an opportunity for standardization, they are voluntary. Regardless of whether your request for data will conform to the standards, a data request form template (as shown in Exhibit 2-7) can make communication easier between data requester and provider.
EXHIBIT 2-7 Example Standard Data Request Form
Requester Name:
Requester Contact Number:
Please provide a description of the information needed (indicate which tables and which fields you require):
Format you wish the data to be delivered in (circle one): Spreadsheet / Text File / Word Document / Other: _____
Request Date:
Required Date:
Intended Audience:
Once the data are received, you can move on to the transformation phase of the ETL
process. The next step is to ensure the completeness and integrity of the extracted data.
After identifying the goal of the data analysis project in the first step of the IMPACT
cycle, you can follow a similar process to how you would request the data if you were going
to extract it yourself:
1. Identify the tables that contain the information you need. You can do this by looking
through the data dictionary or the relationship model.
2. Identify which attributes, specifically, hold the information you need in each table.
3. Identify how those tables are related to each other.
Once you have identified the data you need, you can start gathering the information.
There are a variety of methods that you could take to retrieve the data. Two will be explained
briefly here—SQL and Excel—and there is a deep dive into SQL in Appendices D and E, as
well as a deep dive into Excel’s VLookup and Index/Match in Appendix B.
SQL: "Structured Query Language" (SQL, often pronounced sequel) is a computer language to interact with data (tables, records, and attributes) in a database by creating, updating, deleting, and extracting. For Data Analytics we only need to focus on extracting data that match the criteria of our analysis goals. Using SQL, we can combine data from one or more tables and organize the data in a way that is more intuitive and useful for data analysis than the way the data are stored in the relational database. A firm understanding of the data—the tables, how they are related, and their respective primary and foreign keys—is integral to extracting the data.
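As a brief, hedged illustration of such a query (using the Exhibit 2-2 table and key names; the Order_Date attribute and the cutoff date are invented for the example), the following statement combines two related tables and keeps only the columns and rows the analysis needs:

    SELECT po.PO_Number,
           s.Supplier_Name,
           po.Order_Date                         -- hypothetical attribute
    FROM   Purchase_Order AS po
           JOIN Supplier AS s
             ON s.Supplier_ID = po.Supplier_ID   -- FK matched to PK
    WHERE  po.Order_Date >= '2024-01-01';        -- extract only what is needed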
Typically, data should be stored in the database and analyzed in another tool such as Excel, IDEA, or Tableau. However, you can choose to extract only the portion of the data that you wish to analyze via SQL instead of extracting full tables and transforming the data in Excel, IDEA, or Tableau. This is especially preferable when the raw data stored in the database are large enough to overwhelm Excel. Excel 2016 can hold only 1,048,576 rows on one spreadsheet. When you attempt to bring in full tables that exceed that amount, even when you use Excel's powerful Power BI tools, it will slow down your analysis if the full table isn't necessary.
As you will explore in labs throughout this textbook, SQL isn't used only within the database itself. When you plan to perform your analysis in Excel, Power BI, or Tableau, each tool has an SQL option for you to connect directly to the database and pull in a subset of the data.
There is more description about writing queries and a chance to practice creating joins
in Appendix E.
Microsoft Excel or Power BI: When data are not stored in a relational database, or are not too large for Excel, the entire table can be analyzed directly in a spreadsheet. The advantage is that further analysis can be done in Excel or Power BI, and it is beneficial to have all the data to drill down into more detail once the initial question is answered. This approach is often simpler for doing exploratory analysis (more on this in a later chapter). Understanding the primary key and foreign key relationships is also integral to working with the data directly in Excel.
When your data are stored directly in Excel, you can also use Excel functions and formulas to combine data from multiple Excel tables into one table, similar to how you can join data with SQL in Access or another relational database. Two of Excel's most useful techniques for looking up data from two separate tables and matching them on a primary key/foreign key relationship are the VLookup and Index/Match functions. There are a variety of ways that the VLookup or Index/Match function can be used, but for extracting and transforming data it is best used to add a column to a table.
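As a small, hedged illustration, suppose a Purchase Orders sheet holds Supplier ID in column B and a separate Suppliers sheet holds Supplier ID in column A and Supplier Name in column B (the sheet layout is an assumption for the example). Either formula below, entered in a new column of the Purchase Orders sheet, adds the matching Supplier Name:

    =VLOOKUP(B2, Suppliers!A:B, 2, FALSE)

    =INDEX(Suppliers!B:B, MATCH(B2, Suppliers!A:A, 0))

The FALSE argument (and the 0 in MATCH) forces an exact match on the primary key/foreign key value rather than an approximate one.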
More information about using VLookup and Index/Match functions in Excel is provided
in Appendix B.
The question of whether to use SQL or Excel's tools (such as VLookup) is primarily answered by where the data are stored. Because data are most frequently stored in a relational database (as discussed earlier in this chapter, due to the efficiency and data integrity benefits relational databases provide), SQL will often be the best option for retrieving data, after which those data can be loaded into Excel or another tool for further analysis.
Another benefit of SQL queries is that they can be saved and reproduced at will or at
regular intervals. Having a saved SQL query can make it much easier and more efficient
to re-create data requests. However, if the data are already stored in a flat file in Excel,
there is little reason to use SQL. Sometimes when you are performing exploratory analysis,
even if the data are stored in a relational database, it can be beneficial to load entire tables
into Excel and bypass the SQL step. This should be considered carefully before doing so,
though, because relational databases handle large amounts of data much better than Excel
can. Writing SQL queries can also make it easier to load only the data you need to analyze
into Excel so that you do not overwhelm Excel’s resources.
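For example, a minimal sketch of a saved, repeatable extraction (again assuming the Exhibit 2-2 names) stores the query in the database as a view:

    -- Save the extraction logic once...
    CREATE VIEW PO_Supplier_Extract AS
        SELECT po.PO_Number, s.Supplier_Name
        FROM   Purchase_Order AS po
               JOIN Supplier AS s
                 ON s.Supplier_ID = po.Supplier_ID;

    -- ...then re-run it on demand or on a schedule.
    SELECT * FROM PO_Supplier_Extract;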
Transform
Step 3: Validating the Data for Completeness and Integrity
Anytime data are moved from one location to another, it is possible that some of the data could have been lost during the extraction. It is critical to ensure that the extracted data are complete (that the data you wish to analyze were extracted fully) and that the integrity of the data remains (that none of the data have been manipulated, tampered with, or duplicated during the extraction). Being able to validate the data successfully requires you to not only have the technical skills to perform the task, but also to know your data well. If you know what to reasonably expect from the data in the extraction then you have a higher likelihood of identifying errors or issues from the extraction. Examples of data validation questions are: "How many records should have been extracted?" "What checksums or control totals can be performed to ensure data extraction is accurate?"
The following four steps should be completed to validate the data after extraction; a sketch of example validation queries follows the list:
1. Compare the number of records that were extracted to the number of records in the source
database: This will give you a quick snapshot into whether any data were skipped or
didn’t extract properly due to an error or data type mismatch. This is a critical first
step, but it will not provide information about the data themselves other than ensuring
that the record counts match.
2. Compare descriptive statistics for numeric fields: Calculating the minimums, maximums,
averages, and medians will help ensure that the numeric data were extracted completely.
3. Validate Date/Time fields in the same way as numeric fields by converting the data type
to numeric and running descriptive statistic comparisons.
4. Compare string limits for text fields: Text fields are unlikely to cause an issue if you
extracted your data into Excel because Excel allows a generous maximum character
number (for example, Excel 2016 allows 32,767 characters per cell). However, if you
extracted your data into a tool that does limit the number of characters in a string, you
will want to compare these limits to the source database’s limits per field to ensure that
you haven’t cut off any characters.
If an error is found, depending on the size of the dataset, you may be able to easily find the missing or erroneous data by scanning the information with your eyes. However, if the dataset is large, or if the error is difficult to find, it may be easiest to go back to the extraction and examine how the data were extracted, fix any errors in the SQL code, and re-run the extraction.
Lab Connection
Lab 2-5, Lab 2-6, Lab 2-7, and Lab 2-8 explore the process of loading and
validating data.
Lab Connection
Lab 2-2 and Lab 2-3 walk through how to prepare data for analysis and
resolve common data quality issues.
2. Numbers: Numbers can be misinterpreted, particularly if they are manually entered. For example, 1 or I; 0 or O; 3 or E; 7 or seven. Watch for invalid number formats when you start sorting and analyzing your data, and then go back and correct them. Additionally, accounting artifacts such as dollar signs, commas, and parentheses are pervasive in spreadsheet data (e.g., $12,345.22 or (1,422.53)). As you clean the data, remove any extra accounting characters so numbers appear in their raw form (e.g., 12345.22 or -1422.53); see the sketch after this list.
3. International characters and encoding: When you work with data that span multiple countries, it is likely that you will come across special characters, such as accent marks (á or À), umlauts (Ü), invisible computer characters (TAB, RETURN, linebreak, null), or special characters that are used in query and scripting languages (*, #, ", '). In many cases, these can be corrected with a find and replace or contained in quote marks so they are ignored by the query language. Additionally, while most modern computer programs use Unicode as the text encoding standard, older databases will generate data in the ASCII format. If your tool fails to populate your dataset accurately, having international characters and symbols is likely to be a cause.
4. Languages and measures: Similar to international characters, data elements may contain
a variety of words or measures that have the same meaning. For example, cheese or
fromage; ketchup or catsup; pounds or lbs; $ or €; Arkansas or AR. In order to properly
analyze the comparable data, you’ll need to translate them into a common format by
choosing one word as the standard and replacing the equivalent words. Also make sure
the measure doesn’t change the meaning. The total value in U.S. dollars is not the same
thing as the total value in euros. Make sure you’re comparing apples to apples or euros
to euros.
5. Human error: Whenever there is manual input into the data, there is a high probability
that data will be bad simply because they were mistyped or entered into the wrong
place. There’s no hard and fast rule for dealing with input errors other than being
vigilant and making corrections (e.g., find and replace) when they occur.
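As an example of cleaning the accounting characters described in item 2, the hedged SQL sketch below strips dollar signs and commas and converts accounting-style parentheses to a leading minus sign before casting the text to a number. The raw_import table and amount_text column are placeholders, and the exact string functions vary by database:

    -- '(1,422.53)' in accounting notation becomes -1422.53.
    SELECT CAST(
             REPLACE(REPLACE(REPLACE(REPLACE(amount_text,
                 '$', ''), ',', ''), '(', '-'), ')', '')
             AS DECIMAL(12, 2)) AS amount_clean
    FROM   raw_import;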
Load
Step 5: Loading the Data for Data Analysis
If the extraction and transformation steps have been done well by the time you reach this
step, the loading part of the ETL process should be the simplest step. It is so simple, in
fact, that if your goal is to do your analysis in Excel and you have already transformed
and cleaned your data in Excel, you are finished. There should be no additional loading
necessary.
However, it is possible that Excel is not the last step for analysis. The data analysis technique you plan to implement, the subject matter of the business questions you intend to answer, and the way in which you wish to communicate results will all drive the choice of which tool you use to perform your analysis.
Throughout the text, you will be introduced to a variety of different tools for analyzing data, including Excel, Power BI, Tableau Prep, and Tableau Desktop. As these tools are introduced to you, you will learn how to load data into them.
ETL or ELT?
If loading the data into Excel is indeed the last step, are you actually "extracting, transforming, and loading," or is it "extracting, loading, and transforming"?
The term ETL has been in popular use since the 1970s, and even though methods for extracting and transforming data have gotten easier to use, more accessible, as well as more robust, the term has stuck. Increasingly, however, the procedure is shifting toward ELT. Particularly with tools such as Microsoft's Power BI suite, all of the loading and transforming can be done within Excel, with data directly loaded into Excel from the database, and then transformed (also within Excel). The most common method for mastering the data that we use throughout this textbook is more in line with ELT than ETL; however, even when the order changes from ETL to ELT, it is still more common to refer to the procedure as ETL.
PROGRESS CHECK
6. Describe two different methods for obtaining data for analysis.
7. What are four common data quality issues that must be fixed before analysis can
take place?
7 S. White, "6 Ethical Questions about Big Data," Financial Management, https://www.fm-magazine.com/news/2016/jun/ethical-questions-about-big-data.html (accessed December 2020).
The user of the data must continue to recognize the potential risks associated with data
collection and data use, and work to mitigate those risks in a responsible way.
PROGRESS CHECK
8. A firm purchases data from a third party about customer preferences for laundry detergent. How would you recommend that this firm conduct appropriate due diligence about whether the third-party data provider follows ethical data practices? An audit? A questionnaire? What questions should be asked?
Summary
■ The first step in the IMPACT cycle is to identify the questions that you intend to answer
through your data analysis project. Once a data analysis problem or question has been
identified, the next step in the IMPACT cycle is mastering the data, which includes
obtaining the data needed and preparing it for analysis. We often call the processes
associated with mastering the data ETL, which stands for extract, transform, and load.
(LO 2-2, 2-3)
■ In order to obtain the right data, it is important to have a firm grasp of what data are
available to you and how that information is stored. (LO 2-2)
◦ Data are often stored in a relational database, which helps to ensure that an organization's data are complete and to avoid redundancy. Relational databases are made up of tables with rows of data that represent records. Each record is uniquely identified with a primary key. Tables are related to other tables by using the primary key from one table as a foreign key in another table.
■ Extract: To obtain the data, you will either have access to extract the data yourself or you
will need to request the data from a database administrator or the information systems
team. If the latter is the case, you will complete a data request form, indicating exactly
which data you need and why. (LO 2-3)
■ Transform: Once you have the data, they will need to be validated for completeness and integrity—that is, you will need to ensure that all of the data you need were extracted and that all data are correct. Sometimes when data are extracted some formatting or sometimes even entire records will get lost, resulting in inaccuracies. Correcting the errors and cleaning the data is an integral step in mastering the data. (LO 2-3)
■ Load: Finally, after the data have been cleaned, there may be one last step of mastering
the data, which is to load them into the tool that will be used for analysis. Often, the
cleaning and correcting of data occur in Excel, and the analysis will also be done in
Excel. In this case, there is no need to load the data elsewhere. However, if you intend
to do more rigorous statistical analysis than Excel provides, or if you intend to do more
robust data visualization than can be done in Excel, it may be necessary to load the data
into another tool following the transformation process. (LO 2-3)
■ Mastering the data goes beyond just the ETL processes. Those who collect and use data
also have the responsibility of being good stewards, providing some assurance that the
data collection is not only secure, but also that the ethics of data collection and data use
have been considered. (LO 2-4)
Key Words
accounting information system (54) A system that records, processes, reports, and communicates the
results of business transactions to provide financial and nonfinancial information for decision-making purposes.
composite primary key (58) A special case of a primary key that exists in linking tables. The composite primary key is made up of the primary keys of the two tables that it links, which appear in the linking table as foreign keys.
customer relationship management (CRM) system (54) An information system for managing all
interactions between the company and its current and potential customers.
data dictionary (59) Centralized repository of descriptions for all of the data attributes of the dataset.
data request form (62) A method for obtaining data if you do not have access to obtain the data
directly yourself.
descriptive attributes (58) Attributes that exist in relational databases that are neither primary nor
foreign keys. These attributes provide business information, but are not required to build a database. An
example would be “Company Name” or “Employee Address.”
Enterprise Resource Planning (ERP) (54) Also known as Enterprise Systems, a category of business
management software that integrates applications from throughout the business (such as manufacturing,
accounting, finance, human resources, etc.) into one system.
ETL (60) The extract, transform, and load process that is integral to mastering the data.
flat file (57) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.
foreign key (58) An attribute that exists in relational databases in order to carry out the relationship
between two tables. This does not serve as the “unique identifier” for each record in a table. These must be
identified when mastering the data from a relational database in order to extract the data correctly from
more than one table.
human resource management (HRM) system (54) An information system for managing all interactions between the company and its current and potential employees.
mastering the data (54) The second step in the IMPACT cycle; it involves identifying and obtaining the
data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.
primary key (57) An attribute that is required to exist in each table of a relational database and serves
as the “unique identifier” for each record in a table.
relational database (56) A means of storing data in order to ensure that the data are complete, not redundant, and to help enforce business rules. Relational databases also aid in communication and integration of business processes across an organization.
supply chain management (SCM) system (54) An information system that helps manage all the
company’s interactions with suppliers.
3. (Sketch of the UML class diagram.) The Purchase Order Table (PK: PO_Number) carries FK: Supplier ID, FK: EmployeeID, and FK: CashDisbursementID. It relates to the Materials Table (PK: Item_Number), to the Supplier Table (PK: Supplier ID) with one supplier (1) to many purchase orders (*), and to the CashDisbursement Table (PK: Check Number) one-to-one (1 to 1).
4. The purpose of the primary key is to uniquely identify each record in a table. The purpose of a foreign key is to create a relationship between two tables. The purpose of a descriptive attribute is to provide meaningful information about each record in a table. Descriptive attributes aren't required for a database to run, but they are necessary for people to gain business information about the data stored in their databases.
5. Data dictionaries provide descriptions of the function (e.g., primary key or foreign key when applicable), data type, and field names associated with each column (attribute) of a database. Data dictionaries are especially important when databases contain several different tables and many different attributes in order to help analysts identify the information they need to perform their analysis.
6. Depending on the level of security afforded to a business analyst, she can either obtain data directly from the database herself or she can request the data. When obtaining data herself, the analyst must have access to the raw data in the database and a firm knowledge of SQL and data extraction techniques. When requesting the data, the analyst doesn't need the same level of extraction skills, but she still needs to be familiar enough with the data to identify which tables and attributes contain the information she requires.
7. Four common issues that must be fixed are removing headings or subtotals, cleaning
leading zeroes or nonprintable characters, formatting negative numbers, and correcting
inconsistencies across the data.
8. Firms can ask to see the terms and conditions of their third-party data supplier, and ask questions to come to an understanding regarding if and how privacy practices are maintained. They also can evaluate what preventive controls on data access are in place and assess whether they are followed. Generally, an audit does not need to be performed, but requesting a questionnaire be filled out would be appropriate.
1. (LO 2-3) Mastering the data can also be described via the ETL process. The ETL process stands for:
a. extract, total, and load data.
b. enter, transform, and load data.
c. extract, transform, and load data.
d. enter, total, and load data.
2. (LO 2-3) Which of the following describes part of the goal of the ETL process?
a. Identify which approach to Data Analytics should be used.
b. Load the data into a relational database for storage.
c. Communicate the results and insights found through the analysis.
d. Identify and obtain the data needed for solving the problem.
3. (LO 2-2) The advantages of storing data in a relational database include which of the
following?
a. Help in enforcing business rules
b. Increased information redundancy
c. Integrating business processes
d. All of the answers are correct
e. a and b
f. b and c
g. a and c
4. (LO 2-3) The purpose of transforming data is:
a. to validate the data for completeness and integrity.
b. to load the data into the appropriate tool for analysis.
c. to obtain the data from the appropriate source.
d. to identify which data are necessary to complete the analysis.
5. (LO 2-2) Which attribute is required to exist in each table of a relational database and
serves as the “unique identifier” for each record in a table?
a. Foreign key
b. Unique identifier
c. Primary key
d. Key attribute
6. (LO 2-2) The metadata that describe each attribute in a database are which of the following?
a. Composite primary key
b. Data dictionary
c. Descriptive attributes
d. Flat file
7. (LO 2-3) As mentioned in the chapter, which of the following is not a common way that
data will need to be cleaned after extraction and validation?
a. Remove headings and subtotals.
b. Format negative numbers.
c. Clean up trailing zeroes.
d. Correct inconsistencies across data.
8. (LO 2-2) Why is Supplier ID considered to be a primary key for a Supplier table?
a. It contains a unique identifier for each supplier.
b. It is a 10-digit number.
c. It can either be for a vendor or miscellaneous provider.
d. It is used to identify different supplier categories.
9. (LO 2-2) What are attributes that exist in a relational database that are neither primary
nor foreign keys?
a. Nondescript attributes
b. Descriptive attributes
c. Composite keys
d. Relational table attributes
10. (LO 2-4) Which of the following questions are not suggested by the Institute of Business
Ethics to allow a business to create value from data use and analysis, and still protect
the privacy of stakeholders?
a. How does the company use data, and to what extent are they integrated into firm
strategy?
b. Does the company send a privacy notice to individuals when their personal data are
collected?
c. Does the data used by the company include personally identifiable information?
d. Does the company have the appropriate tools to mitigate the risks of data misuse?
1. (LO 2-2) The advantages of a relational database include limiting the amount of redundant data that are stored in a database. Why is this an important advantage? What can go wrong when redundant data are stored?
2. (LO 2-2) The advantages of a relational database include integrating business processes. Why is it preferable to integrate business processes in one information system, rather than store different business process data in separate, isolated databases?
3. (LO 2-2) Even though it is preferable to store data in a relational database, storing data
across separate tables can make data analysis cumbersome. Describe three reasons it
is worth the trouble to store data in a relational database.
4. (LO 2-2) Among the advantages of using a relational database is enforcing business
rules. Based on your understanding of how the structure of a relational database helps
prevent data redundancy and other advantages, how does the primary key/foreign
key relationship structure help enforce a business rule that indicates that a company
shouldn’t process any purchase orders from suppliers who don’t exist in the database?
5. (LO 2-2) What is the purpose of a data dictionary? Identify four different attributes that
could be stored in a data dictionary, and describe the purpose of each.
6. (LO 2-3) In the ETL process, the first step is extracting the data. When you are obtaining
the data yourself, what are the steps to identifying the data that you need to extract?
7. (LO 2-3) In the ETL process, if the analyst does not have the security permissions to
access the data directly, then he or she will need to fill out a data request form. While
this doesn’t necessarily require the analyst to know extraction techniques, why does
the analyst still need to understand the raw data very well in order to complete the data
request?
8. (LO 2-3) In the ETL process, when an analyst is completing the data request form, there
are a number of fields that the analyst is required to complete. Why do you think it is
important for the analyst to indicate the frequency of the report? How do you think that
would affect what the database administrator does in the extraction?
9. (LO 2-3) Regarding the data request form, why do you think it is important to the database administrator to know the purpose of the request? What would be the importance of the "To be used in" and "Intended audience" fields?
10. (LO 2-3) In the ETL process, one important step to process when transforming the data
is to work with null, n/a, and zero values in the dataset. If you have a field of quantitative
data (e.g., number of years each individual in the table has held a full-time job), what
would be the effect of the following?
a. Transforming null and n/a values into blanks
b. Transforming null and n/a values into zeroes
c. Deleting records that have null and n/a values from your dataset
(Hint: Think about the impact on different aggregate functions, such as COUNT and
AVERAGE.)
11. (LO 2-4) What is the theme of each of the six questions proposed by the Institute of
Business Ethics? Which one addresses the purpose of the data? Which one addresses
how the risks associated with data use and collection are mitigated? How could these
two specific objectives be achieved at the same time?
Problems
1. (LO 2-2) Match the relational database function to the appropriate relational database
term:
• Composite primary key
• Descriptive attribute
• Foreign key
• Primary key
• Relational database
2. (LO 2-3) Identify the order sequence in the ETL process as part of mastering the data
(i.e., 1 is first; 5 is last).
3. (LO 2-3) Identify which ETL tasks would be considered “Validating” the data, and which
would be considered “Cleaning” the data.
4. (LO 2-3) Match each ETL task to the stage of the ETL process:
• Determine purpose
• Obtain
• Validate
• Clean
• Load
5. (LO 2-4) For each of the six questions suggested by the Institute of Business Ethics to
evaluate data privacy, categorize each question into one of these three types:
A. Evaluate the company’s purpose of the data
B. Evaluate the company’s use or misuse of the data
C. Evaluate the due diligence of the company’s data vendors in preventing misuse of
the data
6. (LO 2-2) Which of the following are useful, established characteristics of using a relational database?
7. (LO 2-3) As part of mastering the data, analysts must make certain trade-offs when they consider which data to use. Consider these three different scenarios:
a. Analysis: What are the trade-offs of using data that are highly relevant to the
question, but have a lot of missing data?
b. Analysis: What are the trade-offs an analyst should consider between data that are
very expensive to acquire and analyze, but will most directly address the question at
hand? How would you assess whether they are worth the extra cost?
c. Analysis: What are the trade-offs between extracting needed data by yourself, or
asking a data scientist to get access to the data?
8. (LO 2-4) The Institute of Business Ethics proposes that a company protect the privacy of stakeholders by considering these questions of its third-party data providers:
• Does our company conduct appropriate due diligence when sharing with or acquiring data from third parties?
• Do third-party data providers follow similar ethical standards in the acquisition and transmission of the data?
a. Analysis: What type of due diligence with regard to a third party sharing and acquiring data would be appropriate for the company (or company accountant or data scientist) to perform? An audit? A questionnaire? Standards written in to a contract?
b. Analysis: How would you assess whether the third-party data provider follows ethical standards in the acquisition and transmission of the data?
LABS
in Appendix J for guidance on what tables and attributes are available. For example, to answer the question about state sales, you would need the Customer_State attribute that is located in the Customer master table as well as the Sales_Order_Quantity_Sold attribute in the Sales table. If you had access to store or distribution center location data, you may also look for a State field there, as well.
Sales_Order_Lines Table
Sales_Order_ID (FK): Unique identifier for each sales order
Sales_Order_Quantity_Sold: Sales order line quantity
Product_Sale_Price: Sales order line price per unit

Finished_Goods_Products Table
Product_Code (PK): Unique identifier for each product
Product_Description: Product description (plain English) to indicate the name or other identifying characteristics of the product
Product_Sale_Price: Price per unit of the associated product
You may notice that while there are a few attributes that may be useful in your sales analysis, the list may be incomplete and be missing several values. This is normal with data requests.
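A hedged sketch of the state sales question described above: assuming the Sales_Order header table carries Sales_Order_ID and a Customer_ID foreign key, and the Customer master table carries Customer_ID alongside Customer_State (check Appendix J for the actual layouts), total quantity sold by state could be extracted as:

    SELECT c.Customer_State,
           SUM(sol.Sales_Order_Quantity_Sold) AS total_quantity_sold
    FROM   Sales_Order_Lines AS sol
           JOIN Sales_Order AS so
             ON so.Sales_Order_ID = sol.Sales_Order_ID
           JOIN Customer AS c
             ON c.Customer_ID = so.Customer_ID
    GROUP BY c.Customer_State
    ORDER BY total_quantity_sold DESC;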
Lab 2-2 Prepare Data for Analysis—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Sláinte is a fictional brewery that has recently gone through big changes. Sláinte sells six different products. The brewery has only recently expanded its business to distributing from one state to nine states, and now its business has begun stabilizing after the expansion. Sláinte has brought you in to help determine potential areas for sales growth in the next year. Additionally, management has noticed that the company's margins aren't as high as they had budgeted and would like you to help identify some areas where they could improve their pricing, marketing, or strategy. Specifically, they would like to know how many of each product were sold, the product's actual name (not just the product code), and the months in which different products were sold.
Data: Lab 2-2 Slainte Dataset.zip - 83KB Zip / 90KB Excel
Microsoft Excel
LAB 2-2M Example of PivotTable in Microsoft Excel for November and December
Tableau | Desktop
In this lab, you will learn how to connect to data in Microsoft Power BI or Excel using
the Internal Data Model and how to connect to data and build relationships among tables
in Tableau. This will prepare you for future labs that require you to transform data, as well as aid in your understanding of primary and foreign key relationships.
b. Sales_Order_Lines: Change the data type for Product_Sale_Price to
Currency.
c. Sales_Order: Change the data type for Invoice_Order_Total and
Shipping_Cost to Currency.
7. Take a screenshot (label it 2-2MA) of the Power Query Editor window
with your changes.
8. At this point, we are ready to connect the data to our Excel sheet. We
will only create a connection so we can pull it in for specific analyses.
Click the Home tab and choose the arrow below Close & Load > Close & Load To...
9. Choose Only Create Connection and Add this data to the Data Model and click
OK. The three queries will appear in a tab on the right side of your sheet.
10. Save your workbook as Lab 2-2 Slainte Model.xlsx, and continue to Part 2.
Tableau | Desktop
Lab 2-2 Part 2 Validate the Data
Now that the data have been prepared and organized, you’re ready for some basic analysis.
Given the sales data, management has asked you to prepare a report showing the total
number of each item sold each month between January and April 2020. This means that we
should create a PivotTable with a column for each month, a row for each product, and the
sum of the quantity sold where the two intersect.
Tableau | Desktop
Lab 2-3 Resolve Common Data Problems—LendingClub
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: LendingClub is a peer-to-peer marketplace where borrowers and investors are matched together. The goal of LendingClub is to reduce the costs associated with these banking transactions and make borrowing less expensive and investment more engaging. LendingClub provides data on loans that have been approved and rejected since 2007, including the assigned interest rate and type of loan. This provides several opportunities for data analysis. There are several issues with this dataset that you have been asked to resolve before you can process the data. This will require you to perform some cleaning, reformatting, and other transformation techniques.
Data: Lab 2-3 Lending Club Approve Stats.zip - 120MB Zip / 120MB Excel
Microsoft Excel
LAB 2-3M Example of Cleaned Data in Microsoft Excel
Tableau | Prep
LAB EXHIBIT 2-3A
Source: LendingClub
loan_amnt: Requested loan amount
term: Length of the loan in months
int_rate: Interest rate of the loan
grade: Quality of the loan: e.g., A, B, C
emp_length: Employment length
home_ownership: Whether the borrower rents or owns a home
annual_inc: Annual income
issue_d: Date of loan issue
loan_status: Fully paid or charged off
title: Loan purpose
zip_code: The first three digits of the applicant's zip code
addr_state: State
dti: Debt-to-income ratio
delinq_2y: Late payments within the past 2 years
earliest_cr_line: Oldest credit account
open_acc: Number of open credit accounts
revol_bal: Total balance of all credit accounts
revol_util: Percentage of available credit in use
total_acc: Total number of credit accounts
application_type: Individual or joint application
Microsoft | Excel + Power Query
8. Select the term column.
a. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “ months” with a space as the first
character (do not include the quotation marks).
2. Leave the Replace With box blank.
3. Click OK.
9. Select the emp_length column.
a. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “ years” with a space as the first
character.
2. Leave the Replace With box blank.
3. Click OK.
b. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “ year” with a space as the first
character.
2. Leave the Replace With box blank.
3. Click OK.
c. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “<1” with a space between the two
characters.
2. In the Replace With box, type “0”.
3. Click OK.
d. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “n/a”.
2. In the Replace With box, type “0”.
3. Click OK.
e. In the Transform tab, click Extract > Text Before Delimiter.
1. In the Delimiter box, type "+".
2. Click OK.
f. In the Transform tab, click Extract > Text Before Delimiter.
1. In the Delimiter box, type " " (a single space).
2. Click OK.
g. In the Transform tab, click Data Type > Whole Number.
10. Take a screenshot (label it 2-3MB) of your cleaned data file, showing the
term and emp_length columns.
11. Click the Home tab in the ribbon and then click Close & Load. It will take a
minute to clean the entire data file.
12. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-3 Lending Club Transform.xlsx. (An equivalent cleaning sketch in Python follows these steps.)
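For readers who prefer to see the same transformation as code, the following Python/pandas sketch mirrors the Replace Values and Extract steps above. It is an illustration rather than part of the lab, and it assumes the loan data have been exported to a CSV file (the file name here is hypothetical):

    import pandas as pd

    # Load the exported loan data (hypothetical file name).
    df = pd.read_csv("LoanStats3c.csv")

    # term: strip the " months" suffix and store a whole number.
    df["term"] = df["term"].str.replace(" months", "", regex=False).astype(int)

    # emp_length: mirror the Replace Values and Extract steps above.
    df["emp_length"] = (
        df["emp_length"]
        .fillna("0")
        .str.replace(" years", "", regex=False)
        .str.replace(" year", "", regex=False)
        .str.replace("< 1", "0", regex=False)   # "< 1" becomes 0
        .str.replace("n/a", "0", regex=False)
        .str.replace("10+", "10", regex=False)  # keep the digits before "+"
        .astype(int)
    )

    print(df[["term", "emp_length"]].head())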
Tableau | Prep
Lab Note: Tableau Prep takes extra time to process large datasets.
1. Open Tableau Prep Builder.
2. Click Connect to Data > To a File > Microsoft Excel.
3. Locate the Lab 2-3 Lending Club Approve Stats.xlsx file on your computer
and click Open.
4. Drag LoanStats3c to your flow. Notice that all of the Field Names are
incorrect.
First we have to fix the column headers and remove unwanted data.
5. Check Use Data Interpreter in the pane on the left to automatically fix the
Field Names.
6. Uncheck the box next to any attribute that is NOT in the following list to
remove it from our analysis. Hint: Once you get to initial_list_status, all of
the remaining fields can be removed.
a. loan_amnt
b. term
c. int_rate
d. grade
e. emp_length
f. home_ownership
g. annual_inc
h. issue_d
i. loan_status
j. title
k. zip_code
l. addr_state
m. dti
n. delinq_2y
o. earliest_cr_line
p. open_acc
q. revol_bal
r. revol_util
s. total_acc
7. Take a screenshot (label it 2-3TA) of your corrected and reduced list of
Field Names.
Next, remove text values from numerical values and replace values so we
can do calculations and summarize the data. These extraneous text values
include months, <1, n/a, +, and years:
8. Click the + next to LoanStats3c in the flow and choose Add Clean Step. It
may take a minute or two to load.
9. An Input step will appear in the top half of the workspace, and the details of
that step are in the bottom of the workspace in the Input Pane. Every flow
requires at least one Input step at the beginning of the flow.
10. In the Input Pane, you can further limit which fields you bring into Tableau
Prep, as well as seeing details about each field including:
a. Type: this indicates the data type of each field (for example, numeric,
date, or short text).
b. Linked Keys: this indicates whether or not the field is a primary or a
foreign key.
c. Sample Values: provides a few example values from that field so you can
see how the data are formatted.
11. In the term pane:
a. Right-click the header or click the three dots and choose Clean > Remove
Letters.
b. Click the Data Type (Abc) button in the top-left corner and change the
data type to Number (Whole).
12. In the emp_length pane:
a. Right-click the header or click the three dots and choose Group Values >
Manual Selection.
1. Double-click <1 year in the list and type “0” to replace those values
with 0.
2. Double-click n/a in the list and type “0” to replace those values
with 0.
3. While you are in the Group Values window, you could quickly replace
all of the year values with single numbers (e.g., 10+ years becomes
“10”) or you can move to the next step to remove extra characters.
4. Click Done.
b. If you didn’t remove the “years” text in the previous step, right-click the
emp_length header or click the three dots and choose Clean > Remove
Letters and then Clean > Remove All Spaces.
c. Finally, click the Data Type (Abc) button in the top-left corner and
change the data type to Number (Whole).
13. In the flow pane, right-click Clean 1 and choose Rename and name the step
“Remove text”.
14. Take a screenshot (label it 2-3TB) of your cleaned data file, showing the
term and emp_length columns.
15. Click the + next to your Remove text task and choose Output.
16. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your file Lab 2-3 Lending Club Transform.hyper.
c. Click Accept.
17. Click Run Flow. When it is finished processing, click Done.
18. When you are finished answering the lab questions you may close Tableau
Prep. Save your file as Lab 2-3 Lending Club Transform.tfl.
LAB 2-4M Example of Data Distributions in Microsoft Power Query
In this part we are interested in understanding more about the loan amounts, interest
rates, and annual income by looking at their summary statistics. This process can be used
for data validation and later for outlier detection.
Lab Note: These instructions can also be performed in the Power Query included
with Excel 365.
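As a quick cross-check outside of Power Query, the same summary statistics can be computed with pandas. This is a minimal sketch, not part of the lab, assuming the cleaned loan data have been saved to a CSV file with a hypothetical name:

    import pandas as pd

    df = pd.read_csv("lending_club_transform.csv")  # hypothetical export

    # Count, mean, standard deviation, min, quartiles, and max per column.
    print(df[["loan_amnt", "int_rate", "annual_inc"]].describe())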
16. Take a screenshot (label it 2-4MB) of the column statistics and value
distribution.
17. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 2-4 Lending Club Summary.pbix.
AQ4. Compare and contrast: What are some of the summary statistics measures that
are unique to Power Query? To Tableau Desktop?
LAB 2-5M Example of Cleaned College Scorecard Data in Microsoft Excel
6. Take a screenshot (label it 2-5MA) of your columns with the proper data
types.
7. From the Home tab, click Close & Load.
8. To ensure that you captured all of the data through the extraction from the
txt file, we need to validate them:
a. In the Queries & Connections pane, verify that there are 7,703 rows loaded.
b. Compare the attribute names (column headers) to the attributes listed
in the data dictionary (found in Appendix K of the textbook). There
should be 30 columns (the last column in Excel should be AD).
c. Click Column H for the SAT_AVG attribute. In the summary statistics at
the bottom of your worksheet, the overall average SAT score should be
1,059.07.
9. Take a screenshot (label it 2-5MB) of your data table in Excel.
10. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-5 College Scorecard Transform.xlsx. Your data are
now ready for the test plan. This lab will continue in Lab 3-3.
16. From the menu bar, click Analysis > Aggregate Measures to remove the check
mark. To show each unique entry, you have to disable aggregate measures.
17. To show the summary statistics, go to the menu bar and click Worksheet >
Show Summary. A Summary card appears on the right side of the screen
with the Count, Sum, Average, Minimum, Maximum, and Median values.
18. Drag Unitid to the Rows shelf and note the summary statistics.
19. Take a screenshot (label it 2-5TB) of the Unitid stats in your worksheet.
20. Create two new sheets and repeat steps 16–18 for Sat Avg and C150 4, not-
ing the count, sum, average, minimum, maximum, and median of each.
21. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-5 College Scorecard Transform.twb. Your
data are now ready for the test plan. This lab will continue in Lab 3-3.
LAB 2-6M Example of Dillard's Data Model in Microsoft Power BI
Lab 2-6 Build Relationships between Tables
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-6 [Your name] [Your email address].docx.
Before you can analyze the data, you must first define the relationships that show how
the different tables are connected. Most tools will automatically detect primary key–foreign
key relationships, but you should always double-check to make sure your data model is
accurate.
b. When you hover over any of the relationships, the keys that are common
between the two tables highlight.
1. Something important to consider is that in the raw data, the primary
key is typically the first attribute listed. In this Power BI modeling
window, the attributes have been re-ordered to appear in alphabetical
order. For example, SKU is the primary key of the SKU table, and it
exists in the Transact table as a foreign key.
9. Take a screenshot (label it 2-6MB) of the All tables sheet.
10. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 2-6 Dillard’s Diagram.pbix.
Note: While it may seem easier and faster to rely on the automatically created
data model in Power BI, you should review the table relationships to make sure
the appropriate keys match.
Tableau | Desktop
8. Finally, double-click the SKU_STORE table from the list on the left.
a. The SKU_Store table is related to both the SKU and the Store tables,
but Tableau will likely default to connecting it to the Transact table,
resulting in a broken relationship.
b. To fix the relationship,
1. Close the Edit Relationships window without making changes.
2. Right-click SKU_STORE in the top pane and choose Move to >
SKU.
3. Verify the related keys and close the Edit Relationships window.
4. Note: It is not necessary to also relate the SKU_Store table to the
Store table in Tableau; that is only a database requirement.
9. Take a screenshot (label it 2-6TB) of the Data Source tab.
10. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-6 Dillard’s Diagram.twb.
Lab 2-7 Comprehensive Case: Preview Data from Tables—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. After analyzing the ER Diagram to gain a bird’s-eye view of all the differ-
ent tables and fields in the database, you are ready to further explore the data in each table
and how the fields are formatted. In particular, you will connect to the Dillard's database
using Tableau Prep or Microsoft Power BI, and you will explore the data types and the
primary and foreign keys, and preview individual tables.
In Lab 2-6, the Tableau Track had you focus on Tableau Desktop. In this lab, you
will use Tableau Prep instead. Tableau Desktop showcases the table relationships
more quickly, but Tableau Prep makes it easier to preview and clean the data prior to analysis.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
LAB 2-7M Example of Summary Statistics in Microsoft Power BI
6. In the Navigator window, click the TRANSACT table.
a. It may take a few moments for the data to load, but once they do, you
will see a preview of the data to the right. Scroll through the data to get
a feel for the data stored in the Transact table.
b. Unlike Tableau Prep, we cannot limit which attributes we pull in at this
point in the process. However, the preview pane to the right shows the
first 27 records in the table that you have selected. This gives you an idea
of the different data types and examples of the data stored in the table.
c. Scroll to the right in the preview pane to see several fields that do not
have preview data; instead, each record is marked with the term Value.
These fields indicate the tables that are related to the Transact table.
In this instance, they are CUSTOMER, SKU(SKU), and STORE(STORE).
7. Click the CUSTOMER table to preview that table’s data.
8. Take a screenshot (label it 2-7MA).
9. Answer the lab questions and continue to Part 2.
Lab 2-7 Part 1 Objective Questions (LO 2-2, 2-3)
OQ1. What is the primary key for the CUSTOMER table?
OQ2. What is the primary key for the SKU table?
OQ3. Which tables are related to the Customer table? (Hint: Do not forget the foreign
keys that you discovered in the Transact table.)
1. Place a check mark in the TRANSACT and CUSTOMER tables in the Navi-
gator window.
2. Click Transform Data.
a. This will open a new window for the Power Query Editor (this is the
same interface that you will encounter in Excel’s Get & Transform).
b. On the left side of the Power Query Editor, you can click through the
different queries to see previews of each table’s data. Similar to Tableau
Prep, you are provided only a sample of the dataset.
c. Click the Transact query to preview the data from the Transact table.
d. Scroll the main view to the right to see more of the attributes.
3. Power Query does not default to providing data profiling information the
way Tableau Prep’s Clean step does, but we can activate those options.
4. Click the View tab and place check marks in the Column Distribution and
Column Profile boxes.
a. Column distribution: Provides thumbnails of each column’s distribution
above the first row of data. However, it is limited to only the thumbnail—
you cannot hover over bars in the distribution charts to gain additional
details or filter the data.
b. Column profile: When you select a column, it will provide a more
detailed glimpse into the distribution of that particular column. You can
click a bar in the distribution to filter the records based on that
criterion. This will also cause the Column distribution thumbnails to adjust.
c. Again—caution! The distributions and profiling are based on the top
1,000 rows from the table you have connected to.
5. Some of the attributes are straightforward in what they represent, but others
aren’t as clear. For instance, you may be curious about what TRAN_TYPE
represents.
6. Filter the purchases and returns:
a. Click the drop-down button next to Tran_Type and filter for just the
records with P values or click the bar associated with the P values in the
Column profile. Scroll over and look at the results in the Tran_Amt field
and note whether they are positive or negative.
b. Now adjust the filter so that you see only R Tran_Types. Note the values
in the Tran_Amt field again.
7. Take a screenshot (label it 2-7MB).
8. When you are finished answering the lab questions, you may close the Power
Query Editor and Power BI Desktop. Save your file as Lab 2-7 Dillard’s
Data.pbix.
Tableau | Prep
1. Add a new Clean step extending from the TRANSACT table (click the +
icon next to TRANSACT and choose Clean Step from the menu). A phan-
tom step for View and Clean may already exist. If so, just click that step to
add it:
a. The Clean step provides many different options for preparing your data,
which we will get to in future labs. In this lab, you will use it as a means
for familiarizing yourself with the dataset.
b. Beneath the Flow Pane, you can see two new panes: the Profile Pane and
the Data Grid.
1. The Data Grid provides a more robust sample of data values than you
were able to see in the Input Pane from the Input step.
2. The Profile Pane provides summary visualizations of each attribute
in the table. Note: When datasets are large, these summary values are
calculated only from the first several thousand records in the original
table, so be cautious about using these visualizations to drive insights!
In this instance, we can see a good example of this being merely a
sample by looking at the TRAN_DATE visual summary. It shows
only dates from 12/30/2013 to 01/27/2014, but we know the dataset
has transactions through 2016.
c. Some of the attributes are straightforward in what they represent, but
others aren’t as clear. For instance, you may be curious about what
TRAN_TYPE represents. Look at the data visualization provided for
TRAN_TYPE in the Profile Pane and click P. This will filter the results
in the Data Grid.
1. Look at the results in the TRAN_AMT field and note whether they
are positive or negative (you can do so by looking at the data grid or
by looking at the filtered visualization for TRAN_AMT).
2. Adjust the filter so that you see only R transaction types. Note the
values in the Tran_Amt field again.
2. Take a screenshot (label it 2-7TB).
3. When you are finished answering the lab questions, you may close Tableau
Prep. Save your file as Lab 2-7 Dillard’s Data.tfl.
Lab 2-7 Part 2 Objective Questions (LO 2-2, 2-3)
OQ1. What do you notice about the TRAN_AMT for transactions with
TRAN_TYPE “P”?
OQ2. What do you notice about the TRAN_AMT for transactions with
TRAN_TYPE “R”?
OQ3. What do “P” type transactions and “R” type transactions represent?
Lab 2-8 Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. So far you have analyzed the ER Diagram to gain a bird’s-eye view of all
the different tables and fields in the database, and you have explored the data in each table
to gain a glimpse of sample values from each field and how they are all formatted. You also
gained a little insight into the distribution of sample values across each field, but at this
point you are ready to dig into the data a bit more. In the previous comprehensive labs, you
connected to full tables in Tableau or Power BI to explore the data. In this lab, instead of
connecting to full tables, we will write a SQL query to pull only a subset of data into Tableau
or Excel. This tactic is more effective when the database is very large and you can derive
insights from a sample of the data. We will analyze 5 days’ worth of transaction data from
September 2016. In this lab we will look at the distribution of transactions across different
states in order to get to know our data a little better.
Data: Dillard’s sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.
LAB 2-8M Example of a PivotTable and PivotChart
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRANSACT.*, STATE
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
e. Click Preview Results... to test your query on a sample dataset.
f. If everything looks good, close the preview and click OK.
3. Take a screenshot (label it 2-8TA).
4. Click Sheet 1.
110
Lab 2-8 Part 2 View the Distribution of Transaction
Amounts across States
In addition to data from the Transact table, our query also pulled in the attribute State from the
Store table. We can use this attribute to identify the sum of transaction amounts across states.
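Conceptually, this report is a one-line aggregation. Continuing the hypothetical pandas DataFrame from the sketch in Part 1, the sum and average of transaction amounts by state would be:

    # Sum and average transaction amounts by state, largest totals first.
    by_state = df.groupby("STATE")["TRAN_AMT"].agg(["sum", "mean"])
    print(by_state.sort_values("sum", ascending=False).head())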
Lab 2-8 Part 2 Objective Questions
OQ1. Which state has the highest average transaction amount?
OQ2. What is the average of transactions for North Carolina? Round your answer to
the nearest dollar.
Chapter 3
Performing the Test Plan and
Analyzing the Results
A Look Back
Chapter 2 provided a description of how data are prepared, scrubbed, and made ready to use to answer business
questions. We explained how to extract, transform, and load data and then how to validate and normalize the data.
In addition, we explained how data standards are used to facilitate the exchange of data between both senders and
receivers. We also emphasized the ethical importance of maintaining privacy in both the collection and use of data.
A Look Ahead
Chapter 4 will demonstrate various techniques that can be used to effectively communicate the results of your
analyses, emphasizing the use of visualizations. Additionally, we discuss how to refine your results and translate
your findings into useful information for decision makers.
Liang Zhao Zhang, a San Francisco–based janitor, made more
than $275,000 in 2015. The average janitor in the area earns just
$26,180 a year. Zhang, a Bay Area Rapid Transit (BART) janitor,
has a base pay of $57,945 and $162,050 in overtime pay. With
benefits, the total was $276,121. While some call his compensation "outrageous and
irresponsible," Zhang signed up for every overtime slot that became available. To be sure, Zhang
worked more than 4,000 hours last year and received overtime
pay. Can BART predict who might take advantage of overtime
pay? Should it set a policy restricting overtime pay? Would it be
better for BART to hire more regular, full-time employees instead of offering so much overtime to current employees?
Can Data Analytics help somehow to address these questions?
Using a profiling Data Analytics approach detailed in this chapter, BART could generate summary statistics of its
workers and their overtime pay to see the extent that overtime is required.
Using regression and classification approaches to Data Analytics would help to classify which employees are most
likely to exceed normal bounds and why. BART, for example, has a policy of offering overtime by seniority. So do the
most senior employees sign up first and leave little overtime to others? Will a senior employee get paid more for over-
time than more junior-level employees? If so, is that the best policy for the company and its employees?
Source: http://www.cnbc.com/2016/11/04/how-one-bay-area-janitor-made-276000-last-year.xhtml.
OBJECTIVES
After reading this chapter, you should be able to:
LO 3-1 Understand and distinguish among the four types of Data Analytics in
performing the test plan.
LO 3-2 Explain several descriptive analytics approaches, including summary
statistics and data reduction, and how they summarize results.
LO 3-3 Explain the diagnostic approach to Data Analytics, including profiling
and clustering.
LO 3-4 Understand the techniques associated with predictive analytics,
including regression and classification.
LO 3-5 Describe the use of prescriptive analytics, including decision support
systems, machine learning, and artificial intelligence.
EXHIBIT 3-1 Four Main Types of Data Analytics
• Prescriptive analytics are procedures that work to identify the best possible options
given constraints or changing conditions. These typically include developing more
advanced machine learning and artificial intelligence models to recommend a course of
action, or optimize, based on constraints and/or changing conditions.
The choice of Data Analytics model depends largely on the type of question that you’re
trying to answer and your access to the data needed to answer the question. Descriptive
and diagnostic analytics are typically paired when you would want to describe the past data
and then compare them to a benchmark to determine why the results are the way they are,
similar to the accounting concepts of planning and controlling. Likewise, predictive and
prescriptive analytics make good partners when you would want to predict an outcome and
then make a recommendation on how to follow up, similar to an auditor flagging a transac-
tion as high risk and then following a decision flowchart to determine whether to request
additional evidence or include it in audit findings.
As you move from one Data Analytics approach to the next, you trade hindsight and infor-
mation, which are traditionally accounting domain areas, for foresight and optimization.
Ultimately, the model you use comes down to the questions you are trying to answer. We
highlighted the Data Analytics approaches in Chapter 1. Here we categorize them into the
four main analytics types, summarized in Exhibit 3-2:
1. Descriptive analytics:
• Summary statistics describe a set of data in terms of their location (mean, median), range
(standard deviation, minimum, maximum), shape (quartile), and size (count).
• Data reduction or filtering is used to reduce the number of observations to focus on relevant
items (e.g., highest cost, highest risk, largest impact, etc.). It does this by taking a large set of
data (perhaps the population) and reducing it to a smaller set that has the vast majority of the
critical information of the larger set. For example, auditing may use data reduction to narrow
transactions based on relevance or size. While auditing has employed various random and
stratified sampling over the years, Data Analytics suggests new ways to highlight which trans-
actions do not need the same level of vetting as other transactions.
2. Diagnostic analytics:
• Profiling identifies the “typical” behavior of an individual, group, or population by compiling
summary statistics about the data (including mean, standard deviations, etc.) and comparing
individuals to the population. By understanding the typical behavior, we’ll be able to identify ab-
normal behavior more easily. Profiling might be used in accounting to identify transactions that
might warrant some additional investigation (e.g., outlier travel expenses or potential fraud).
• Clustering helps identify groups (or clusters) of individuals (such as customers) that share
common underlying characteristics—in other words, identifying groups of similar data ele-
ments and the underlying drivers of those groups. For example, clustering might be used to
segment a customer into a small number of groups for additional analysis and risk assessment.
Likewise, transactions might also be put into clusters to understand underlying relationships.
• Similarity matching is a grouping technique used to identify similar individuals based on data
known about them. For example, companies identify seller and customer fraud based on
various characteristics known about each seller and customer to see if they were similar to
known fraud cases.
• Co-occurrence grouping discovers associations between individuals based on common events,
such as transactions they are involved in. Amazon might use this to sell another item to you
by knowing what items are “frequently bought together” or “Customers who bought this item
also bought . . .” as shown in Chapter 1.
3. Predictive analytics:
• Regression estimates or predicts the numerical value of a dependent variable based on the
slope and intercept of a line and the value of an independent variable. An R2 value indicates
how closely the line fits the data used to calculate the regression. An example of regression
analysis might be: given a balance of total accounts receivable held by a firm, what is the
appropriate level of the allowance for doubtful accounts for bad debts?
• Classification predicts a class or category for a new observation based on the manual identi-
fication of classes from previous observations. Membership of a class may be binary in the
case of decision trees or indicate the distance from a decision boundary. Some examples of
classification include predicting which loans are likely to default, credit applications that are
expected to be approved, the classification of an operating or financing lease, or identification
of suspicious transactions. In each of these cases, prior data must be manually identified as
belonging to each class to build the predictive model.
• Link prediction predicts a relationship between two data items, such as members of a social
media platform. For example, if two individuals have mutual friends on social media and both
attended the same university, it is likely that they know each other, and the site may make a
recommendation for them to connect. Chapter 1 provides an example of this used in Facebook.
Link prediction in an accounting setting might work to use social media to look for relationships
between related parties that are not otherwise disclosed to identify related party transactions.
4. Prescriptive analytics:
• Decision support systems are rule-based systems that gather data and recommend actions based
on the input. Tax preparation software, investment advice tools, and auditing tools recommend
courses of actions based on data that are input as part of an interview or interrogation process.
• Machine learning and artificial intelligence are learning models or intelligent agents that adapt
to new external data to recommend a course of action. For example, an artificial intelligence
model may observe opinions given by an audit partner and adjust the model to reflect chang-
ing levels of risk appetite and regulation.
While these are all important and applicable data approaches, in the rest of the chapter
we limit our discussion to the more common models, including summary statistics, data
reduction, profiling, clustering, regression, classification, and artificial intelligence. You’ll
find that these data approaches are not mutually exclusive and that actual analysis may
involve parts of several approaches to appropriately address the accounting question.
Just as the different analytics types are important in the third step of the IMPACT
cycle model, “performing the test plan,” they are equally important in the fourth step “A”—
“address and refine results”—as analysts learn from the test approaches and refine them.
PROGRESS CHECK
1. Using Exhibit 3-2, identify the appropriate approach for the following questions:
a. Will a customer purchase item X if given incentive A?
b. Should we offer a customer a line of credit?
c. What quantity of products will the customer purchase?
2. What is the main difference between descriptive and diagnostic methods?
Summary Statistics
Summary statistics describe the location, spread, shape, and dependence of a set of observations.
These commonly include the count, sum, minimum, maximum, mean or average,
standard deviation, median, quartiles, correlation, covariance, and frequency that describe a
specific measurable value, shown in Exhibit 3-3.
The use of summary statistics helps the user understand what the data look like. For
example, the sum function can be used to determine account balances. The mean and
median can be used to aggregate transactions by employee, location, or division. The stan-
dard deviation and frequency help to identify normal behavior and trends in the data.
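As an illustration (not the chapter's required tool), most of these measures are one line each in pandas. The file and column names here are hypothetical:

    import pandas as pd

    tx = pd.read_csv("transactions.csv")  # hypothetical transaction extract

    # Location, spread, and size measures aggregated by employee.
    print(tx.groupby("Employee")["Amount"].agg(
        ["count", "sum", "mean", "median", "std"]))

    # Dependence measures between two numeric fields.
    print(tx[["Quantity", "Amount"]].corr())   # correlation
    print(tx[["Quantity", "Amount"]].cov())    # covariance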
Lab Connection
Lab 2-4 and Lab 3-4 have you generate multiple summary statistics at once.
Data Reduction
As you recall, the data reduction approach attempts to reduce the amount of detailed infor-
mation considered to focus on the most critical, interesting, or abnormal items (e.g., highest
cost, highest risk, largest impact, etc.). It does this by filtering through a large set of data
(perhaps the total population) and reducing it to a smaller set that has the vast majority
of the critical information of the larger set. The data reduction approach is done primarily
using structured data—that is, data that are stored in a database or spreadsheet and are read-
ily searchable.
Data reduction involves the following steps (using an example of an employee creating a
fictitious vendor and submitting fake invoices):
1. Identify the attribute you would like to reduce or focus on. For example, an employee may
commit fraud by creating a fictitious vendor and submitting fake invoices. Rather than
evaluate every employee, an auditor may be interested only in employee records that
have addresses that match vendor addresses.
2. Filter the results. This could be as simple as using filters in Excel, or using the WHERE
phrase in a SQL query. It may also involve a more complicated calculation. For exam-
ple, employees who create fictitious vendors will often use addresses that are similar
to, but not exactly the same as, their own address to foil basic SQL queries. Here the
auditor should use a tool that allows fuzzy matching, which uses probability to identify
likely similar addresses.
3. Interpret the results. Once you have eliminated irrelevant data, take a moment to see
if the results make sense. Calculate the summary statistics. Have you eliminated any
obvious entries? Looking at the list of matching employees, the auditor might tweak the
probability in the fuzzy match to be more or less precise to narrow or broaden the num-
ber of employees who appear.
4. Follow up on results. At this point, you will continue to build a model or use the results
as a targeted sample for follow-up. The auditor should review company policy and fol-
low up with each employee who appears in the reduced list as it represents risk.
EXHIBIT 3-4 Use Filters to Reduce Data
fictitious or employee-created vendor. The data reduction approach allows us to focus more
time and effort on those vendors and transactions that might require additional analysis to
make sure they are legitimate.
Another example of the data reduction approach is gap detection, where we look for
missing numbers in a sequence, such as payments made by check. Finding out why certain
check numbers were skipped and not recorded requires additional consideration such as
interviewing business process owners or those that oversee/supervise that process to deter-
mine if there are valid reasons for the missing numbers.
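Once the check register is in hand, the gap detection itself is mechanical. A minimal Python sketch with illustrative check numbers:

    # Recorded check numbers (illustrative values).
    checks = [1001, 1002, 1004, 1005, 1008]

    # Any number in the full range that was never recorded is a gap to investigate.
    missing = sorted(set(range(min(checks), max(checks) + 1)) - set(checks))
    print(missing)  # [1003, 1006, 1007]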
Data reduction may also be used to narrow all of the transactions down to those between
known related parties. Focusing specifically on related party transactions allows the auditor to
concentrate on those transactions that might potentially be sensitive and/or risky.
Finally, data reduction might be used to compare the addresses of vendors and employ-
ees to ensure that employees are not siphoning funds to themselves. Use of fuzzy match
looks for correspondences between portions, or segments, of the text of each potential
match, shown in Exhibit 3-5. Once potential matches between vendors and employees are
found, additional analysis must be conducted to figure out if funds have been, or potentially
could be, siphoned.
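Exact equality tests would miss these near-matches, which is why fuzzy matching scores similarity instead. A short sketch using Python's standard difflib module, with illustrative addresses:

    from difflib import SequenceMatcher

    emp_addr = "22 Main Street, Springdale"
    vendor_addr = "22 Main St., Springdale"

    # A ratio near 1.0 flags a likely match that warrants follow-up.
    score = SequenceMatcher(None, emp_addr.lower(), vendor_addr.lower()).ratio()
    print(round(score, 2))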
EXHIBIT 3-5 A Fuzzy Matching Shows a Likely Match of an Employee and Vendor
Lab Connection
Lab 3-1 has you reduce data by looking for similar values with a fuzzy match.
duplicate payments are found and new internal controls are considered to mitigate dupli-
cate payments from occurring in the future.
Data reduction approaches may also be useful in a financial statement analysis setting,
perhaps performed by financial analysts, pension fund managers, or individual investors.
Among other uses, XBRL (eXtensible Business Reporting Language) is used to facilitate the
exchange of financial reporting information between the company and the Securities and
Exchange Commission (SEC). The SEC then makes it available to all interested parties,
including suppliers, competitors, investors, and financial analysts. XBRL requires that the
data be tagged according to the XBRL taxonomy. Using these tagged data in common-sized
financial statements, analysts develop models to access all relevant financial or nonfinan-
cial data to summarize and predict earnings, solvency, liquidity, and profitability. We’ll
explore XBRL further in Chapter 8.
PROGRESS CHECK
3. Describe how the data reduction approach could be used to evaluate employee
travel and entertainment expenses.
4. Explain how XBRL might be used by lenders to focus on specific areas of interest.
Profiling
Profiling involves gaining an understanding of a typical behavior of an individual, group,
or population (or sample). Profiling is done primarily using structured data—data that are
stored in a database or spreadsheet and are readily searchable. Using these data, analysts
can use common summary statistics to describe the individual, group, or population, includ-
ing knowing its mean, standard deviation, sum, and so on. Profiling is generally performed
on data that are readily available, so the data have already been gathered and are ready for
further analysis.
Profiling is used to discover patterns of behavior. In Exhibit 3-7, for example, the higher
the Z-score (farther away from the mean), the more likely a customer will have a delayed
shipment (blue circle). As shown in Exhibit 3-7, a Z-score of 3 represents three standard
deviations away from the mean. We use profiling to explore the attributes of the products or
vendors that may be experiencing shipping delays.
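Because a Z-score is simply the distance from the mean measured in standard deviations, flagging unusual observations takes only a few lines. A sketch with illustrative shipping times (Exhibit 3-7 uses a cutoff of 3; a small sample like this one needs a lower cutoff to flag anything):

    import pandas as pd

    days = pd.Series([2, 3, 2, 4, 3, 2, 14])  # illustrative days to ship

    # Z-score: (value - mean) / standard deviation.
    z = (days - days.mean()) / days.std()

    # Flag observations beyond the chosen cutoff (2 here; Exhibit 3-7 uses 3).
    print(days[z.abs() > 2])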
1. Lim, J.-H., J. Park, G. F. Peters, and V. J. Richardson, "Examining the Potential Benefits of Audit Data Analytics," University of Arkansas working paper.
EXHIBIT 3-7 Z-scores Provide an Example of Profiling That Helps Identify Outliers (in This Case, Categories with Unusually High Average Days to Ship)
A box plot will provide similar insight. Instead of focusing on the mean and standard
deviation, a box plot highlights the median and quartiles, similar to Exhibit 3-8. Box plots are
used to visually represent how data are dispersed in terms of the interquartile range (IQR).
The IQR is a way to analyze the shape of your dataset that focuses on the median.

EXHIBIT 3-8 Box Plots Provide an Example of Profiling That Helps Identify Outliers (in This Case, Categories with Unusually High Average Days to Ship)

To find
the interquartile range, the dataset must first be divided into four parts (quartiles), and the
middle two quartiles that surround the median are the IQR. The IQR is considered more
helpful than a simple range measure (maximum observation − minimum observation) as a
measure of dispersion when you are analyzing a sample instead of a population because it
turns the focus toward the most common values and minimizes the risk of the range being
too heavily influenced by outliers.
Creating quartiles and identifying the IQR is done in the following steps (of course,
Tableau and Excel also have ways to automatically create the quartiles and box plots so that
you do not have to perform these steps manually):
1. Rank order your data first (the same as you would do to find the median or to find
the range).
2. Quartile 1: the lowest 25 percent of observations
3. Quartile 2: the next 25 percent of observations—its cutoff is the median
4. Quartile 3: begins at the median and extends to the third 25 percent of observations
5. Quartile 4: the highest 25 percent of observations
6. The interquartile range includes Quartile 2 and Quartile 3.
Box plots are sometimes called box and whisker plots because they typically are made up
of one box and two “whiskers.” The box represents the IQR and the two whiskers represent
the ends of Quartiles 1 and 4. If your data have extreme outliers, these will be indicated as
dots that extend beyond the “whiskers.”
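The quartile arithmetic is easy to make concrete. A sketch with numpy, using illustrative values and the common 1.5 * IQR rule for where the whiskers end:

    import numpy as np

    days = np.array([2, 3, 2, 4, 3, 2, 14])  # illustrative days to ship

    q1, q3 = np.percentile(days, [25, 75])
    iqr = q3 - q1

    # Points beyond the 1.5 * IQR fences plot as dots past the whiskers.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    print(iqr, days[(days < lower) | (days > upper)])  # 1.5 [14]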
Data profiling can be as simple as calculating summary statistics on transactional data,
such as the average number of days to ship a product, the typical amount we pay for a prod-
uct, or the number of hours an employee is expected to work. On the other hand, profiling
can be used to develop complex models to predict potential fraud. For example, you might
create a profile for each employee in a company that may include a combination of salary,
hours worked, and travel and entertainment purchasing behavior. Sudden deviations from an
employee’s past behavior may represent risk and warrant follow-up by the internal auditors.
Similar to evaluating behavior, data profiling is typically used to assess data quality and
internal controls. For example, data profiling may identify customers with incomplete or
erroneous master data or mistyped transactions.
Data profiling typically involves the following steps:
1. Identify the objects or activity you want to profile. What data do you want to evaluate?
Sales transactions? Customer data? Credit limits? Imagine a manager wants to track
sales volume for each store in a retail chain. They might evaluate total sales dollars,
asset turnover, use of promotions and discounts, and/or employee incentives.
2. Determine the types of profiling you want to perform. What is your goal? Do you want to
set a benchmark for minimum activity, such as monthly sales? Have you set a budget
that you wish to follow? Are you trying to reduce fraud risk? In the retail store scenario,
the manager would likely want to compare each store to the others to identify which
ones are underperforming or overperforming.
3. Set boundaries or thresholds for the activity. This is a benchmark that may be manually
set, such as a budgeted value, or automatically set, such as a statistical mean, quartile, or
percentile. The retail chain manager may define underperforming stores as those whose
sales activity falls below the 20th percentile of the group and overperforming stores as
those whose sales activity is above the 80th percentile. These thresholds are automati-
cally calculated based on the total activity of the stores, so the benchmark is dynamic.
4. Interpret the results and monitor the activity and/or generate a list of exceptions. Here is
where dashboards come into play. Management can use digital dashboards to quickly
see multiple sets of profiled data and make decisions that would affect behavior. As
you evaluate the results, try to understand what a deviation from the defined boundary means.
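Steps 2 and 3 reduce to percentile cutoffs. A minimal sketch with illustrative monthly sales by store:

    import numpy as np

    sales = np.array([90, 120, 80, 200, 150, 60, 170, 110])  # illustrative

    # Dynamic benchmarks: below the 20th percentile underperforms,
    # above the 80th percentile overperforms.
    low, high = np.percentile(sales, [20, 80])
    print(sales[sales < low], sales[sales > high])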
Lab Connection
Lab 3-5 has you profile online and in-person sales and make comparisons of
sales behavior.
Management accounting relies heavily on diagnostic analytics in the planning and con-
trolling process. By comparing the actual results of activity to the budgeted expectation,
management determines the processes and procedures that resulted in favorable and unfa-
vorable activity. For example, in a manufacturing company like AERT, variance analysis
compares the actual cost, price, and volume of various activities with standard equivalents,
shown in Exhibit 3-9. The unfavorable variances appear in orange as the actual cost exceeds
the budgeted cost or are to the left of the budget reference line. Favorable variances appear
to the right of the budget reference line in blue. Sales exceed the budgeted sales. As sales
volume increases, the costs (negative values) also increase, leading to an unfavorable vari-
ance in orange.
to pay for CDs, beer, an all-terrain vehicle, a customized dog kennel, even a computer as his
son’s graduation gift—all the while describing the purchases as routine business expenses.”2
EXHIBIT 3-10 Benford's Law Applied to Large Numerical Datasets (including Employee Transactions)
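Benford's law says the expected proportion of values whose first digit is d equals log10(1 + 1/d), so the benchmark distribution in Exhibit 3-10 can be generated directly. A short illustrative sketch:

    import math

    # Expected proportion of leading digits 1 through 9 under Benford's law.
    for d in range(1, 10):
        print(d, round(math.log10(1 + 1 / d), 3))
    # Digit 1 should lead about 30.1% of values; digit 9 only about 4.6%.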
Cluster Analysis
The clustering data approach works to identify groups of similar data elements and the
underlying relationships of those groups. More specifically, clustering techniques are used
to group data/observations into a specific number of clusters or groups so that all the data
within any cluster are similar, while data across clusters are different. Cluster analysis works
by calculating the minimum distance between each observation and the center of each clus-
ter, shown in Exhibit 3-11.
2. M. Barbaro, "Wal-Mart Official Misused Company Funds," Washington Post, July 15, 2005, http://www.washingtonpost.com/wp-dyn/content/article/2005/07/14/AR2005071402055.html (accessed August 2, 2017).
EXHIBIT 3-11 Clustering Is Used to Find Three Natural Groupings of Vendors Based on Purchase Volume (axes: Activity, Distance)
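To make the distance-to-center mechanics concrete, here is a small scikit-learn sketch (an illustration, not the chapter's tool) that clusters vendors on two hypothetical features:

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is a vendor: [order volume, average days to deliver] (illustrative).
    vendors = np.array([[10, 2], [12, 3], [80, 9], [85, 8], [40, 5], [42, 6]])

    # Request three clusters; each vendor is assigned to its nearest center.
    km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(vendors)
    print(km.labels_)           # cluster assignment for each vendor
    print(km.cluster_centers_)  # the three cluster centers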
When you are exploring the data for these patterns and don’t have a specific question,
you would use an unsupervised approach. For example, consider the question: “Do our ven-
dors form natural groups based on similar attributes?” In this case, there isn’t a specific target
because you don’t yet know what similarities our vendors have. You may use clustering to
evaluate the vendor attributes and see which ones are closely related. You could also use
co-occurrence grouping to match vendors by geographic region; data reduction to simplify
vendors into obvious categories, such as wholesale or retail or based on overall volume of
orders; or profiling to evaluate products or vendors with similar on-time delivery behavior,
shown in Exhibit 3-7. In any of these cases, the data drive the analysis, and you evaluate the
output to see if it matches our intuition. These exploratory exercises may help to define bet-
ter questions, but are generally less useful for making decisions.
As an example, Walmart may want to understand the types of customers who shop at its
stores. Because Walmart has good reason to believe there are different market segments of
people, it may consider changing the design of the store or the types of products to accom-
modate the different types of customers, emphasizing the ones that are most profitable to
Walmart. To learn about the different types of customers, managers may ask whether cus-
tomers agree with the following statements using a scale of 1–7 (on a Likert scale):
• Enjoy: I enjoy shopping.
• Budget: I try to avoid shopping because it is bad for the budget.
• Eating: I like to combine my shopping with eating out.
• Coupons: I use coupons when I shop.
• Quality: I care more about the quality of the products than I do about the price.
• Apathy: I don’t care about shopping.
• Comparing: You can save a lot of money by comparing prices between various stores.
Additionally, they would ask about numerical customer behavior:
• Income: The household income of the respondent (in dollars).
• Shopping at Walmart: How many times a month do you visit Walmart?
Accountants may analyze the data and plot the responses to see if there are correlations
within the data on a scatter plot. The visual plot of the relationship between responses to
the various questions may help cluster the various customers into different clusters and help
Walmart cater to specific customer clusters better through superior insights.
Lab Connection
Lab 3-2 has you use cluster analysis to identify natural groupings of loan data.
The data are normalized to reduce the distortion of the data and other outliers are
removed. They are then plotted with the number of days to pay on the y-axis and the pay-
ment amount on the x-axis. Of the eight clusters identified, three clusters highlight potential
anomalies that may require further investigation as part of an internal or external audit.
• Cluster 6 payments (purple) have a long duration between the processing to
payment dates.
• Cluster 7 payments (pink) have high payment amounts.
• Cluster 8 payments (brown) have high payment amounts and a long duration between
the processing date and the payment date.
With this insight, auditors may assess the risk associated with these payments and under-
stand transaction behavior relative to acceptable behavior defined in internal controls.
Statistical Significance
When we are working with a sample of data instead of the entire population, testing our
hypotheses is more complicated than simply comparing two means. If we discover that
the average shipping time for Copiers is higher than the average shipping time for all other
categories, simply seeing that there is a difference between the two means is not enough—
we need to determine if that difference is big enough to not have been due to chance, or
put in statistical words, we need to determine if the difference is significant. When we are
making decisions about the entire population based on only a subset of data, it is possible
that even with the best sampling methods, we might have collected a sample that is not
perfectly representative of the population as a whole. Because of that possibility, we have to
take into account some room for error. When we work with data, we are not in control of
many measures that factor into the statistical tests that we run—we can’t change the mean
or the standard deviation, for example. And depending on how much control we have over
retrieving our own data, we may not even have much control over how large of a sample we
collect. Of course, if we have control, then the larger the sample, the better. Larger samples
have a better chance of being more representative of the population as a whole. We do have
control over one thing, though, and that is our significance level.
Keep in mind that when we run our hypothesis tests, we will come to the conclusion to
either “reject the null hypothesis” or “fail to reject the null hypothesis.” There is always a
risk when making inferences about any population based on a sample that the data don’t
accurately reflect the population, and that would result in us making a wrong decision. That
wrong decision isn’t the same thing as making a mistake in a calculation or getting a ques-
tion wrong on an exam—this type of error is one we won’t even know that we made. That’s
the risk associated with making inferences about a population based on a sample. What we
do to protect ourselves (other than doing everything we can to collect a large and represen-
tative sample) is determine which wrong result presents us with the least risk. Would we
prefer to erroneously assume that Copier shipping times are significantly low, when actu-
ally the shipping times for other categories are lower? Or would it be safer to erroneously
assume Copier shipping times are not significantly low, when actually they are? There is no
cut-and-dried answer to this. It always depends on the scenario.
The significance level, also referred to as alpha, reflects which erroneous decision we are
more comfortable with. The lower alpha is, the more difficult it is to reject the null hypoth-
esis, which minimizes the risk of erroneously rejecting the null hypothesis. Common alphas
are 1 percent, 5 percent, and 10 percent. In the next section, we will take a closer look at
what alpha means and how we use it to determine whether we will reject or fail to reject the
null hypothesis.
The p-value
We describe findings as statistically significant by interpreting the p-value of the statisti-
cal test. The p-value is compared to the alpha threshold. A result is statistically significant
when the p-value is less than alpha, which signifies that a difference between the two means was
detected and that the null hypothesis can be rejected. The p-value is the result of a calcula-
tion that involves summary measures from your sample. It is completely dependent upon
the sample you are analyzing, and nothing else. Alpha is the only measure in your control.
If p-value > alpha: Fail to reject the null hypothesis (the data do not present a significant result).
If p-value <= alpha: Reject the null hypothesis (the data present a significant result).
Now let’s consider risk again. Consider the following screenshot of a t-test, the p-value is
highlighted in yellow (Exhibit 3-13):
If our alpha is 5 percent (0.05), we would have to fail to reject the null because the p-value
of 0.073104 is greater than alpha. But what if we set our alpha at 10 percent instead of
5 percent? All of a sudden, our p-value of 0.073104 is less than alpha (0.10). It is critical to
make the decision of what your significance level will be (1 percent, 5 percent, 10 percent)
prior to running your statistical test. The p-value shouldn’t dictate which alpha you select,
as tempting as that may be!
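With two samples in hand, the test itself is a single call. A sketch with scipy and illustrative shipping times; remember to choose alpha before looking at the p-value:

    from scipy import stats

    copiers = [7, 8, 6, 9, 7, 8]      # illustrative shipping times
    others = [5, 6, 5, 7, 6, 5, 6]

    # Two-sample t-test; compare the p-value to the alpha chosen beforehand.
    t_stat, p_value = stats.ttest_ind(copiers, others)
    print(p_value, p_value <= 0.05)  # True means reject the null at alpha = 0.05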
EXHIBIT 3-13 t-test Assessing for Significant Differences in Average Shipping Times across Categories
PROGRESS CHECK
5. How does Benford’s law provide an expectation of any set of naturally occurring
collections of numbers?
6. Identify a reason the sales amount of any single product may or may not follow
Benford’s law.
7. Name three clusters of customers who might shop at Walmart.
8. In Exhibit 3-12, Cluster 1 of the group insurance highlighted claims have a long
period from death to payment dates. Why would that cluster be of interest to
internal auditors?
EXHIBIT 3-14 Regression (axes: Days to ship, Volume)
EXHIBIT 3-15 Classification (axes: Volume, Distance)
where you attempt to identify causation (which can be expensive), identify a series of char-
acteristics that predict a model, or attempt to identify other relationships, respectively.
Predictive analytics facilitate making forecasts of accounting outcomes, including these
examples:
1. Helping management accountants predict future performance, including future sales,
earnings, and cash flows. This will help management accountants set budgets and plan
production, and estimate available funds for loan repayments, dividend payments, and
operations.
2. Helping company accountants predict which customers will be able to pay what they
owe the company. This will help accountants estimate the appropriate allowance for
doubtful accounts.
3. Helping auditors predict which financial statements need to be restated.
4. Helping investors and lenders predict which companies are likely to go bankrupt, or
unable to continue as a going concern.
5. Helping investors and financial analysts predict future sales, earnings, and cash flows,
critical to stock valuation.
Regression
Regressions allow the accountant to develop models to predict expected outcomes. These
expected outcomes might be to predict the number of days to ship products relative to the
volume of orders placed by the customer, shown in Exhibit 3-14.
Regression is a supervised method used to predict specific values. In this case, the num-
ber of days to ship is dependent on the number of items in the order. Therefore, we can use
regression to predict the number of days it takes Vendor A to ship based on the volume in
the order. (Vendor A is represented by the gold star in Exhibits 3-14 and 3-15.)
Regression analysis involves the following process:
1. Identify the variables that might predict an outcome (or target or dependent variable).
The inputs, or explanatory variables, are called independent variables, where the
output is a dependent variable. You will probably remember the formula for a lin-
ear equation from algebra or other math classes, y = mx + b. When we run a linear
regression with only one explanatory variable, we use the same equation, with m
representing the slope of the explanatory variable, and b representing the y-intercept.
Because linear regression models can have more than one explanatory variable,
though, the formula is written slightly differently as y = b0 + b1x for a simple (one
explanatory variable) regression, with b0 representing the y-intercept and b1 repre-
senting the slope of the explanatory variable. In a multiple regression model, each
explanatory variable receives its own slope: y = b0 + b1x1 + b2x2 + . . ., and so on until
all of the explanatory variables and their respective slopes have been accounted for.
When you run a regression in Excel, Tableau, or other statistical software, the values
for the intercept and the slopes of each explanatory variable will be provided in a
regression output. To create a predictive model, you simply plug in the values pro-
vided in the regression output along with the values for the particular scenario you
are estimating or predicting.
A. Dummy variables: Variables in linear regression must be numerical, but sometimes
we need to include categorical variables, for instance whether consumers are male
or female, or if they are from Arkansas or New York. When that is the case, we have
to transform our categorical variables into numbers (we can’t add a word to our
formula!), and in particular into binary numbers called dummy variables. Dummy
variables can take on the values of 0 or 1, when 0 represents the absence of some-
thing and 1 represents the presence of something. You will see examples of dummy
variables in Comprehensive Lab 3-6.
2. Determine the functional form of the relationship. Is it a linear relationship where each
input plots to another, or is the relationship nonlinear? While most accounting ques-
tions utilize a linear relationship, it is possible to consider a nonlinear relationship.
3. Identify the parameters of the model. What are the relative weights of each indepen-
dent variable on the dependent variable? These are the coefficients on each of the
independent variables. Statistical t-tests assess each regression coefficient one at a time to determine whether the weight is statistically different from 0 (or no weight at all). Par-
ticularly in multiple regression, it can be useful to assess the p-value for each variable.
You interpret the p-value for each variable the same way you assess the p-value in a
t-test: If the p-value is less than your alpha (typically 0.05), then you reject the null
hypothesis. In regression, that implies that the explanatory variable is statistically
significant.
4. Evaluate the goodness of fit. Calculate the adjusted R2 value to determine whether the data are close to the line or not. In general, the better the fit (e.g., adjusted R2 > 0.8), the more accurate the prediction will be. The adjusted R2 is a value between 0 and 1. An adjusted R2 value of 0 represents no ability to explain the dependent variable, and an adjusted R2 value of 1 represents perfect ability to explain the dependent variable. Another useful statistic is the model F-test, which compares the hypothesized model (with one or more independent variables) to a model with no independent variables and tells us statistically whether our model is better than chance. A short sketch of these four steps appears below.
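To make these steps concrete, here is a minimal sketch in Python using the statsmodels library (an illustration only, not part of the chapter's Excel or Tableau labs; the shipping values and variable names are hypothetical):

import pandas as pd
import statsmodels.api as sm

# Hypothetical shipping history: order volume and an online dummy
# (step 1), with days to ship as the dependent variable.
df = pd.DataFrame({
    "order_volume": [10, 25, 40, 55, 70, 85, 100, 115],
    "online":       [0, 1, 0, 1, 0, 1, 0, 1],
    "days_to_ship": [2, 3, 4, 4, 5, 7, 7, 9],
})
X = sm.add_constant(df[["order_volume", "online"]])  # add_constant supplies the intercept b0
y = df["days_to_ship"]

# Steps 2-3: assume a linear functional form and estimate the parameters.
model = sm.OLS(y, X).fit()
print(model.params)        # b0 (intercept) and the slopes b1, b2
print(model.pvalues)       # reject the null for a variable when p < alpha (e.g., 0.05)

# Step 4: goodness of fit and overall model significance.
print(model.rsquared_adj)  # adjusted R2, between 0 and 1
print(model.f_pvalue)      # p-value of the model F-test

# Prediction: plug values into y = b0 + b1x1 + b2x2.
b = model.params
print(b["const"] + b["order_volume"] * 60 + b["online"] * 1)

The same intercept and slope values appear in the regression output produced by Excel's Data Analysis ToolPak in Labs 3-3 and 3-6.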
Lab Connection
Labs 3-3 and 3-6 have you calculate linear regressions to predict completion rates and sales by transaction type.
3 http://www.cpafma.org/articles/inside-public-accounting-releases-2015-national-benchmarking-report/ (accessed November 9, 2016).
4 A. S. Ahmed, C. Takeda, and S. Thomas, "Bank Loan Loss Provisions: A Reexamination of Capital Management, Earnings Management and Signaling Effects," Journal of Accounting and Economics 28, no. 1 (1999), pp. 1–25.
Accounting Standards Update 2016-13 requires that banks provide an estimate of expected credit losses (ECLs) by considering historical collection rates, current information, and reasonable and supportable forecasts, including estimates of prepayments.5 Using these historical and industry data, auditors may work to test a model to establish a loan loss reserve in this way:
Allowance for loan losses amount = f (Current aged loans, Loan type, Customer loan history, Collections success)
Classification
The goal of classification is to predict whether an individual we know very little about will
belong to one class or another. For example, will a customer have their balance written off?
The key here is that we are predicting whether the write-off will occur or not (in other words,
there are two classes: "Write-Off" and "Good"). This contrasts with regression, which attempts to predict many possible values of the dependent variable rather than just the few classes used in classification.
5 http://www.pwc.com/us/en/cfodirect/publications/in-brief/fasb-new-impairment-guidance-financial-instruments.html (accessed November 9, 2016).
Classification is a supervised method that can be used to predict the class of a new
observation. In this case, blue circles represent “on-time” vendors. Green squares represent
“delayed” vendors. The gold star represents a new vendor with no history.
Classification is a little more involved as we are now dealing with machine learning and complex probabilistic models. Here are the general steps (a brief code sketch follows the list):
1. Identify the classes you wish to predict.
2. Manually classify an existing set of records.
3. Select a set of classification models.
4. Divide your data into training and testing sets.
5. Generate your model.
6. Interpret the results and select the “best” model.
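As a hedged sketch of steps 2 through 6, the Python code below uses the scikit-learn library with hypothetical, manually classified vendor records (an illustration, not one of the chapter's labs):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: an existing set of records, manually assigned a class.
vendors = pd.DataFrame({
    "distance": [10, 80, 35, 90, 15, 70, 25, 95, 40, 60],
    "volume":   [50, 20, 60, 10, 70, 30, 55, 15, 65, 25],
    "on_time":  [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # 1 = "on-time", 0 = "delayed"
})
X = vendors[["distance", "volume"]]
y = vendors["on_time"]

# Step 4: divide the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Steps 3 and 5: select a classification model and generate (train) it.
model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

# Step 6: interpret the results on the test set to select the "best" model.
print(accuracy_score(y_test, model.predict(X_test)))

# Predict the class of a new vendor with no history (the gold star).
print(model.predict(pd.DataFrame({"distance": [45], "volume": [50]})))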
Classification Terminology
First, a bit of terminology to prepare us for our discussion.
Training data are existing data that have been manually evaluated and assigned a class.
We know that some customer accounts have been written off, so those accounts are assigned
the class “Write-Off.” We will train our model to learn what it is that those customers have
in common so we can predict whether a new customer will default or not.
Test data are existing data used to evaluate the model. The classification algorithm will
try to predict the class of the test data and then compare its prediction to the previously
assigned class. This comparison is used to evaluate the accuracy of the model or the prob-
ability that the model will assign the correct class.
Decision trees are used to divide data into smaller groups, and decision boundaries mark
the split between one class and another.
Exhibit 3-16 provides an illustration of both decision trees and decision boundaries.
Decision trees split the data at each branch into two or more groups. In this example, the first
branch divides the vendor data by geographic distance and inserts a decision boundary through
the middle of the data. Branches 2 and 3 split each of the two new groups by vendor volume.
Note that the decision boundaries in the graph on the right are different for each grouping.
Pruning removes branches from a decision tree to avoid overfitting the model. In other
words, pruning reduces the number of times we split the groups of data into smaller groups,
as shown in Exhibit 3-16. Pre-pruning occurs during the model generation. The model stops
creating new branches when the information usefulness of an additional branch is low.
Post-pruning evaluates the complete model and discards branches after the fact. Exhibit 3-17
provides an illustration of how pruning might work in a decision tree.
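In scikit-learn terms (a sketch under assumptions, not the book's prescribed tooling), pre-pruning corresponds to limits set before training, such as max_depth, while post-pruning corresponds to cost-complexity pruning applied once the full tree has been grown:

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop creating new branches early by capping tree
# depth and requiring a minimum number of records before a split.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10)

# Post-pruning: grow the tree, then discard low-value branches via
# cost-complexity pruning (a larger ccp_alpha prunes more).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02)

# Both are then fit on training data, e.g., pre_pruned.fit(X_train, y_train).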
Linear classifiers are useful for ranking items rather than simply predicting class prob-
ability. These classifiers are used to identify a decision boundary. Exhibit 3-18 shows an
illustration of linear classifiers segregating the two classes.
EXHIBIT 3-16 Example of Decision Trees and Decision Boundaries
[Figure: a decision tree with three numbered branches (left) and the corresponding decision boundaries on a plot of Volume versus Distance (right)]
EXHIBIT 3-17 Illustration of Pruning a Decision Tree
[Figure: a decision tree whose lower numbered branches are pruned away]
EXHIBIT 3-18 Illustration of Linear Classifiers
[Figure: a plot of Volume versus Distance with candidate linear decision boundaries; one misclassified point is labeled Error]
A linear discriminant uses an algebraic line to separate the two classes. In the example noted here, the classification is a function of both volume and distance:
Class(x) = "on-time" (circle) if 1.0 × Volume − 1.5 × Distance + 50 > 0
Class(x) = "delayed" (square) if 1.0 × Volume − 1.5 × Distance + 50 ≤ 0
We don’t expect linear classifiers to perfectly segregate classes. For example, the green
square that appears below the line in Exhibit 3-18 would be incorrectly classified as a circle
and considered an error.
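Expressed as code, this discriminant is simply a signed score whose sign determines the class (a minimal sketch using the coefficients from the example above):

def classify(volume, distance):
    # Linear discriminant: the class depends on which side of the
    # line 1.0 * Volume - 1.5 * Distance + 50 = 0 the point falls.
    score = 1.0 * volume - 1.5 * distance + 50
    return "on-time" if score > 0 else "delayed"

print(classify(volume=40, distance=20))  # score = 60, so "on-time"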
Support vector machine is a discriminating classifier that is defined by a separating hyper-
plane that works first to find the widest margin (or biggest pipe) and then works to find the
middle line. Exhibits 3-19 and 3-20 provide an illustration of support vector machines and
how they work to find the best decision boundary.
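A minimal scikit-learn sketch of a linear support vector machine, again with hypothetical vendor data (the parameter C controls how heavily points that fall inside the margin are penalized):

from sklearn.svm import SVC

# Hypothetical vendor history: [distance, volume] pairs and labels.
X = [[10, 50], [80, 20], [35, 60], [90, 10], [15, 70], [70, 30]]
y = [1, 0, 1, 0, 1, 0]  # 1 = on-time, 0 = delayed

# A linear-kernel SVM finds the separating line with the widest
# margin (the "biggest pipe") and then uses its middle line.
svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.coef_, svm.intercept_)  # slope and offset of the hyperplane
print(svm.predict([[45, 50]]))    # classify a new vendor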
EXHIBIT 3-19 Support Vector Machines
[Figure: two plots of Volume versus Distance showing how support vectors define the widest margin between the classes]
EXHIBIT 3-20 Support Vector Machine Decision Boundaries
[Figure: a plot of Volume versus Distance; SVMs have two decision boundaries at the edges of the pipe, with points outside the pipe marked OK and points inside marked Error]
Evaluating Classifiers
When classifiers wrongly classify an observation, they are penalized. The larger the penalty
(error), the less accurate the model is at predicting a future value, or classification.
Overfitting
Rarely will datasets be so clean that you have a clear decision boundary. You should always
be wary of classifiers that are too accurate. Exhibit 3-21 provides an illustration of over-
fitting and underfitting. You want a good amount of accuracy without being too perfect.
Notice how the error rate declines from 6 to 3 to 0. You want to be able to generalize your
results, and complete accuracy creates a complex model with little predictive value. For-
mally defined, overfitting is a modeling error when the derived model too closely fits a lim-
ited set of data points. In contrast, underfitting refers to a modeling error when the derived
model poorly fits a limited set of data points.
Exhibit 3-22 provides a good illustration of the trade-offs between the complexity of the
model and the accuracy of the classification. While you may be able to come up with a very
complex model with the training data, chances are it will not improve the accuracy of correctly
classifying the test data. There is, in some sense, a sweet spot, where the model is most accurate
without being so complex to thus allow classification of both the training as well as the test data.
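One way to find that sweet spot is to train models of increasing complexity and compare training accuracy against test accuracy; a hedged Python sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class data stand in for classified records.
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# As depth (complexity) grows, training accuracy keeps climbing, but
# test accuracy eventually stalls or falls back: that is overfitting.
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    print(depth,
          round(accuracy_score(y_train, tree.predict(X_train)), 2),
          round(accuracy_score(y_test, tree.predict(X_test)), 2))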
EXHIBIT 3-21 Illustration of Underfitting and Overfitting the Data with a Predictive Model
[Figure: three decision boundaries of increasing complexity fit to the same points, with error counts of 6, 3, and 0]
EXHIBIT 3-22 Trade-Offs between the Complexity of the Model and the Accuracy of the Classification
[Figure: classification accuracy (roughly .5 to .8) plotted against complexity of model for training and testing data, with the sweet spot marked]
PROGRESS CHECK
9. If we are trying to predict the extent of employee turnover, do you believe the
health of the economy, as measured using GDP, will be positively or negatively
associated with employee turnover?
10. If we are trying to predict whether a loan will be rejected, would you expect
credit score to be positively or negatively associated with loan rejection by a
bank such as LendingClub?
6 G. M. Sullivan and R. Feinn, "Using Effect Size—or Why the P Value Is Not Enough," Journal of Graduate Medical Education 4, no. 3 (2012), pp. 279–82, https://doi.org/10.4300/JGME-D-12-00156.1.
yes/no question. The answers to those questions determine what calculations to include,
which schedules to complete, and what the value of the tax return will be.
Decision support systems can help with application of accounting rules as well. For
example, when a company classifies a lease as a financing or operating lease, it must con-
sider whether the lease meets a number of criteria. Using a decision support system, a
controller could evaluate a new lease and answer five questions to determine the proper
classification, shown in Exhibit 3-23.
Under a previous version of the FASB lease standard, there would have been bright
lines to indicate hard rules to determine the lease (for example, “The lease term is greater
than or equal to 75 percent of the estimated economic life of the leased asset.”). Decision
support systems are easier to use when you have clear rules. Under the newer standard,
more judgment is needed to reach the most appropriate conclusion for the business. More
on this later.
Auditors use decision support systems as part of their audit procedures. For example, they input a series of parameters, such as tolerable and expected error rates, and a tool like IDEA will calculate the appropriate sample size for evaluating source documents. Once the procedure has been performed, that is, once the source documents are evaluated, the auditor will then input the number or extent of exceptional items, and the decision support system might classify the audit risk as low, medium, or high for that area.
EXHIBIT 3-23 Lease Classification Flowchart
Take lease classification, for instance. With the recent accounting standard, the language has moved from bright lines ("75 percent of the useful life") to judgment ("major part"). While it may be tempting to rely on the accountant to manually make this decision for each new lease, machine learning can do it more quickly and more accurately than manual classification. A company with a sufficiently large portfolio of previously classified leases may use those leases as a training set for a machine learning model. Using the data attributes from these leases (e.g., useful life, total payments, fair value, originating firm) and the prior manual classification (e.g., financing, operating) of the company's leases, the model can evaluate a new lease and assign the appropriate classification. Post-classification verification, and correction of any inappropriate outcome, is then fed back into the model to improve its performance.
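A hedged sketch of how such a model might be trained in Python with scikit-learn; the lease attributes and values below are hypothetical illustrations, not data from the text:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Previously classified leases serve as the training set.
leases = pd.DataFrame({
    "useful_life":    [10, 3, 25, 5, 30, 4],
    "total_payments": [900, 120, 2500, 200, 3000, 150],
    "fair_value":     [1000, 400, 2600, 700, 3100, 500],
    "lease_class":    ["financing", "operating", "financing",
                       "operating", "financing", "operating"],
})
model = RandomForestClassifier(random_state=1).fit(
    leases[["useful_life", "total_payments", "fair_value"]],
    leases["lease_class"],
)

# Evaluate a new lease and assign the appropriate classification.
new_lease = pd.DataFrame({"useful_life": [8], "total_payments": [850], "fair_value": [950]})
print(model.predict(new_lease))

# Verified corrections would be appended to the training set and the
# model refit, improving future recommendations.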
Artificial intelligence models work similarly in that they learn from the inputs and cor-
rections to improve decision making. For example, image classification allows auditors to
take aerial photography of inventory or fixed assets and automatically identify the objects
within the photo rather than having an auditor manually check each object. Classification
of closed-circuit footage enables automatic counting of foot traffic in a retail location for
managers. Modeling of past judgment decisions by audit partners makes it possible to deter-
mine whether an allowance or estimate falls within a normal range for a client and is accept-
able or should be qualified. Artificial intelligence models track sentiment in social media
and popular press posts to predict positive stock market returns for analysts.
For most applications of artificial intelligence models, the computational power required is such that most companies will outsource the underlying system to providers like Microsoft, Amazon, or Google rather than develop it themselves. The companies provide the datasets to train and build the model, and the platforms provide the algorithms and code. When public accounting firms outsource data, clients may be hesitant to allow their financial data to be used in these platforms without additional assurance surrounding the privacy and security of their data.
PROGRESS CHECK
11. How might you expect managers to use decision support systems when evaluat-
ing employee bonuses?
12. How do machine learning and artificial intelligence models improve their recom-
mendations over time?
Summary
■ In this chapter, we addressed the third and fourth steps of the IMPACT cycle model: the
“P” for “performing test plan” and “A” for “address and refine results.” That is, how are
we going to test or analyze the data to address a problem we are facing? (LO 3-1)
■ We identified descriptive analytics that help describe what happened with the data,
including summary statistics, data reduction, and filtering. (LO 3-2)
■ We provided examples of diagnostic analytics that help users identify relationships in
the data that uncover why certain events happen through profiling, clustering, similarity
matching, and co-occurrence grouping. (LO 3-3)
■ We introduced some specific models and terminology related to these tools, including Benford's law, test and training data, decision trees and boundaries, linear classifiers, and support vector machines. We identified cases where models that overfit existing data are not very accurate at predicting the future. (LO 3-4)
■ We explained examples of predictive analytics and introduced some data mining con-
cepts related to regression, classification, and link prediction that can help predict future
events or values. (LO 3-4)
■ We discussed prescriptive analytics, including decision support systems and artificial
intelligence and provided some examples of how these systems can make recommenda-
tions for future actions. (LO 3-5)
Key Words
alternative hypothesis (131) The opposite of the null hypothesis, or a potential result that the
analyst may expect.
Benford’s law (128) The principle that in any large, randomly produced set of natural numbers, there
is an expected distribution of the first, or leading, digit with 1 being the most common, 2 the next most,
and down successively to the number 9.
causal modeling (133) A data approach similar to regression, but used to test for cause-and-effect
relationships between multiple variables.
classification (133) A data approach that attempts to assign each unit in a population into a few cat-
egories potentially to help with predictions.
clustering (128) A data approach that attempts to divide individuals (like customers) into groups (or
clusters) in a useful or meaningful way.
co-occurrence grouping (129) A data approach that attempts to discover associations between indi-
viduals based on transactions involving them.
data reduction (120) A data approach that attempts to reduce the amount of information that needs
to be considered to focus on the most critical items (e.g., highest cost, highest risk, largest impact, etc.).
decision boundaries (138) Technique used to mark the split between one class and another.
decision support system (141) An information system that supports decision-making activity within
a business by combining data and expertise to solve problems and perform calculations.
decision tree (138) Tool used to divide data into smaller groups.
descriptive analytics (116) Procedures that summarize existing data to determine what has happened
in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distribu-
tions, and proportions.
diagnostic analytics (116) Procedures that explore the current data to determine why something has
happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to
drill down in the data and see how it compares to a budget, a competitor, or trend.
digital dashboard (125) An interactive report showing the most important metrics to help users
understand how a company or an organization is performing. Often created using Excel or Tableau.
dummy variables (135) A numerical value (0 or 1) to represent categorical data in statistical analysis;
values assigned a 1 indicate the presence of something and 0 represents the absence.
effect size (141) Used in addition to statistical significance in statistical testing; effect size demon-
strates the magnitude of the difference between groups.
interquartile range (IQR) (124) A measure of variability. To calculate the IQR, the data are first
divided into four parts (quartiles) and the middle two quartiles that surround the median are the IQR.
link prediction (133) A data approach that attempts to predict a relationship between two data items.
null hypothesis (131) An assumption that the hypothesized relationship does not exist, or that there
is no significant difference between two samples or populations.
overfitting (140) A modeling error when the derived model too closely fits a limited set of data points.
predictive analytics (116) Procedures used to generate a model that can be used to determine what is
likely to happen in the future. Examples include regression analysis, forecasting, classification, and other
predictive modeling.
prescriptive analytics (117) Procedures that work to identify the best possible options given con-
straints or changing conditions. These typically include developing more advanced machine learning and
artificial intelligence models to recommend a course of action, or optimizing, based on constraints and/or
changing conditions.
profiling (123) A data approach that attempts to characterize the “typical” behavior of an individual,
group, or population by generating summary statistics about the data (including mean, standard devia-
tions, etc.).
regression (133) A data approach that attempts to estimate or predict, for each unit, the numerical
value of some variable using some type of statistical model.
similarity matching (133) A data approach that attempts to identify similar individuals based on data
known about them.
structured data (123) Data that are organized and reside in a fixed field with a record or a file. Such data
are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.
summary statistics (119) Describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation, covariance, and frequency that describe a specific measurable value.
supervised approach/method (133) Approach used to learn more about the basic relationships
between independent and dependent variables that are hypothesized to exist.
support vector machine (139) A discriminating classifier that is defined by a separating hyperplane
that works first to find the widest margin (or biggest pipe).
test data (138) A set of data used to assess the degree and strength of a predicted relationship estab-
lished by the analysis of training data.
time series analysis (137) A predictive analytics technique used to predict future values based on
past values of the same variable.
training data (138) Existing data that have been manually evaluated and assigned a class, which
assists in classifying the test data.
underfitting (140) A modeling error when the derived model poorly fits a limited set of data points.
unsupervised approach/method (129) Approach used for data exploration looking for potential
patterns of interest.
XBRL (eXtensible Business Reporting Language) (122) A global standard for exchanging finan-
cial reporting information that uses XML.
3. Data reduction may be used to filter out ordinary travel and entertainment expenses so an
auditor can focus on those that are potentially erroneous or fraudulent.
4. The XBRL tagging allows an analyst or decision maker to focus on one or a category of
expenses of most interest to a lender. For example, lenders might be most interested in
monitoring the amount of long-term debt, interest payments, and dividends paid to assess
if the borrower will be able to repay the loan. Using the capabilities of XBRL, lenders could
focus on just those individual accounts for further analysis.
5. For many real-life sets of numerical data, Benford’s law provides an expectation for the
leading digit of numbers in the dataset. Diagnostic analytics use Benford’s law as the
expectation to highlight differences a dataset might have, and potentially serve as an indi-
cator of fraud or errors.
6. A dollar store might sell everything for exactly $1.00. In that case, the leading digits of sales for any single product, or even for every product, would not follow Benford's law!
7. Three clusters of customers who might consider Walmart could include thrifty shoppers
(looking for the lowest price), shoppers looking to shop for all of their household needs
(both grocery and non-grocery items) in one place, and those customers who live close to
the store (good location).
8. The longer time between the death and payment dates prompts one to ask why it has taken so long for payment to occur and whether the interest required to be paid is likely large. Because of these issues, there is a possibility that the claim is fraudulent or at least deserves a more thorough review to explain why there was such a long delay.
9. We certainly could let the data speak and address this question directly. In general, when
the health of the economy is stronger, there are fewer layoffs and fewer people out look-
ing for a job, which means less turnover. Additional analysis could determine whether the
turnover is voluntary or involuntary.
10. Chapter 1 illustrated that LendingClub collects the credit score data, and the initial analy-
sis there suggested the higher the credit score, the less likely to be rejected. Given this
evidence, we would predict a negative relationship between credit score and loans that
are rejected.
11. Decision support systems follow rules to determine the appropriate amount of a bonus.
Following a set of rules, the system may evaluate management goals, such as a sales target or
number of new accounts, to calculate and recommend the appropriate bonus compensation.
12. Machine learning and artificial intelligence models learn by incorporating new data and
through manual correction of data. For example, when a misclassified lease is corrected,
the accuracy of the recommended classification of future leases improves.
1. (LO 3-4) _______ is a set of data used to assess the degree and strength of a predicted relationship.
a. Training data
b. Unstructured data
c. Structured data
d. Test data
2. (LO 3-4) These data are organized and reside in a fixed field with a record or a file. Such
data are generally contained in a relational database or spreadsheet and are readily
searchable by search algorithms. The term matching this definition is:
a. training data.
b. unstructured data.
c. structured data.
d. test data.
3. (LO 3-3) An observation about the frequency of leading digits in many real-life sets of
numerical data is called:
a. leading digits hypothesis.
b. Moore’s law.
c. Benford’s law.
d. clustering.
4. (LO 3-1) Which approach to Data Analytics attempts to predict a relationship between
two data items?
a. Similarity matching
b. Classification
c. Link prediction
d. Co-occurrence grouping
5. (LO 3-4) In general, the more complex the model, the greater the chance of:
a. overfitting the data.
b. underfitting the data.
c. pruning the data.
d. a more accurate prediction of the data.
6. (LO 3-4) In general, the simpler the model, the greater the chance of:
a. overfitting the data.
b. underfitting the data.
c. pruning the data.
d. the need to reduce the amount of data considered.
7. (LO 3-4) _______ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.
a. Linear classifier
b. Support vector machine
c. Decision tree
d. Multiple regression
8. (LO 3-3) Auditing financial statements, looking for errors, anomalies, and possible fraud, is most consistent with which type of analytics?
a. Descriptive analytics
b. Diagnostic analytics
c. Predictive analytics
d. Prescriptive analytics
9. (LO 3-4) Models associated with regression and classification data approaches all have
these important parts except:
a. identifying which variables (we’ll call these independent variables) might help pre-
dict an outcome (we’ll call this the dependent variable).
b. the functional form of the relationship (linear, nonlinear, etc.).
c. the numeric parameters of the model (detailing the relative weights of each of the
variables associated with the prediction).
d. test data.
10. (LO 3-1) Which approach to Data Analytics attempts to assign each unit in a population
into a small set of classes where the unit belongs?
a. Classification
b. Regression
c. Similarity matching
d. Co-occurrence grouping
Discussion and Analysis
Problems
1. (LO 3-1) Match the test approach to the appropriate type of Data Analytics:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
1. Clustering
2. Classification
3. Summary statistics
5. Link prediction
6. Co-occurrence grouping
8. Similarity matching
10. Profiling
11. Regression
2. (LO 3-2) Identify the order sequence in the data reduction approach to descriptive ana-
lytics (i.e., 1 is first; 4 is last).
3. (LO 3-1) Match the accounting question to the appropriate type of Data Analytics:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
4. (LO 3-4) Identify the order sequence in the classification approach to predictive analytics (i.e., 1 is first; 6 is last).
5. (LO 3-4) Match the classification definitions to the appropriate classification terminology:
• Training data
• Test data
• Decision trees
• Decision boundaries
• Support vector machine
• Overfitting
• Underfitting
Classification Definition | Classification Terms
1. Technique used to mark the location of the split between one class or another.
2. A set of data used to assess the degree and strength of a predicted relationship established by the training data.
3. A modeling error when the derived model poorly fits a limited set of data points.
4. Existing data that have been manually evaluated and assigned a class.
5. A discriminating classifier that is defined by a separating hyperplane that works to find the widest margin (or biggest pipe).
6. Tool used to divide data into smaller groups.
7. A modeling error when the derived model too closely fits a limited set of data points.
6. (LO 3-3) Related party transactions involve people who have close ties to an organi-
zation, such as board members. Assume an accounting manager decides that fuzzy
matching would be a useful technique to find undisclosed related party transactions.
Using the fields below, identify the pairings between the related party table and the
vendor and customer tables that could independently identify a fuzzy match.
7. (LO 3-3) An auditor is trying to figure out if the inventory at an electronics store chain
is obsolete. From the following list, identify whether each attribute would be useful for
predicting inventory obsolescence or not.
8. (LO 3-4) An auditor is trying to figure out if the goodwill its client recognized when it
purchased another company has become impaired. What characteristics might be used
to help establish a model predicting goodwill impairment? Label each of the following
as either a Supervised data approach or Unsupervised data approach.
9. (LO 3-3) Analysis: How might clustering be used to describe customers who owe
money (accounts receivable)?
10. (LO 3-2) Analysis: Why would the use of data reduction be useful to highlight related
party transactions (e.g., CEO has their own separate company that the main company
does business with)?
11. (LO 3-2) An investor wants to do an analysis of the industry’s inventory turnover using
XBRL. Indicate the XBRL tags that would be used with an inventory turnover calculation.
12. (LO 3-2) Identify the behavior, error, or fraudulent scheme that could be detected when
you apply Benford’s law to the following accounts.
distance from a company’s warehouse matter? How about the age of vendor relationship or
number of vendor employees? What else?
Distance Formula
You can use a distance formula in Excel to calculate the distance in miles or kilometers between the warehouse and the vendor. First, you determine the latitude and longitude based on the address, then use the following formula. Note: Use 3959 as the first number for miles or 6371 for kilometers.
=3959 * ACOS(SIN(RADIANS([Lat])) * SIN(RADIANS([Lat2])) + COS(RADIANS([Lat])) * COS(RADIANS([Lat2])) * COS(RADIANS([Long2]) - RADIANS([Long])))
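The same spherical law of cosines calculation can be sanity-checked outside Excel; a minimal Python sketch (the coordinates below are hypothetical):

from math import acos, cos, radians, sin

def distance_miles(lat1, long1, lat2, long2):
    # Spherical law of cosines; use 3959 (Earth's radius in miles)
    # as the multiplier, or 6371 for kilometers.
    return 3959 * acos(
        sin(radians(lat1)) * sin(radians(lat2))
        + cos(radians(lat1)) * cos(radians(lat2))
        * cos(radians(long2) - radians(long1))
    )

# Hypothetical warehouse and vendor coordinates.
print(distance_miles(36.08, -94.16, 34.75, -92.29))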
Assign Classes
Take a moment to define your classes. You are trying to predict whether a given order
shipment will either be “On-time” or “Delayed” based on the number of days it takes from
the order date to the shipping date. What does “on-time” mean? Let’s define on-time as an
order that ships in 5 days or less and a delayed order as one that ships later than 5 days.
You’ll use this rule to add the class as a new attribute to each of your historical records (see
Table 3-A2).
On-time = (Days to ship ≤ 5)
Delayed = (Days to ship > 5)
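Applied to a table of historical orders, the rule is a one-line calculation; a small pandas sketch with hypothetical values:

import pandas as pd

orders = pd.DataFrame({"days_to_ship": [2, 4, 5, 6, 9]})  # hypothetical history

# On-time = (Days to ship <= 5); Delayed = (Days to ship > 5)
orders["class"] = orders["days_to_ship"].apply(
    lambda d: "On-time" if d <= 5 else "Delayed"
)
print(orders)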
LABS
b. Click the Suppliers query then go to the Home tab in the ribbon and
click Merge Queries.
c. In the Merge window:
1. Choose Employee_Listing from the drop-down menu above the bot-
tom table.
2. From the Join Kind drop-down menu, choose Inner (only matching
rows).
3. Check the box next to Use fuzzy matching to perform the merge.
4. Click the arrow next to Fuzzy matching options and set the similarity
threshold to 0.5.
5. Now, hold the Ctrl key and select the Supplier_Address and
Supplier_Zip columns in the Suppliers table in the preview at the top
of the screen.
6. Finally, hold the Ctrl key and select the Employee_Street_Address
and Employee_Zip columns in the Employee_Listing table in the
preview at the bottom of the screen.
7. Click OK to merge the tables.
d. In the Power Query Editor, scroll to the right until you find
Employee_Listing and click the expand (two arrows) button to the right
of the column header.
e. Click OK to add all of the employee columns.
f. From the Home tab in the ribbon, click Choose Columns.
g. Uncheck (Select all columns) and check the following attributes and
click OK:
1. Supplier_Company_Name
2. Supplier_Address
3. Supplier_Zip
4. Employee_First_Name
5. Employee_Last_Name
6. Employee_Street_Address
7. Employee_Zip
h. Take a screenshot (label it 3-1MA).
3. You may notice that you have too many matches. Now adjust your fuzzy
match to show fewer, more likely matches:
a. In the Query Steps panel on the right, click the gear icon next to the
Merged Queries step to open the Merge panel with fuzzy match options.
b. Click the arrow next to Fuzzy matching options and change the Similar-
ity threshold value to 0.7.
c. Click OK to return to Power Query Editor.
d. In the Query Steps panel on the right, click the last step to expand and
remove the other columns: Removed Other Columns.
e. Take a screenshot (label it 3-1MB).
4. Answer the lab questions, then close Power Query and Power BI. Save your
Power BI workbook as 3-1 Slainte Fuzzy.pbix.
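Outside Power Query, the same idea (flagging supplier-employee address pairs whose similarity exceeds a threshold) can be sketched with Python's standard difflib module; the addresses and threshold below are hypothetical:

from difflib import SequenceMatcher

supplier_addresses = ["1402 Oak Street", "88 Commerce Blvd"]
employee_addresses = ["1402 Oak St.", "17 River Road"]

# Report pairs whose similarity ratio meets the threshold, analogous
# to Power Query's fuzzy-match similarity setting.
threshold = 0.7
for s in supplier_addresses:
    for e in employee_addresses:
        ratio = SequenceMatcher(None, s.lower(), e.lower()).ratio()
        if ratio >= threshold:
            print(s, "<->", e, round(ratio, 2))

Raising the threshold, as in step 3 above, trims the list to fewer, more likely matches.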
AQ3. If you were the employee committing fraud, what would you try to do with the
data to evade detection?
c. In the Fields pane, drag the following values to the appropriate fields,
then click the drop-down menu next to each to set the summary measure
(e.g., Sum, Average, Count):
1. X Axis: dti > Average
2. Y Axis: loan_amnt > Average
3. Values: int_rate
4. Now show clusters in your data:
a. Click the three dots in top-right corner of your scatter chart visualization
and choose Automatically Find Clusters.
b. Enter the following parameters and click OK.
1. Name: Clusters by Interest Rate
2. Number of clusters: 6
c. Right-click on your scatter plot and choose Show as Table.
5. Finally, clean up your report. In the Visualizations pane, click the Format
visual (paintbrush) icon and add the following:
a. Visual > X axis > Title > Average Debt-to-Income Ratio
b. Visual > Y axis > Title > Average Loan Amount
c. General > Title > Text > Interest Clusters
6. Take a screenshot (label it 3-2MA) of your report.
7. When you have finished answering the questions, close Power BI Desktop
and save your workbook as 3-2 Lending Club Clusters.pbix.
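Power BI's Automatically Find Clusters feature groups observations much like k-means clustering does; a hedged Python sketch of the same kind of analysis (the algorithm choice and the LendingClub-style values are assumptions for illustration):

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical borrower-level values: debt-to-income and loan amount.
loans = pd.DataFrame({
    "dti":       [5, 8, 12, 18, 22, 25, 30, 33],
    "loan_amnt": [4000, 6000, 9000, 12000, 15000, 21000, 24000, 30000],
})

# Request the same number of clusters used in the lab.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=1).fit(loans)
loans["cluster"] = kmeans.labels_
print(loans)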
Lab 3-3 Perform a Linear Regression Analysis—College Scorecard
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Company Summary: The data used are a subset of the College Scorecard dataset that is
provided by the U.S. Department of Education. These data provide federal financial aid and
earnings information, insights into the performance of schools eligible to receive federal
financial aid, and the outcomes of students at those schools. You can learn more about how
the data are used and view the raw data yourself at collegescorecard.ed.gov/data. However,
for this lab, you should use the text file provided to you.
Data: Lab 3-3 College Scorecard Transform.zip - 1.8MB Zip / 1.3MB Excel / 1.4MB Tableau
Lab 3-3 Example Output
By the end of this lab, you will create a regression to understand relationships with data.
While your results will include different data values, your work should look similar to this:
Tableau | Desktop
Lab 3-3 Prepare Your Data and Create a Linear
Regression
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-3 [Your name] [Your email address].docx.
This lab relies upon the steps completed in Lab 2-5 in which the data were prepared. For
a description of the variables, refer to the data dictionary in Appendix K.
We will begin with a simple regression with two variables, SAT average (SAT_AVG) and
completion rate for first-time, full-time students at four-year institutions (C150_4).
A note about regressions: You can perform regressions on data only where a value is
present for both the explanatory and response variable. If there are blank or null values, you
will encounter errors at best and inaccurate models at worst. Make sure you filter your data
before beginning the analysis.
Note: This lab will show how to calculate a regression using both Excel and Power BI. Excel provides more detailed numbers, such as the slope and y-intercept, so that we can build a prediction model. However, if we don't need to build the model but want to include a regression visualization in a report, then Power BI is a good choice.
1. Open a new workbook in Excel and connect to your data:
a. From the Data ribbon, click Get Data > From File > From Workbook.
b. Navigate to your Lab 3-3 College Scorecard Transform.xlsx file and click
Open.
c. Check the Lab_3_3_College_Scorecard_Dataset table and click
Transform or Edit.
d. Click the drop-down menu next to SAT_AVG and click Remove Empty
to remove null values.
e. Click the drop-down menu next to C150_4 and click Remove Empty to
remove null values.
f. Click Close & Load.
2. Before you can perform a regression in Excel, you need to first enable the
Data Analysis ToolPak.
a. Go to File > Options > Add-ins.
b. Next to Manage at the bottom of the screen, choose Excel Add-ins and
click Go. . . .
c. Check the box next to Analysis ToolPak and click OK. You will now see
the Data Analysis option in the Data tab on the Excel ribbon.
3. Now you can perform your regression:
a. From the Data tab in the ribbon click the Data Analysis button.
b. Choose Regression from the Analysis Tools list and click OK. A regres-
sion window will pop up for you to input your variables.
c. For the Input Y Range, click the select (up arrow) button and
highlight the data that contains your response variable: C150_4
($AA$1:$AA$1272).
d. For the Input X Range, click the select (up arrow) button and highlight the data that contain your explanatory variable: SAT_AVG ($H$1:$H$1272).
e. If your selections contain column headers, place a check mark in the box
next to Labels.
f. Click OK. This will run the regression test and place the output on a new
spreadsheet in your Excel workbook.
g. Click the R Square value and highlight the cell yellow.
4. Take a screenshot (label it 3-3MA) of your regression output.
5. When you are finished answering the lab questions you may close Excel. Save
your file as Lab 3-3 College Scorecard Regression.xlsx.
Now repeat the model using Power BI:
1. Open a new workbook in Power BI and connect to your data:
a. From the Home tab on the ribbon, click Get Data > Excel.
b. Navigate to your Lab 3-3 College Scorecard Transform.xlsx file and click
Open.
c. Check the Lab_3_3_College_Scorecard_Dataset table and click Load.
Note: We will assume that this Power BI report would include more
data models and visualizations in addition to the regression, so we
will filter the values on the report page instead of in Power Query like
we did with Excel.
d. In the View tab in the ribbon, click the Page view button and change to
Actual Size.
2. Begin by creating a scatter chart on your report:
a. In the Visualizations panel, click the Scatter chart button to add a new
visual to your report.
b. Drag the following attributes to the visualization fields panel:
1. Values: UNITID
2. X Axis (explanatory variable): SAT_AVG
3. Y Axis (response variable): C150_4
c. To remove null values from our analysis, go to the Filters panel and adjust
the following:
1. Drag SAT_AVG from the fields list to Filters on this page.
2. From the Show items when the value: drop-down menu, choose is not
blank and click Apply Filter.
3. Drag C150_4 from the fields list to Filters on this page.
4. From the Show items when the value: drop-down menu, choose is not
blank and click Apply Filter.
d. Now add your regression line:
1. In the Visualizations panel, click the Analytics (magnifying glass)
button.
2. Click Trend line > On.
e. Take a screenshot (label it 3-3MB) of your scatter chart.
AQ3. Identifying the cause and effect as you did in Q2 can help you determine the
explanatory and response variables. Which variable, SAT average or completion
rate, is the explanatory variable?
Lab 3-4 Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. So far you have analyzed the ER Diagram to gain a bird’s-eye view of all
the different tables and fields in the database, and you have explored the data in each table
to gain a glimpse at sample values from each field and how they are all formatted. You also
gained a little insight into the distribution of sample values across each field, but at this
point you are ready to dig into the data a bit more.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
Microsoft Excel
LAB 3-4M Example Summary Statistics in Microsoft Excel
d. Click OK.
e. Click Edit to open Power Query Editor.
3. Take a screenshot (label it 3-4MA).
4. Click the Home tab on the ribbon and then click Close & Load.
5. While you can calculate these statistics individually using formulas like
SUM() and AVERAGE(), you can also have Excel calculate them automati-
cally through the Data Analysis ToolPak. If you haven’t added this compo-
nent into Excel yet, follow this menu path: File > Options > Add-ins. From
this window, select the Go. . . button, and then place a check mark in the
box next to Analysis ToolPak. Once you click OK, you will be able to access
the ToolPak from the Data tab on the Excel ribbon.
6. Click the Data Analysis button from the Data tab on the Excel ribbon and
select Descriptive Statistics and click OK.
a. For the Input Range, select the three columns that we are measuring,
ORIG_PRICE, SALE_PRICE, and TRAN_AMT. Leave the default to
columns, and place a check mark in Labels in First Row.
b. Place a check mark next to Summary Statistics, then press OK. This
may take a few moments to run because it is a large dataset.
7. Take a screenshot (label it 3-4MB) of the summary statistic results.
8. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 3-4 Dillard’s Stats.xlsx.
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRANSACT.*
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
e. Click OK.
f. Click Update Now to preview your data.
3. Take a screenshot (label it 3-4TA).
4. Click Sheet 1 and rename it Summary Statistics.
5. Drag the following fields to the Columns shelf:
a. ORIG_PRICE
b. SALE_PRICE
c. TRAN_AMT
6. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
7. From the Worksheet menu, check Show Summary.
8. Hide the Show Me pane to see your summary statistics.
9. Click the drop-down menu in the Summary card and check Standard
Deviation.
10. Take a screenshot (label it 3-4TB).
11. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 3-4 Dillard’s Stats.twb.
Lab 3-5 Example Output
By the end of this lab, you will create a chart that will let you compare online and in-person
sales. While your results will include different data values, your work should look similar
to this:
Microsoft Excel
LAB 3-5M Example Comparison of Distributions in Microsoft Excel
Tableau | Desktop
In this lab, we will separate the online sales and the in-person sales and view how their
distributions differ. After viewing the distributions, we will run a hypothesis t-test to deter-
mine if the average transaction amount for online sales versus in-person sales is significantly
different. We cannot complete hypothesis t-tests in Tableau without combining it with R or
Python, so Part 2 of this lab will only be in the Microsoft Excel path.
b. From the ribbon, select Insert > Statistical Charts > Box and Whisker.
1. Glancing at this chart, it appears that there are some extreme outli-
ers in the In-Person category, but if you expand the chart you will
see the average and median are higher for online sales. To view the
values for the summary statistics, you can select the column and view
the results in the status bar or you can calculate them manually with
functions (=AVERAGE() or =MEDIAN()) or run descriptive statistics from the
Data Analysis ToolPak.
5. Take a screenshot (label it 3-5MA).
6. Answer the lab questions, then continue to Part 2.
Tableau | Desktop
1. The data preparation for working with the box plots in Tableau Desktop
is much simpler because we are not also preparing our data for hypothesis
t-test analysis.
2. Open Tableau Desktop.
3. Go to Connect > To a Server > Microsoft SQL Server.
4. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
5. Instead of connecting to a table, you will create a New Custom SQL query.
Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND
TRAN_AMT > 0
6. Click Sheet 1 and rename it Sales Analysis.
7. Double-click TRAN_AMT.
8. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
9. To resize the circles, click the Size button on the Marks pane and drag it
until it is close to the left.
10. To add a box-and-whiskers plot, click the Analytics tab and drag Box Plot to
the Cell button on your sheet.
11. This box plot shows all of the transaction amounts (both online and in-person).
To view two separate box plots, you need to first create a new variable to
isolate online sales from in-person sales.
a. From the Analysis menu, select Create Calculated Field. . .
b. Replace the default Calculation1 with the name Dummy Variable.
c. Input the following Calculation: IF [STORE] = 698 THEN "Online" ELSE "In-Person" END.
d. Drag Dummy Variable to the Columns shelf.
e. The default size is thin and a little difficult to read. Changing the axis
to a logarithmic scale will let you see the box plot better. Right-click the
vertical axis (TRAN_AMT) and choose Edit Axis. . .
f. Check the box next to Logarithmic and give your axis a friendly title, like
Transaction Amount. Close the window to return to your chart.
12. You now have two separate box plots to compare. The In-Person box plot
has a much wider range and some quite high outliers, but the interquartile
range for the Online box plot is broader. This suggests that we would want
to explore this data further to understand the very high outliers in the In-
Person transactions and if the averages are significantly different between
the two sets of transactions.
13. Take a screenshot (label it 3-5TA).
14. Answer the lab questions and close your workbook. Save it as Lab 3-5 Dillards
Sales Analysis.twb. The Tableau track does not have a Part 2 for this lab.
1. From Microsoft Excel, click the Data Analysis button from the Data tab on
the ribbon.
a. If you haven’t added this component into Excel yet, follow this menu
path: File > Options > Add-ins. From this window, select the Go. . .
button, and then place a check mark in the box next to Analysis
ToolPak. Once you click OK, you will be able to access the ToolPak
from the Data tab on the Excel ribbon.
b. Select t-test: Two Sample Assuming Unequal Variances.
1. Variable 1 Range: select all In-Person transactions (Column O).
2. Variable 2 Range: select all Online transactions (Column P).
3. Place a check mark next to Labels.
4. Click OK.
2. Take a screenshot (label it 3-5MB) of the t-test output.
Tableau | Desktop
This portion of the lab cannot be completed using Tableau.
Lab 3-6 Comprehensive Case: Create a Data Abstract and Perform Regression Analysis—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: After running diagnostic analysis on Dillard’s transactions, you have
found that there is a statistically significant difference between the amount of money cus-
tomers spend in online transactions versus in-person transactions. You decide to take this
a step further and design a predictive model to help determine how much a customer will
spend based on the transaction type (online or in-person). You will run a simple regression
to create a predictive model using one explanatory variable in Part 1, and in Part 2 you will
extend your analysis to add in an additional explanatory variable—tender type (the method
of payment the customer chooses).
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
Microsoft Excel
LAB 3-6M Example Regression in Microsoft Excel
Tableau | Desktop
Lab 3-6 Part 1 Perform an Analysis of the Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-6 [Your name] [Your email address].docx.
Dillard’s is trying to figure out when its customers spend more on individual transac-
tions. We ask questions regarding how Dillard’s sells its products.
5. Perform a regression analysis by performing the following steps:
a. Click on the Data Analysis button in the Data tab in the ribbon. If you
do not have the Data Analysis ToolPak added in, see Appendix C to
learn how to add it to Excel.
b. Click Regression, and then click OK.
c. Reference the cells that contain the TRAN_AMT in the Input Y Range
and Online-Dummy in the Input X Range and then click OK. Note:
Regression Analysis does not accept null values, so it will not work to
select the entire columns for your Y and X ranges. Be sure to select
only the column label and the values; the values should extend to row
570,062.
6. Take a screenshot (label it 3-6MB) of your results.
7. Save your file as Lab 3-6 Dillard’s Regression.xlsx.
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND
TRAN_AMT > 0
e. Click OK.
3. Take a screenshot (label it 3-6TA).
4. Click Sheet 1 and name it Regression.
5. Double-click TRAN_AMT to add it to the Rows shelf.
6. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
7. To isolate online sales from in-person sales, you need to create a calculated
field. Unlike the way you performed this change to create box plots in the
Diagnostic Analytics lab, this time you must name the variables 1 and 0
instead of online and in-person.
a. From the Analysis menu, select Create Calculated Field. . .
b. Replace the default Calculation1 with the name Online-Dummy.
c. Input the following Calculation: IF [STORE] = 698 THEN 1 ELSE 0
END.
8. Drag Online-Dummy to the Columns shelf.
9. From the Analysis tab click Trend Lines > Show All Trend Lines.
10. To see the trend line more clearly, right-click the vertical axis (Tran Amt)
and choose Edit Axis. Check the box next to Logarithmic, add a friendly axis
title such as Transaction Amount, and close the window.
11. Use your mouse to hover over the trend line to reveal the regression formula
and p-value.
12. Take a screenshot (label it 3-6TB).
13. Save your file as Lab 3-6 Dillard’s Regression.twb.
1. Return to your Lab 3-6 Dillard's Regression.xlsx file and edit the existing query.
a. The Queries & Connections window should still be available on the
right side of your workbook. If it is, double-click on Query 1 to open the
Power Query editor.
1. If the Queries & Connections window is not showing to the right of your
workbook, click the Data tab in the ribbon > Queries & Connections.
2. Add a new conditional column. This time name it BANK-Dummy.
a. New column name: BANK-Dummy
b. Column Name: TENDER_TYPE
c. Operator: =
d. Value: BANK
e. Output: 1
f. Otherwise: 0
3. Close and load the data back into Excel.
4. Now it is time to run the regression. You will take similar steps to what you
did in Part 1 of this lab, but this time you will select both dummy variable
columns for your x-variable.
a. Click on Data Analysis button in the Data tab in the ribbon.
b. Click Regression and then click OK.
c. Reference the cells that contain the TRAN_AMT in the Input Y Range
and Online-Dummy and BANK-Dummy in the Input X Range and then
click OK.
5. Take a screenshot (label it 3-6MC) of your results.
6. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 3-6 Dillard’s Regression.xlsx.
Tableau | No Lab
This portion of the lab is not possible in Tableau Prep or in Tableau Desktop.
Chapter 4
Communicating Results and Visualizations
A Look Back
In Chapter 3, we considered various models and techniques used for Data Analytics and discussed when to use them
and how to interpret the results. We also provided specific accounting-related examples of when each of these specific
data approaches and models is appropriate to address our particular question.
A Look Ahead
Most of the focus of Data Analytics in accounting is on auditing, managerial accounting, financial statement analysis, and tax. This is partly due to the demand for high-quality data and the need for enhancing trust in the assurance process, informing management decisions, and aiding investors as they select their portfolios. In Chapter 5,
we look at how both auditors and managers are using technology in general to improve the decisions being made. We
also introduce how Data Analytics helps facilitate continuous auditing and reporting.
One of the first uses of a heat map as a form of data visualization is also one of history’s most impactful. In the mid-
1800s, there was a worldwide cholera pandemic. Scientists were desperate to determine the cause to put a stop to
the pandemic, and one of those scientists, John Snow, studied a particular London neighborhood that was suffering
from a large number of cholera cases in 1854. Snow created a map of the outbreak that included small bar charts on
the streets indicating the number of people affected by the disease across different locations in the neighborhood.
He suspected that the outbreak was linked to water, so he also drew small crosses on the map to indicate water
sources. Through this visualization, Snow was able to identify that the people who were dying nearly all had one thing
in common—they were drinking out of the same water source. This led to the discovery of cholera being conveyed
through contaminated water. Exhibit 4-1A shows Snow’s 1854 cholera map.
Software and methods for creating heat maps to visualize epidemics have improved since 1854, but the purpose remains the same. Using a heat map to visualize clusters of people impacted by epidemics helps researchers, health profes-
sionals, and policy makers identify patterns and ultimately inform decisions about how to resolve epidemics. For
example, in Exhibit 4-1B, this map can help readers quickly come to insight about where the overdose epidemic is
most prevalent.
Without Snow’s hypothesis, methods for testing it, and ultimately communicating the results through data
visualization, the 1854 cholera outbreak would have continued with scientists still being uncertain of the cause of
cholera.
EXHIBIT 4-1A [John Snow's 1854 map of the London cholera outbreak]
EXHIBIT 4-1B [Map of drug overdose deaths per 100,000, shaded in ranges: 0–4, 4.1–8, 8.1–12, 12.1–16, 16.1–20, >20]
Source: CDC
OBJECTIVES
After reading this chapter, you should be able to:
Data are important, and Data Analytics is effective, but both are only as valuable as our ability to communicate the results and make the data understandable. One of the authors
often asks her students what they would do if they were interns and their boss asked them to report the states in which all of their organization's customers are located. Would they simply point their boss to the Customers table in the sales database? Would they go a step further and isolate the attributes to the Company Name and the State? Perhaps they could go even further and run a quick query or PivotTable to count the number of customers in each state that the company serves. To give their boss what they actually want, however, they should provide a short written summary of the answer to the research question, as well as an organized chart to visualize the results. Data visualization isn't just for people who are "visual" learners. When the
results of data analysis are visualized appropriately, the results are made easier and quicker
to interpret for everybody. Whether the data you are analyzing are “small” data or “big” data,
they still merit synthesis and visualization to help your stakeholders interpret the results with
ease and efficiency.
Think back to some of the first data visualizations and categorizations you were exposed to (the food guide pyramid/food plate, the animal kingdom, the periodic table) and, more recently, how frequently infographics are used to break down complicated
information on social media. These charts and infographics make it easier for people to
understand difficult concepts by breaking them down into categories and visual components.
EXHIBIT 4-2  Anscombe's Quartet (Data)
It is only when the data points are visualized that you see that the datasets are quite different. Even the regression results are unable to differentiate among the various datasets (as shown in Exhibit 4-3). While not always the case, Anscombe's Quartet is an example where visualizations communicate the results of the analysis more readily than summary statistics do.
EXHIBIT 4-3  Plotting the Four Datasets in Anscombe's Quartet
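To see Anscombe's point in code, here is a minimal sketch using seaborn, which ships with a copy of the quartet; the library choice is ours, not the textbook's.

```python
# A minimal sketch: Anscombe's Quartet has nearly identical summary statistics
# across four datasets, yet the scatter plots look completely different.
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("anscombe")  # columns: dataset, x, y

# Nearly identical means and standard deviations for all four datasets.
print(df.groupby("dataset")[["x", "y"]].agg(["mean", "std"]))

# One regression line per dataset; only the pictures reveal the differences.
sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, ci=None, height=2.5)
plt.show()
```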
tasks; analysts use graphs to plot stock price and financial performance over time to select
portfolios that meet expected performance goals.
In any project that will result in a visual representation of data, the first charge is ensur-
ing that the data are reliable and that the content necessitates a visual. In our case, however,
ensuring that the data are reliable and useful has already been done through the first three
steps of the IMPACT model.
At this stage in the IMPACT model, determining the method for communicating your
results requires the answers to two questions:
1. Are you explaining the results of previously done analysis, or are you exploring the data
through the visualization? (Is your purpose declarative or exploratory?)
2. What type of data are being visualized (conceptual [qualitative] or data-driven
[quantitative])?
Scott Berinato, senior editor at Harvard Business Review, summarizes the possible
answers to these questions4 in a chart shown in Exhibit 4-4. The majority of the work that
we will do with the results of data analysis projects will reside in quadrant 2 of Exhibit 4-4,
the declarative, data-driven quadrant. We will also do a bit of work in Exhibit 4-4’s quadrant
4, the data-driven, exploratory quadrant. There isn’t as much qualitative work to be done,
although we will work with categorical qualitative data occasionally. When we do work
with qualitative data, it will most frequently be visualized using the tools in quadrant 1, the
declarative, conceptual quadrant.
[EXHIBIT 4-4  The four chart types quadrant: conceptual vs. data-driven on one axis, declarative vs. exploratory on the other]
Once you know the answers to the two key questions and have determined which quad-
rant you’re working in, you can determine the best tool for the job. Is a written report with a
simple chart sufficient? If so, Word or Excel will suffice. Will an interactive dashboard and
repeatable report be required? If so, Tableau may be a better tool. Later in the chapter, we
will discuss these two tools in more depth, along with when each should be used.
Qualitative data can be categorized in two ways: nominal data and ordinal data. Nominal data are the simplest form of data. Examples
of nominal data are hair color, gender, and ethnic groups. If you have a set of data on peo-
ple with different hair color, you can count the number of individuals who fit into the same
hair color category, but you cannot rank it (brown hair isn’t better than red hair), nor can
you take an average or do any other further calculations beyond counting (you can’t take an
average of “blonde”). Increasing in complexity, but still categorized as qualitative data, are
ordinal data. Ordinal data can also be counted and categorized like nominal data but can
go a step further—the categories can also be ranked. Examples of ordinal data include gold,
silver, and bronze medals, 1–5 rating scales on teacher evaluations, and letter grades. If you
have a set of data of students and the letter grades they have earned in a given course, you
can count the number of instances of A, B, C, and so on, and you can categorize them, just
like with nominal data. You can also sort the data meaningfully—an A is better than a B,
which is better than a C, and so on. But that’s as far as you can take your calculations—as
long as the grades remain as letters (and aren’t transformed into the corresponding numeri-
cal grade for each individual), you cannot calculate an average, standard deviation, or any
other more complex calculation.
Beyond counting and possibly sorting (if you have ordinal data), the primary statistic
used with qualitative data is proportion. The proportion is calculated by counting the num-
ber of items in a particular category, then dividing that number by the total number of
observations. For example, if you had a dataset of 150 people and had each individual’s
corresponding hair color with 25 people in your dataset having red hair, you could calculate
the proportion of red-haired people in your dataset by dividing 25 (the number of people
with red hair) by 150 (the total number of observations in your dataset). The proportion of
red-haired people, then, would be 16.7 percent.
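A minimal sketch of this proportion calculation in Python, using a hypothetical hair-color series that matches the numbers above:

```python
# A minimal sketch of the proportion calculation with hypothetical data:
# 25 of 150 observations are "red."
import pandas as pd

hair = pd.Series(["red"] * 25 + ["brown"] * 80 + ["blonde"] * 45)

# Proportion = count in a category / total number of observations.
proportions = hair.value_counts(normalize=True)
print(round(proportions["red"], 3))  # 0.167, i.e., about 16.7 percent
```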
Qualitative data (both nominal and ordinal) can also be referred to as “conceptual” data
because such data are text-driven and represent concepts instead of numbers.
Quantitative data are more complex than qualitative data because not only can they be
counted and grouped just like qualitative data, but the differences between each data point
are meaningful—when you subtract 4 from 5, the difference is a numerical measure that can
be compared to subtracting 3 from 5. Quantitative data are made up of observations that
are numerical and can be counted and ranked, just like ordinal qualitative data, but that can
also be averaged. A standard deviation can be calculated, and datasets can be easily com-
pared when standardized (if applicable).
Similar to qualitative data, quantitative data can be categorized into two different types: interval and ratio. However, there is some dispute among the analytics community on whether the difference between the two data types is meaningful, and for the sake of the analytics and calculations you will be performing throughout this textbook, the difference is not pertinent.
The simplest way to express the difference between interval and ratio data is that ratio data
have a meaningful 0 and interval data do not. In other words, for ratio data, when a dataset
approaches 0, 0 means “the absence of.” Consider money as ratio data—we can have 5 dol-
lars, 72 dollars, or 8,967 dollars, but as soon as we reach 0, we have “the absence of” money.
Interval data do not have a meaningful 0; in other words, in interval data, 0 does not mean
“the absence of” but is simply another number. An example of interval data is the Fahren-
heit scale of temperature measurement, where 90 degrees is hotter than 70 degrees, which
is hotter than 0 degrees, but 0 degrees does not represent “the absence of” temperature—it’s
just another number on the scale.
Due to the “meaningful 0” difference between interval and ratio data, ratio data are
considered the most sophisticated form of data. This is because the meaningful zero allows
us to calculate fractions, proportions, and percentages—ratios reflecting the relationship
between values. However, we can perform all other arithmetic functions on both interval
and ratio data. In Chapter 3, you learned more about statistical tests such as hypothesis
testing, regression, and correlation. We can run all of these tests and calculate the mean,
median, and standard deviation on interval and ratio data.
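A short sketch with made-up values illustrates the distinction: means and standard deviations work on both scales, but ratios are only meaningful when 0 means "the absence of."

```python
# A minimal sketch with hypothetical values. Arithmetic works on both scales,
# but ratios are only meaningful for ratio data.
import statistics

fahrenheit = [32, 50, 64, 90]  # interval data: 0 degrees F is not "no temperature"
dollars = [5, 72, 150, 8967]   # ratio data: 0 dollars is "the absence of" money

print(statistics.mean(fahrenheit), statistics.stdev(fahrenheit))  # valid
print(statistics.mean(dollars), statistics.stdev(dollars))        # valid

print(dollars[1] / dollars[0])        # 14.4: meaningful, 72 dollars is 14.4 times 5 dollars
print(fahrenheit[3] / fahrenheit[1])  # 1.8: not meaningful, 90 F is not "1.8 times as hot" as 50 F
```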
Quantitative data can be further categorized as either discrete or continuous data.
Discrete data are data that are represented by whole numbers. An example of discrete data
is points in a basketball game—you can earn 2 points, 3 points, or 157 points, but you can-
not earn 3.5 points. On the other hand, continuous data are data that can take on any value
within a range. An example of continuous data is height: you can be 4.7 feet, 5 feet, or
6.27345 feet. The difference between discrete and continuous data can be blurry sometimes
because you can express a discrete variable as continuous—for example, the number of chil-
dren a person can have is discrete (a woman can’t have 2.7 children, but she could have 2 or
3). However, if you are researching the average number of children that women aged 25–40
have in the United States, the average would be a continuous variable. Whether your data
are discrete or continuous can also help you determine the type of chart you create because
continuous data lend themselves more to a line chart than do discrete data.
Lab Connection
Lab 4-1, Lab 4-3, and Lab 4-4 have you create dashboards with visualizations
that present declarative data.
On the other hand, you will sometimes use data visualizations to satisfy an exploratory
visualization purpose. When this is done, the lines between steps “P” (perform test plan),
“A” (address and refine results), and “C” (communicate results) are not as clearly divided.
Exploratory data visualization will align with performing the test plan within visualization
software—for example, Tableau—and gaining insights while you are interacting with the
data. Exploratory results are often presented in an interactive setting, and the answers to the questions from step "I" (identify the questions) won't have been determined before working with the data in the visualization software.
Lab Connection
Lab 4-2 and Lab 4-5 have you create dashboards with visualizations that
present exploratory data.
Exhibit 4-5 is similar to the four chart types quadrant presented to you in Exhibit 4-4, but Exhibit 4-5 adds more detail to help you determine what to do once you've answered the first two questions. Remember that the quadrant represents two main questions:
1. Are you explaining the results of the previously done analysis, or are you exploring the
data through the visualization? (Is your purpose declarative or exploratory?)
2. What type of information is being visualized (conceptual [qualitative] or data-driven
[quantitative])?
EXHIBIT 4-5  The Four Chart Types Quadrant with Detail
Source: S. Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Boston: Harvard Business Review Press, 2016).
Once you have determined the answers to the first two questions, you are ready to begin deter-
mining which type of visualization will be the most appropriate for your purpose and dataset.
In Chapter 1, you were introduced to the Gartner Magic Quadrant for Business Intelli-
gence and Analytics Platforms, and through the labs in the previous three chapters you have
worked with Microsoft products (Excel, Power Query, Power BI) and Tableau products
(Tableau Prep and Tableau Desktop). In Chapter 4 and the remaining chapters, you will
continue having the two tracks to learn how to use each tool, but when you enter your pro-
fessional career, you may need to make a choice about which tool to use for communicating
your results. While both Microsoft and Tableau provide similar methods for analysis and
visualization, we offer a discussion on when you may prefer each solution.
Microsoft’s tools slightly outperform Tableau in their execution of the entire analytics
process. Tableau is a newer product and has placed the majority of its focus on data visu-
alization, while Microsoft Excel has a more robust platform for data analysis. If your data
analysis project is more declarative than exploratory, it is more likely that you will perform
your data visualization to communicate results in Excel, simply because it is likely that you
performed steps 2 through 4 in Excel, and it is convenient to create your charts in the same
tool that you performed your analysis.
Tableau Desktop and Microsoft Power BI earn high praise for being intuitive and easy to use,
which makes them ideal for exploratory data analysis. When your question isn’t fully defined
or specific, exploring your dataset in Tableau or Power BI and changing your visualization
type to discover different insights is as much a part of performing data analysis as crafting
your communication. While we recognize that you have already worked with Tableau in
previous labs, now that our focus has turned toward data visualization, we recommend
opening the Superstore Sample Workbook provided within Tableau Desktop to explore dif-
ferent types of visualizations. You will find the Superstore Sample Workbook at the bottom
of the start screen in Tableau Desktop under “Sample workbooks” (Exhibit 4-6).
EXHIBIT 4-6  Source: Tableau Software, Inc. All rights reserved.
Once you open the workbook, you will see a variety of tabs at the bottom of the work-
book that you can page through and see different ways that the same dataset can be ana-
lyzed and visualized. When you perform exploratory analysis in Tableau, or even if you have
already performed your analysis and you have uploaded the dataset into Tableau to com-
municate insights, we recommend trying several different types of charts to see which one
makes your insights stand out the most effectively. In the top-right corner of the Tableau
workbook, you will see the Show Me window, which provides different options for visual-
izing your dataset (Exhibit 4-7).
EXHIBIT 4-7  Source: Tableau Software, Inc. All rights reserved.
In the Show Me tab, only the visualizations that will work for your particular dataset will
appear in full color.
For more information on using Tableau, see Appendix I.
PROGRESS CHECK
1. What are two ways that complicated concepts were explained to you via catego-
rization and data visualization as you were growing up?
2. Using the Internet or other resources (other textbooks, a newspaper, or a magazine),
identify an example of a data visualization for each possible quadrant.
3. Identify which type of data scale the following variables are measured on
(qualitative nominal, qualitative ordinal, or quantitative):
a. Instructor evaluations in which students select excellent, good, average, or poor.
b. Weekly closing price of gold throughout a year.
c. Names of companies listed on the Dow Jones Industrial Average.
d. Fahrenheit scale for measuring temperature.
EXHIBIT 4-8  Pie Charts and Bar (or Column) Chart Show Different Ways to Visualize Proportions
EXHIBIT 4-9  Pie Chart Showing Proportion
[Chart residue: the charts compare beer products (Imperial Stout, IPA, Stout, Pale Ale, Wheat, Imperial IPA); a companion pair of charts plots Number of Records and % of Total Number of Records by Product Description]
The pair of charts shows the same proportions two ways: the former expresses the proportion of each type of beer sold as the number of beers sold for each product, while the latter expresses the proportion as a percentage of the whole in a 100 percent stacked bar chart.
While bar charts and pie charts are among the most common charts used for qualitative
data, there are several other charts that function well for showing proportions:
• Tree maps and heat maps. These are similar types of visualizations, and they both use
size and color to show proportional size of values. While tree maps show proportions
using physical space, heat maps use color to highlight the scale of the values. However,
both are heavily visual, so they are imperfect for situations where precision of the num-
bers or proportions represented is necessary.
• Symbol maps. Symbol maps are geographic maps, so they should be used when express-
ing qualitative data proportions across geographic areas such as states or countries.
• Word clouds. If you are working with text data instead of categorical data, you can repre-
sent them in a word cloud. Word clouds are formed by counting the frequency of each
word mentioned in a dataset; the higher the frequency (proportion) of a given word, the
larger and bolder the font will be for that word in the word cloud. Consider analyzing
the results of an open-ended response question on a survey; a word cloud would be a
great way to quickly spot the most commonly used words to tell if there is a positive or
negative feeling toward what’s being surveyed. There are also settings that you can put
into place when creating the word cloud to leave out the most commonly used English
words—such as the, an, and a—in order to not skew the data. Exhibit 4-11 is an example of a word cloud for the text of Chapter 2 from this textbook; a brief code sketch for generating a word cloud follows the exhibit.
EXHIBIT 4-11  Word Cloud Example from Chapter 2 Text
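If you would like to build a word cloud yourself, here is a minimal sketch using the third-party wordcloud package (a tool choice of ours, not the textbook's); the sample text is a hypothetical stand-in for open-ended survey responses.

```python
# A minimal word cloud sketch using the third-party "wordcloud" package
# (pip install wordcloud). The input text is hypothetical.
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

text = "great service great prices slow shipping great selection slow checkout"

# STOPWORDS drops common English words (the, an, a) so they don't skew the cloud.
cloud = WordCloud(stopwords=STOPWORDS, background_color="white").generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```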
For quantitative data, several chart types are particularly useful:
• Box and whisker plots. Useful when quartiles, medians, and outliers are required for analysis and insights.
• Scatter plots. Useful for identifying the correlation between two variables or for identify-
ing a trend line or line of best fit.
• Filled geographic maps. As opposed to symbol maps, a filled geographic map is used
to illustrate data ranges for quantitative data across different geographic areas such as
states or countries.
A summary of the chart types just described appears in Exhibit 4-12. Each chart option
works equally well for exploratory and declarative data visualizations. The chart types are
categorized based on when they will be best used (e.g., when comparing qualitative vari-
ables, a bar chart is an optimal choice), but this figure shouldn’t be used to stifle creativity—
bar charts can also be used to show comparisons among quantitative variables, just as many of the charts in the listed categories can work well with data types and purposes other than their primary categorization below.
EXHIBIT 4-12  Summary of Chart Types: Conceptual (Qualitative) and Data-Driven (Quantitative)
As with selecting and refining your analytical model, communicating results is more
art than science. Once you are familiar with the tools that are available, your goal should
always be to share critical information with stakeholders in a clear, concise manner. While
visualizations can be incredibly impactful, they can become a distraction if you’re not care-
ful. For example, bar charts can be manipulated to introduce bias, and, while novel, 3D graphs are often deceptive because they can distort the scale even when the underlying numbers are accurate.
[EXHIBIT 4-13  Misleading chart: the vertical axis begins near 0.55% rather than 0; labels include "First estimate" and "0.60%"]
If we reworked the data points to show the correct scale (starting at 0 instead of 0.55) and
the change over time (plotting the data along the horizontal axis), we’d see something like
Exhibit 4-14. If we wanted to emphasize growth, we might choose a chart like Exhibit 4-15.
Notice that both new graphs show an increase that is less dramatic and easier to interpret.
[EXHIBIT 4-14  Column chart of the first and second estimates on a zero-based 0–0.5% scale]
[EXHIBIT 4-15  Line chart plotting the old and new estimates over time on a 0–0.5 scale]
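To see the effect of a truncated axis in code, here is a minimal matplotlib sketch; the two values are hypothetical stand-ins consistent with the exhibits' scale.

```python
# A minimal sketch of why a truncated y-axis misleads: the same two values,
# plotted on a truncated scale and on a zero-based scale. Values are hypothetical.
import matplotlib.pyplot as plt

labels = ["First estimate", "Second estimate"]
values = [0.55, 0.60]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(labels, values)
ax1.set_ylim(0.54, 0.61)  # truncated scale exaggerates the change
ax1.set_title("Misleading: axis starts at 0.54")

ax2.bar(labels, values)
ax2.set_ylim(0, 0.65)     # zero-based scale shows the change honestly
ax2.set_title("Honest: axis starts at 0")

plt.tight_layout()
plt.show()
```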
Our next example of a problematic method of data visualization is in Exhibit 4-16. The
data represented come from a study assessing cybersecurity attacks, and this chart in par-
ticular attempted to describe the number of cybersecurity attacks employees fell victim to,
as well as what their role was in their organization.
Assess the chart provided in Exhibit 4-16. Is a pie chart really the best way to present
these data?
There are simply too many slices of pie, and the key referencing the job role of each user
is unclear. There are a few ways we could improve this chart.
If you want to emphasize individual users, consider a rank-ordered bar chart like Exhibit 4-17. To emphasize the category, a comparison like that in Exhibit 4-18 may be helpful. Or, to show proportion, consider a stacked bar chart (Exhibit 4-19).
[EXHIBIT 4-16  Pie chart with one slice per user (Sloane: 2; Rose: 3; Bryan: 1; Molly: 1; Efrain: 3; Marisol: 2; Jan: 3; Luna: 3; Kinley: 2; Lia: 3; Leroy: 1; . . .), keyed by job role: Assistant, Researcher, Administrator]
EXHIBIT 4-17  More Clear Rank-Ordered Bar Chart
[Bar chart of attacks (0–9) per user, sorted in descending order and colored by job role: Assistant, Researcher, Administrator]
EXHIBIT 4-18  Bar Chart Emphasizing Attacks by Job Function
[Bar chart of attacks (0–30) for Assistant, Researcher, and Administrator]
EXHIBIT 4-19  Stacked Bar Chart Emphasizing Proportion of Attacks by Job Function
[A single stacked bar (0–50) showing each job function's share of total attacks]
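A minimal sketch of the rank-ordered idea behind Exhibit 4-17, using a few hypothetical users and attack counts; sorting descending is what makes the ranking readable at a glance.

```python
# A minimal rank-ordered bar chart sketch with hypothetical users and counts.
import pandas as pd
import matplotlib.pyplot as plt

attacks = pd.Series(
    {"Azul": 8, "Alanna": 6, "Efrain": 3, "Jan": 3, "Bryan": 1, "Molly": 1}
)

# Sort descending so the largest bar appears first.
ax = attacks.sort_values(ascending=False).plot(kind="bar")
ax.set_ylabel("Attacks")
plt.tight_layout()
plt.show()
```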
PROGRESS CHECK
4. The following two charts represent the exact same data—the quantity of beer
sold on each day in the Sláinte Sales Subset dataset. Which chart is more appro-
priate for working with dates, the column chart or the line chart? Which do you
prefer? Why?
a.
b.
5. The same dataset was consolidated into quarters. This chart was made with the chart wizard feature in Excel, which made it easy to create, but something went wrong. Can you identify what went wrong with this chart?
6. The following four charts represent the exact same data—the quantity of each beer sold. Which do you prefer, the line chart or the column chart? Whichever you chose, line or column, which of the pair do you think is the easiest to digest?
a.
b.
c.
d.
Color
Similar to how Excel and Tableau have become stronger tools at picking appropriate data
scales and increments, both Excel and Tableau will have default color themes when you
begin creating your data visualizations. You may choose to customize the theme. However,
if you do, here are a few points to consider:
• When should you use multiple colors? Using multiple colors to differentiate types
of data is effective. Using a different color to highlight a focal point is also effective.
However, don’t use multiple colors to represent the same type of data. Be careful to not
use color to make the chart look pretty—the point of the visualization is to showcase
insights from your data, not to make art.
• We are trained to understand the differences among red, yellow, and green, with red
meaning something negative that we would want to “stop” and green being something
positive that we would want to “continue,” just like with traffic lights. For that reason,
use red and green only for those purposes. Using red to show something positive or
green to show something negative is counterintuitive and will make your chart harder to
understand. You may also want to consider a color-blind audience. If you are concerned
that someone reading your visuals may be color blind, avoid a red/green scale and
consider using orange/blue. Tableau has begun defaulting to orange/blue color scales
instead of red/green for this reason.
• Once your chart has been created, convert it to grayscale to ensure that the contrast still
exists—this is both to ensure your color-blind audience can interpret your visuals and
also to ensure that the contrast, in general, is stark enough with the color palette you
have chosen.
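One quick way to perform this grayscale check is with the Pillow imaging library; a minimal sketch follows (the exported chart filename is hypothetical).

```python
# A minimal grayscale-contrast check using Pillow (pip install Pillow).
# Export your chart as an image first; the filename here is hypothetical.
from PIL import Image

chart = Image.open("sales_by_product.png")  # hypothetical exported chart
gray = chart.convert("L")                   # "L" means 8-bit grayscale
gray.save("sales_by_product_gray.png")
gray.show()  # if categories blur together here, choose a new palette
```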
PROGRESS CHECK
7. Often, external consultants will use a firm’s color scheme for a data visualization
or will use a firm’s logo for points on a scatter plot. While this might be a great
approach to support a corporate culture, it is often not the most effective way to
create a chart. Why would these methods harm a chart’s effectiveness?
5. Justin Zobel, Writing for Computer Science (Singapore: Springer-Verlag, 1997).
but including an overview of the type of analysis performed and any limitations that
you encountered will be important to include.
C: If you are including a data visualization with your write-up, you need to explain
how to use the visual. If there are certain aspects that you expect to stand out from
the analysis and the accompanying visual, you should describe what those compo-
nents are—the visual should speak for itself, but the write-up can provide confirma-
tion that the important pieces are gleaned.
T: Discuss what’s next in your analysis. Will the visual or the report result in a week-
ly or quarterly report? What trends or outliers should be paid attention to over time?
style, and what they need from the communication—and provide it, via the right message,
the right tone, and the right vehicle.
Revising
Just as you addressed and refined your results in the fourth step of the IMPACT model,
you should refine your writing. Until you get plenty of practice (and even once you con-
sider yourself an expert), you should ask other people to read through your writing to make
sure that you are communicating clearly. Justin Zobel suggests that revising your writing
requires you to “be egoless—ready to dislike anything you have previously written. . . . If
someone dislikes something you have written, remember that it is the readers you need to
please, not yourself.”6 Always placing your audience as the focus of your writing will help
you maintain an appropriate tone, provide the right content, and avoid too much detail.
PROGRESS CHECK
Progress Checks 5 and 6 display different charts depicting the quantity of beer sold on
each day in the Sláinte Sales Subset dataset. If you had created those visuals, starting
with the data request form and the ETL process all the way through data analysis, how
would you tailor the written report for the following two roles?
8. For the CEO of the brewery who is interested in how well the different products
are performing.
9. For the programmers who will be in charge of creating a report that contains the
same information that needs to be sent to the CEO on a monthly basis.
6. Justin Zobel, Writing for Computer Science.
Summary
■ This chapter focused on the fifth step of the IMPACT model, or the “C,” on how to com-
municate the results of your data analysis projects. Communication can be done through
a variety of data visualizations and written reports, depending on your audience and the
data you are exhibiting. (LO 4-1)
■ In order to select the right chart, you must first determine the purpose of your data visu-
alization. This can be done by answering two key questions:
◦ Are you explaining the results of a previously done analysis, or are you exploring the
data through the visualization? (Is your purpose declarative or exploratory?)
◦ What type of data are being visualized (conceptual [qualitative] data or data-driven
[quantitative] data)? (LO 4-3)
■ The differences between each type of data (declarative and exploratory, qualitative and
quantitative) are explained, as well as how each data type affects both the tool you’re
likely to use (generally either Excel or Tableau) and the chart you should create. (LO 4-3)
■ After selecting the right chart based on your purpose and data type, your chart will need
to be further refined. Selecting the appropriate data scale, scale increments, and color for
your visualization is explained through the answers to the following questions: (LO 4-4)
◦ How much data do you need to share in the visual to avoid being misleading, yet also
avoid being distracting?
◦ If your data contain outliers, should they be displayed, or will they distort your scale
to the extent that you can leave them out?
◦ Other than how much data you need to share, what scale should you place those data on?
◦ Do you need to provide context or reference points to make the scale meaningful?
◦ When should you use multiple colors?
■ Finally, this chapter discusses how to provide a written report to describe your data analysis
project. Each step of the IMPACT model should be communicated in your write-up, and the
report should be tailored to the specific audience to whom it is being delivered. (LO 4-5)
Key Words
continuous data (188) One way to categorize quantitative data, as opposed to discrete data. Continuous
data can take on any value within a range. An example of continuous data is height.
declarative visualizations (188) Made when the aim of your project is to “declare” or present your
findings to an audience. Charts that are declarative are typically made after the data analysis has been
completed and are meant to exhibit what was found in the analysis steps.
discrete data (188) One way to categorize quantitative data, as opposed to continuous data. Discrete
data are represented by whole numbers. An example of discrete data is points in a basketball game.
exploratory visualizations (189) Made when the lines between steps “P” (perform test plan), “A”
(address and refine results), and “C” (communicate results) are not as clearly divided as they are in a
declarative visualization project. Often when you are exploring the data with visualizations, you are per-
forming the test plan directly in visualization software such as Tableau instead of creating the chart after
the analysis has been done.
interval data (187) The third most sophisticated type of data on the scale of nominal, ordinal, interval,
and ratio; a type of quantitative data. Interval data can be counted and grouped like qualitative data, and
the differences between each data point are meaningful. However, interval data do not have a meaningful
0. In interval data, 0 does not mean “the absence of” but is simply another number. An example of interval
data is the Fahrenheit scale of temperature measurement.
nominal data (187) The least sophisticated type of data on the scale of nominal, ordinal, interval, and
ratio; a type of qualitative data. The only thing you can do with nominal data is count, group, and take a
proportion. Examples of nominal data are hair color, gender, and ethnic groups.
normal distribution (188) A type of distribution in which the median, mean, and mode are all equal,
so half of all the observations fall below the mean and the other half fall above the mean. This phenom-
enon is naturally occurring in many datasets in our world, such as SAT scores and heights and weights of
newborn babies. When datasets follow a normal distribution, they can be standardized and compared for
easier analysis.
ordinal data (187) The second most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of qualitative data. Ordinal data can be counted and categorized like nominal data, and the categories can also be ranked. Examples of ordinal data include gold, silver, and bronze medals.
proportion (187) The primary statistic used with qualitative data. Proportion is calculated by counting the number of items in a particular category, then dividing that number by the total number of observations.
qualitative data (186) Categorical data. All you can do with these data is count and group, and in some
cases, you can rank the data. Qualitative data can be further defined in two ways: nominal data and ordinal
data. There are not as many options for charting qualitative data because they are not as sophisticated as
quantitative data.
quantitative data (187) More complex than qualitative data. Quantitative data can be further defined
in two ways: interval and ratio. In all quantitative data, the intervals between data points are meaningful,
allowing the data to be not just counted, grouped, and ranked, but also to have more complex operations
performed on them such as mean, median, and standard deviation.
ratio data (187) The most sophisticated type of data on the scale of nominal, ordinal, interval, and
ratio; a type of quantitative data. They can be counted and grouped just like qualitative data, and the
differences between each data point are meaningful like with interval data. Additionally, ratio data have a
meaningful 0. In other words, once a dataset approaches 0, 0 means “the absence of.” An example of ratio
data is currency.
standard normal distribution (188) A special case of the normal distribution used for standardizing
data. The standard normal distribution has 0 for its mean (and thus, for its mode and median, as well),
and 1 for its standard deviation.
standardization (188) The method used for comparing two datasets that follow the normal distribution.
By using a formula, every normal distribution can be transformed into the standard normal distribution. If
you standardize both datasets, you can place both distributions on the same chart and more swiftly come
to your insights.
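A minimal sketch of the standardization formula, z = (x - mean) / standard deviation, applied to a hypothetical dataset:

```python
# A minimal standardization sketch: transform data to mean 0 and standard
# deviation 1. The scores are hypothetical.
import statistics

scores = [480, 520, 560, 640, 700]
mu = statistics.mean(scores)
sigma = statistics.stdev(scores)

z_scores = [(x - mu) / sigma for x in scores]
print(z_scores)  # standardized values, comparable across datasets
```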
Multiple Choice Questions
9. (LO 4-3) Exhibit 4-12 gives chart suggestions for what data you’d like to portray. Those
options include all of the following except:
a. relationship between variables.
b. geographic data.
c. outlier detection.
d. normal distribution curves.
10. (LO 4-3) What is the most appropriate chart when showing a relationship between two
variables (according to Exhibit 4-12)?
a. Scatter chart
b. Bar chart
c. Pie graph
d. Histogram
1. (LO 4-2) Explain Exhibit 4-4 and why these four dimensions are helpful in describing
information to be communicated. Exhibit 4-4 lists conceptual and data-driven as being
on two ends of the continuum. Does that make sense, or can you think of a better way
to organize and differentiate the different chart types?
2. (LO 4-3) According to Exhibit 4-12, which is the best chart for showing a distribution of
a single variable, like height? How about hair color? Major in college?
3. (LO 4-3) Box and whisker plots (or box plots) are particularly adept at showing extreme
observations and outliers. In what situations would it be important to communicate these
data to a reader? Any particular accounts on the balance sheet or income statement?
4. (LO 4-3) Based on the data from datavizcatalogue.com, a line graph is best at showing
trends, relationships, compositions, or distributions?
5. (LO 4-3) Based on the data from datavizcatalogue.com, what are some major flaws of
using word clouds to communicate the frequency of words in a document?
6. (LO 4-3) Based on the data from datavizcatalogue.com, how does a box and whisker
plot show if the data are symmetrical?
7. (LO 4-3) What would be the best chart to use to illustrate earnings per share for one
company over the past 5 years?
8. (LO 4-3) The text mentions, “If your data analysis project is more declarative than
exploratory, it is more likely that you will perform your data visualization to communi-
cate results in Excel.” In your opinion, why is this true?
9. (LO 4-3) According to the text and your own experience, why is Tableau ideal for explor-
atory data analysis?
Problems
1. (LO 4-3) Match the chart type to whether it is used primarily to communicate qualitative
or quantitative results:
Chart Type Quantitative or Qualitative?
1. Pie chart
2. Box and whisker plot
3. Word cloud
4. Symbol map
5. Scatter plot
6. Line chart
2. (LO 4-3) Match the desired visualization for quantitative data to the following chart
types:
• Line charts
• Bar charts
• Box and whisker plots
• Scatter plots
• Filled geographic maps
5. Data trends for net income over the past eight quarters
3. (LO 4-2) Match the data examples to one of the following data types:
• Interval data
• Nominal data
• Ordinal data
• Ratio data
• Structured data
• Unstructured data
1. GMAT Score
2. Total Sales
7. Income Statement
9. Blogs
4. (LO 4-2) Match the definition to one of the following data terms:
• Declarative visualization
• Exploratory visualization
• Interval data
• Nominal data
• Ordinal data
• Ratio data
1. Method used to communicate the results after the data analysis has been completed
4. Numerical data with an equal and definitive ratio between each data point, and the value of 0 means "the absence of"
5. Visualization used to determine best method of analysis, usually without predefined statistical models
5. (LO 4-2, LO 4-3) Identify the order sequence from least sophisticated (1) to most sophis-
ticated (4) data type.
1. Interval data
2. Ordinal data
3. Nominal data
4. Ratio data
6. (LO 4-1) Analysis: Why was the heat map associated with the opening vignette regard-
ing the 1854 cholera epidemic effective? Now that we have more sophisticated tools
and methods for visualizing data, what else could have been used to communicate this,
and would it have been more or less effective in your opinion?
7. (LO 4-1) Analysis: Evaluate the use of color in the graphic associated with the opening
vignette regarding drug overdose deaths across America. Would you consider its use
effective or ineffective? Why? How is this more or less effective than communicating the
same data in a bar chart?
8. (LO 4-3) According to Exhibit 4-12 and related chapter discussion, which is the best
chart for comparisons of earnings per share over many periods? How about for only a
few periods?
• What is the best chart category? Conceptual or data-driven?
• What is the best chart subcategory?
• What is the best chart type?
9. (LO 4-3) According to Exhibit 4-12 and related chapter discussion, which is the best
chart for static composition of a data item of the Accounts Receivable balance at the
end of the year? Which is best for showing a change in composition of Accounts Receiv-
able over two or more periods?
• What is the best chart category? Conceptual or data-driven?
• What is the best chart subcategory?
• What is the best chart type?
10. (LO 4-3, LO 4-4) The Big Four accounting firms (Deloitte, EY, KPMG, and PwC) dominate
the audit and tax market in the United States. What chart would you use to show which
accounting firm dominates in each state in terms of audit revenues?
1. Area chart
2. Line chart
3. Column chart
4. Histogram
5. Bubble chart
8. Pie chart
9. Waterfall chart
11. (LO 4-3, LO 4-4) Datavizcatalogue.com lists seven types of maps in its listing of charts.
Which one would you use to assess geographic customer concentration by number?
Analysis: How could you show if some customers buy more than other customers on
such a map? Would you use the same chart or a different one?
1. Tree map
2. Choropleth map
3. Flow map
4. Connection map
5. Bubble map
6. Heat map
7. Dot map
12. (LO 4-4) Analysis: In your opinion, is the primary reason that analysts use inappropri-
ate scales for their charts due to an error related to naiveté (or ineffective training),
or are the inappropriate scales used so the analyst can sway the audience one way or
the other?
LABS
Tableau | Desktop
5. Each sales order line item shows the individual quantity ordered and the
product sale price per item. Before you analyze the sales data, you will need
to calculate each line subtotal:
a. Click Data in the toolbar on the left to show your data tables.
b. Click the Sales_Order_Lines table in the list on the right.
c. In the Home tab, click New Column.
d. For the column formula, enter: Line_Subtotal = Sales_Order_Lines[Product_Sale_Price] * Sales_Order_Lines[Sales_Order_Quantity_Sold].
e. Click Report in the toolbar on the left to return to your report.
6. Create a visualization showing the total sales revenue by product name
(descriptive analytics):
a. In the Visualizations pane, click Stacked Bar Chart.
b. Drag Finished_Goods_Product.Product_Description to the Y-axis box.
c. Drag Sales_Order_Lines.Line_Subtotal to the X-axis box.
d. Resize the right side of the chart so it fills half of the page.
e. Click the Format visual (paintbrush) button, and click General > Title >
On.
f. Name the chart Sales Revenue by Product.
g. Take a screenshot (label it 4-1MA).
7. Create a visualization showing the total sales revenue by state (descriptive
analytics):
a. Click anywhere on the page outside of your first visualization, then in
the Visualization pane, click Filled Map.
b. Drag Customer_Master_Listing.Customer_State to the Location box.
c. Drag Sales_Order_Lines.Line_Subtotal to the Tooltips box.
d. Click the Format visual (paintbrush) button, and click Fill Colors.
e. Below Default Color, click the fx button.
f. Under Based on Field, choose Sales_Order_Lines.Line_Subtotal and
click OK.
g. Click the Format visual (paintbrush) button, and click General > Title > On.
h. Name the chart Sales Revenue by State.
i. Take a screenshot (label it 4-1MB).
8. After you answer the lab questions, save your file as Lab 4-1 Slainte Sales
Dashboard.pbix, and continue to Lab 4-1 Part 2.
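If you want to verify this calculation outside Power BI, a minimal pandas sketch of the same two steps (line subtotal, then revenue by product) might look like the following. The table and column names mirror the lab, but the Excel export path and the join key are assumptions.

```python
# A minimal pandas sketch of steps 5 and 6: compute the line subtotal, then
# total sales revenue by product. The file path and join key are assumptions.
import pandas as pd

lines = pd.read_excel("Slainte.xlsx", sheet_name="Sales_Order_Lines")
products = pd.read_excel("Slainte.xlsx", sheet_name="Finished_Goods_Product")

# Step 5 equivalent: Line_Subtotal = price * quantity.
lines["Line_Subtotal"] = (
    lines["Product_Sale_Price"] * lines["Sales_Order_Quantity_Sold"]
)

# Step 6 equivalent: total sales revenue by product description.
merged = lines.merge(products, on="Product_Code")  # join key is an assumption
revenue = merged.groupby("Product_Description")["Line_Subtotal"].sum()
print(revenue.sort_values(ascending=False))
```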
Tableau | Desktop
AQ2. What other visualizations might you use to describe sales revenue?
AQ3. What other ways might you slice sales revenue to help management understand
performance?
1. Open your Lab 4-1 Slainte Sales Dashboard.pbix file from Lab 4-1 Part 1 if
you closed it.
2. Create a visualization showing the total sales volume by month in a table:
a. Click anywhere on the page outside of your first visualization; then in
the Visualizations pane, click Matrix and drag the object to the top-right
corner of the page. Resize it by dragging the bottom to match the chart
next to it, if needed.
b. Drag Finished_Goods_Product.Product_Description to the Rows box.
c. Drag Sales_Order.Sales_Order_Date to the Columns box.
1. Click the X next to Quarter and Day to remove them from the date hierarchy.
d. Drag Sales_Order_Lines.Sales_Order_Quantity_Sold to the Values box.
e. In the toolbar below your matrix, click Expand all down one level in the
hierarchy button (the button looks like an upside-down fork) to reveal
the sales by month.
f. Click the Format Visual (paintbrush) button and click Cell elements.
g. Turn on Data bars. The values in the table will now show proportional
bars representing the data values.
h. In the toolbar below your matrix, click More Options (. . .) and click
Sort by > Sales_Order_Quantity_Sold.
i. Click the Format Visual (paintbrush) button, and click General > Title
> On.
j. Name the chart Sales Volume by Month.
k. Take a screenshot (label it 4-1MC).
3. Create a visualization showing the total sales volume by target:
a. Click anywhere on the page outside of your first visualization; then in
the Visualization pane, click Gauge. Drag it to the bottom-right corner
of your page and resize it to fill the corner.
b. Drag Sales_Order_Lines.Sales_Order_Quantity_Sold to the Value box.
c. To add a target measure, click Home > New Measure.
1. In the formula box, type: Sales_Volume_Target = 10000 and press
Enter. The new Sales_Volume_Target measure will appear in the list
of attributes in the Customer table, though the exact location does
not matter.
Tableau | Desktop
1. Open your Lab 4-1 Slainte Sales Dashboard.twb file from Lab 4-1 Part 1 if
you closed it.
2. Create a visualization showing the total sales revenue by product name:
a. Click Worksheet > New Worksheet.
b. Drag Finished_Goods_Product.Product Description to the rows shelf.
c. Drag Sales_Order_Lines.Sales Order Date to the columns shelf. Click
the + next to the YEAR(Sales Order Date) pill to expand the date to
show quarter and month. Then remove the Quarter pill.
d. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Columns
shelf.
e. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Label button
in the Marks pane.
f. To show grand totals, click the Analytics tab in the panel on the left.
g. Drag Totals to your table and choose Column Grand Totals.
h. In the toolbar, click the Sort Descending button to show the products by
sales volume with the largest volume at the top of the chart.
i. Right-click the Sheet 3 tab and rename it “Sales Volume by Month”.
j. Take a screenshot (label it 4-1TC).
3. Create a visualization showing the total sales volume by target:
a. Click Worksheet > New Worksheet.
b. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Columns
shelf.
c. To add a target measure, click the down arrow above the Tables pane on
the left in the Data tab and choose Create Parameter. . .
d. Name the parameter Sales Volume Target, set the Current value to
10000, and click OK.
Lab 4-2 Perform Exploratory Analysis and Create Dashboards—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As you are analyzing the Sláinte brewery data, your supervisor has asked you to compile a series of visualizations that will help them explore the sales that have taken place in the past several months of the year, in order to predict future sales and better understand relationships with customers.
When working with a data analysis project that is exploratory in nature, the data can be
set up with simple visuals that allow you to drill down and see more granular data.
Data: Lab 4-2 Slainte Dataset.zip - 106KB Zip / 114KB Excel
Tableau | Desktop
e. Click the Analytics (magnifying glass) button, and click Forecast > On.
1. Click Options, set the Forecast length to 3, and click Apply.
f. Click the Format visual (paintbrush) button, and click General > Title.
g. Name the chart Sales Revenue Forecast.
h. Drag Finished_Goods_Products.Product_Description to the Filters on
this page box.
i. Uncheck Select all and check Imperial Stout.
j. Take a screenshot (label it 4-2MB).
8. After you answer the lab questions, you may close Power BI Desktop. Save
your worksheet as Lab 4-2 Slainte Sales Explore.pbix.
Tableau | Desktop
Lab 4-3 Create Dashboards—LendingClub
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: To better understand loans and the attributes of people borrowing money,
your manager has asked you to provide some examples of different characteristics related to
the loan amount. For example, are more loan dollars given to home owners or renters?
Data: Lab 4-3 Lending Club Transform.zip - 29MB Zip / 26MB Excel / 6MB Tableau
Tableau | Desktop
Lab 4-3 Part 1 Summarize Loan Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-3 [Your name] [Your email address].docx.
This dashboard will focus on exploring the various attributes that underlie the borrow-
ers. It will combine a series of facts and summarized statistics with several visualizations
that break down the loan information by different dimensions.
e. Take a screenshot (label it 4-3MA) of the four cards along the top of
the page.
7. After you answer the lab questions, save your worksheet as Lab 4-3 Lending
Club Dashboard.pbix, and continue to Lab 4-3 Part 2.
Tableau | Desktop
e. Drag each of the four sheets to the dashboard so they appear in a line
along the top from left to right. Hint: Position them each on the far right
side of the previous one:
1. Total Loan Amount, Median Loan Amount, Median Interest Rate,
Median DTI
2. Resize the sheets so they fill the top row evenly. Hint: Click each
sheet and change the Standard drop-down in the toolbar to Fit
Width. You can also click the drop-down arrow on the dark toolbar
that appears on the edge of each sheet to do the same thing.
f. Take a screenshot (label it 4-3TA) of the four cards along the top of
the page.
7. After you answer the lab questions, save your worksheet as Lab 4-3 Lending
Club Dashboard.twb, and continue to Lab 4-3 Part 2.
1. Open your Lab 4-3 Lending Club Dashboard.pbix file from Lab 4-3 Part 1 if
you closed it.
2. Create a visualization showing the loan amount by term:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click 100% Stacked Bar Chart. Drag the object to
c. Drag Loan Amount by Term to the bottom of the page and change it
from Standard to Fit Width in the toolbar.
d. Drag Loan Amount by Ownership to the right of the Loan Amount by
Term pane.
e. Drag Loan Amount by Month to the right of the Loan Amount by
Ownership pane.
f. Drag Loan Amount by DTI to the right of the Loan Amount by Month pane.
g. Adjust your dashboard to fit everything.
h. Take a screenshot (label it 4-3TB).
7. After you answer the lab questions, you may close Tableau Desktop. Save your workbook as Lab 4-3 Lending Club Dashboard.twb.
Tableau | Desktop
c. For the new map visual, adjust the Fill Colors to be based on an average
of TRAN_AMT instead of a Sum:
1. Click the map visual and click the Format visual (paintbrush) button.
2. Go to Visual > Fill Colors > Color.
3. Click the Fx button.
4. Under Summarization, select Average and click OK.
5. General > Title > Text > Average Revenue by State
12. Take a screenshot (label it 4-4MB).
13. After you answer the lab questions, save your work as Lab 4-4 Dillard's Sales.pbix and continue to Lab 4-4 Part 2.
Tableau | Desktop
1. Open your Lab 4-4 Dillard's Sales.pbix file from Lab 4-4 Part 1 if you closed it.
2. Create a second page to place your Customer State visuals on.
3. Create the following four visuals (review the detailed steps in Part 1 if
necessary):
a. Bar Chart: Sum of Online Transactions
1. Y-axis: Customer State
2. X-axis: TRAN_AMT
3. Filter: STORE is 698
b. Filled Map: Sum of Online Transactions
c. Bar Chart: Average of Online Transactions
d. Filled Map: Average of Online Transactions
4. Take a screenshot (label it 4-4MC).
5. Put it all together! To answer the questions and derive insights, you may
want to create a new dashboard with all of the visualizations on the same
page.
6. After you answer the lab questions, you may close Power BI. Save your work-
book as Lab 4-4 Dillard’s Sales.pbix.
1. Open your Lab 4-4 Dillard’s Sales.twb file from Lab 4-4 Part 1 if you closed it.
2. Create the following four visuals, including the proper naming of each sheet
(review the detailed steps in Part 1 if necessary):
a. Bar Chart: Sum of Online Transactions
1. Rows: TRAN_AMT
2. Columns: Customer State
3. Filter: De-select (All) and select only 698
b. Filled Map: Sum of Online Transactions
c. Bar Chart: Average of Online Transactions
d. Filled Map: Average of Online Transactions
3. Add each visualization to a dashboard.
4. Take a screenshot of your dashboard (label it 4-4TB).
5. After you answer the lab questions, you may close Tableau. Save your workbook as Lab 4-4 Dillard's Sales.twb.
Lab 4-5 Comprehensive Case: Visualize Exploratory Data—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In Lab 4-4 you discovered that Texas has the highest sum of transactions
for both in-person and online sales in a subset of the data, and you would like to explore
these sales more to determine if the performance is the same across all of the stores in
Texas and across the different departments.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
Tableau | Desktop
Lab 4-5 Part 1 Identify the Texas Store with the Highest
Revenue
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-5 [Your name] [Your email address].docx.
In this first part of the lab, we will look at revenue across Texas cities and then drill down
to the specific store location that has the highest revenue (sum of transaction amount).
While in the previous lab we needed to limit our sample to just 5 days because of how large
the TRANSACT table is, in this lab we will analyze all of the transactions that took place
in stores in Texas.
Tableau | Desktop
1. Open your Lab 4-5 Dillard’s Exploratory Analysis file from Part 1 if you
closed it.
2. Copy your Bar Chart and Paste it to the right of your first visualization.
3. Right-click the bar associated with Dallas 716 and select Include to filter the
data to only that store.
4. The filter for the Dallas 716 store is active now, so you can remove City and
Store from the Axis—this will clear up space for you to be able to view the
department details.
5. In the fields list, drag DEPARTMENT.DEPTDEC_DESC on top of
DEPARTMENT.DEPTCENT_DESC to build a hierarchy.
6. Rename the hierarchy DEPT_HIERARCHY, then add a third-level field by dragging and dropping the DEPARTMENT.DEPT_DESC field on top of your new hierarchy.
7. Explore the Department Hierarchy:
a. Drag DEPT_HIERARCHY to the Axis box and remove the LOCATION
fields. This provides a summary of how each Department Century has
performed.
b. If you expand the entire hierarchy (upside-down fork button), you will
see a sorted list of all departments, but they will not be grouped by cen-
tury. To see the department revenue by century, you can drill down from
each century directly:
1. In the visualization toolbar, click the Drill Up arrow to return to the
top level.
2. Right-click the bar for Ready-to-Wear and select Drill Down to see the
decades within the Ready-to-Wear century.
3. Right-click the bar for Career and select Drill Down to see the depart-
ments within the career decade.
c. Click the Format visual (paintbrush) button in the Visualizations pane to
add titles:
1. Visual > Y-axis > Title > Department
2. Visual > X-axis > Title > Sales Revenue
3. General > Title > Text > Sales Revenue by Department
8. Take a screenshot (label it 4-5MB).
Tableau | Desktop
1. Open your Lab 4-5 Dillard’s Exploratory Analysis file from Part 1 if you
closed it.
2. Right-click your Total Sales by Store tab and click Duplicate.
3. Rename the new sheet Total Sales by Department.
4. Right-click the bar associated with Dallas 716 and select Keep Only to filter
the data to only that store.
5. The filter for the Dallas 716 store is active now, so you can remove City and
Store from the Rows—this will clear up space for you to be able to view the
department details.
6. In the fields list, drag DEPARTMENT.Deptdec Desc on top of D EPARTMENT
.Deptcent Desc to build a hierarchy. Rename the hierarchy to Department
Hierarchy. Add the third level of the hierarchy, Dept Desc, by dragging the
DEPARTMENT.Dept Desc field beneath Deptdec Desc in the Department
Hierarchy so that it shows Deptcent Desc, Deptdec Desc, and Dept Desc in
that order.
7. Explore the Department Hierarchy:
a. Drag Department Hierarchy to the Rows shelf. This provides a summary
of how each Department Century has performed.
b. Click the plus sign on Deptcent Desc to drill down and show Deptdec Desc, and sort in descending order.
8. Take a screenshot (label it 4-5TB).
9. Continue exploring the data by drilling up and down the hierarchy. When you
want to collapse your hierarchy back to Decade or Century, click the minus
sign on the pills, then click the plus sign to expand back down into the details.
10. Create a new dashboard and drag the Total Sales by Store and Total Sales by
Department to your dashboard.
11. After you answer the lab questions, you may close Tableau. Save your work as
Lab 4-5 Dillard’s Exploratory Analysis.
Lab 4-5 Part 2 Analysis Questions (LO 4-3, 4-4)
AQ1. This lab is a starting point for exploring revenue across stores and departments
at Dillard’s. What next steps would you take to further explore these data?
AQ2. Some of the department centuries and decades are not easy to understand
if we’re not Dillard’s employees. Which attributes would you like to learn more
about?
AQ3. In this lab we used relatively simple bar charts to perform the analysis. What
other visualizations would be interesting to use to explore these data?
Chapter 5
The Modern Accounting Environment
A Look Back
Chapter 4 completed our discussion of the IMPACT model by explaining how to communicate your results through
data visualization and through written reports. We discussed how to choose the best chart for your dataset and your
purpose. We also helped you learn how to refine your chart so that it communicates as efficiently and effectively as
possible. The chapter wrapped up by describing how to provide a written report tailored to specific audiences who
will be interested in the results of your data analysis project.
A Look Ahead
In Chapter 6, you will learn how to apply Data Analytics to the audit function and how to perform substantive audit
tests, including when and how to select samples and how to confirm account balances. Specifically, we discuss the
use of different types of descriptive, diagnostic, predictive, and prescriptive analytics as they are used to generate
computer-assisted auditing techniques.
The large public accounting firms offer a variety of analytical
tools to their customers. Take PwC’s Halo, for example. This tool
allows auditors to interrogate a client’s data and identify patterns
and relationships within the data in a user-friendly dashboard. By
mapping the data, auditors and managers can identify inefficien-
cies in business processes, discover areas of risk exposure, and
correct data quality issues by drilling down into the individual
users, dates and times, and amounts of the entries. Tools like
Halo allow auditors to develop their audit plan by narrowing their
focus and audit scope to unusual and infrequent issues that rep-
resent high audit risk.
Source: http://halo.pwc.com
OBJECTIVES
After reading this chapter, you should be able to:
EXHIBIT 5-1 The IMPACT Cycle
Source: Isson, J. P., and J. S. Harriott. Win with Advanced Business Analytics: Creating Business Value from Your Data. Hoboken, NJ: Wiley, 2013.
Automation can include routine tasks, such as combining data from different sources for
analysis, and more complex actions, such as responding to natural language queries. In the
past, analytics and automation were performed by hobbyists and consultants within a firm.
In a modern environment, companies form centers of expertise where they concentrate
specialists in a single geographic location and use information and communication tech-
nologies to work in remote teams. Because the data are network-accessible, multiple users
interact with the data and complete workflows of tasks with the assistance of remote team
members and bots, or automated robotic scripts commonly called robotic process automa-
tion. The specialists manage the bots like they would normal employees, continuously evalu-
ating their performance and contribution to the company.
You’ll recall from your auditing course that assurance services are crucial to building
and maintaining trust within the capital markets. In response to increasing regulation in the
United States, the European Union, and other jurisdictions, both internal and external audi-
tors have been tasked with providing enhanced assurance while also attempting to reduce
(or at least maintain) the audit fees. This has spurred demand for more audit automation
along with an increased reliance on auditors to use their judgment and decision-making
skills to effectively interpret and support their audit findings with managers, shareholders,
and other stakeholders.
Both external and internal auditors have been applying simple Data Analytics for
decades in evaluating risk within companies. Think about how an evaluation of inventory
turnover can spur a discussion on inventory obsolescence or how working capital ratios
are used to identify significant issues with a firm’s liquidity. From an internal audit (and/
or management accounting) perspective, evaluating cost variances can help identify opera-
tional inefficiencies or unfavorable contracts with suppliers.
The audit concepts of professional skepticism and reasonable assurance are as much a
part of the modern audit as in the past. There has been a shift, however, from simply providing reasonable assurance on the processes to also providing assurance over the robots that perform much of the menial audit work. Where, before, an auditor may have
looked at samples and gathered evidence to make inferences to the population, now that
same auditor must understand the controls and parameters that have been programmed
into the robot to analyze the full population. In other words, as these automated bots do
more of the routine analytics, auditors will be free to exercise more judgment to interpret
the alarms and data while refocusing their effort on testing the parameters used by the
robots.
Auditors use Data Analytics to improve audit quality by more accurately assessing risk
and selecting better substantive procedures and tests of controls. While the exercises the
auditors conduct are fairly routine, the models can be complex and require auditor judg-
ment and interpretation. For example, if an auditor receives 1,000 notifications of a control
violation during the day, does that mean there is a control weakness or that the settings on
the automated control are too precise? Are all those notifications actual control violations
that require immediate attention, or are most of them false positives—transactions that are
flagged as exceptions but are normal and acceptable?
The auditors’ role is to make sure that the appropriate analytics are used and that the
output of those analytics—whether a dashboard, notifications of exceptions, or accuracy of
predictive models—correspond to management’s expectations and assertions.
Internal auditors are also more likely to have working knowledge of the different types of
systems implemented at their companies. They are familiar with how the general journals
from a product like Oracle’s JD Edwards actually reconcile to the general ledger in SAP
to generate financial reports and drill down into the data. Because the systems themselves
and the implementation of those systems vary across organizations (and even within orga-
nizations), internal auditors recognize that analytics are not simply a one-size-fits-all type
of strategy.
Lab Connection
Lab 5-5 has you filter data to limit the scope of data analysis to meet internal
audit objectives.
PROGRESS CHECK
1. What types of sensors do businesses use to track activity?
2. Make the case for why an internal audit is increasingly important in the mod-
ern audit. Why is it also important for external auditors and the scope of their
work?
[Exhibit diagram labels: Translator, Data Warehouse, Management Dashboard, Audit Program, Continuous Monitoring]
• Procure to Pay Subledger: defines purchase orders and line items, goods received and
line items, invoices received and line items, open accounts payable and adjustments,
payments, and supplier master data.
• Inventory Subledger: defines inventory location master data, product master data, inven-
tory on hand data, and inventory movement transactions as well as physical inventory
and material cost.
• Fixed Asset Subledger: defines fixed asset master data, additions, removals, and depreciation calculations.
EXHIBIT 5-3 Audit Data Standards
The audit data standards define common elements needed to audit the order-to-cash or sales process.
Source: https://www.aicpa.org/InterestAreas/FRC/AssuranceAdvisoryServices/DownloadableDocuments/AuditDataStandards/AuditDataStandards.O2C.July2015.pdf
*If receivable balances are tracked by customer only (not by invoice), then Customer_Account_ID is used as a key to join tables
to the Open_Accounts_Receivable table instead of both Customer_Account_ID and Invoice_ID.
**The User_Listing table can be joined to three fields, all of which contain a user ID—Entered_By, Approved_By, Last_Modified_By.
With standard data elements in place, not only will internal auditors streamline their
access to data, but they also will be able to build analytical tools that they can share with
others within their company or professional organizations. This can foster greater collabo-
ration among auditors and increased use of Data Analytics across organizations. These data
elements will be useful when performing substantive testing in Chapter 6.
Even if the standard is never adopted by data suppliers, auditors can still take advantage
of the audit data standards as a common data model. For example, Exhibit 5-4 shows the
mapping of a set of Purchase Card data to the Procure to Pay Subledger Standard. Once
the mapping algorithm has been generated using SQL or another tool, any new data can be
analyzed quickly and easily.
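To make this concrete, here is a minimal sketch of such a mapping in Python with pandas, using the source-to-ADS field names that appear in Lab 5-1; the constant fiscal-year column mirrors the lab's custom column. The script itself (including reading a local CSV) is illustrative, not the book's prescribed tool.

```python
import pandas as pd

# Source-to-ADS field mapping, taken from the Lab 5-1 instructions.
ADS_MAP = {
    "ROWID": "Purchase_Order_ID",
    "AGENCYNBR": "Business_Unit_Code",
    "AGENCYNAME": "Business_Unit_Description",
    "ITEM_DESCR": "Purchase_Order_Line_Product_Description",
    "AMOUNT": "Purchase_Order_Amount_Local",
    "MERCHANT": "Supplier_Account_Name",
    "TRANSACTION_DATE": "Purchase_Order_Date",
    "POST_DATE": "Entered_Date",
    "MCC_DESCRIPTION": "Supplier_Group",
}

pcard = pd.read_csv("Lab 5-1 OK PCard FY2020.csv")
ads = pcard.rename(columns=ADS_MAP)            # apply the common data model
ads["Purchase_Order_Fiscal_Year"] = "2020"     # constant-value column, as in the lab
ads = ads[list(ADS_MAP.values()) + ["Purchase_Order_Fiscal_Year"]]  # keep only ADS columns
```

Once a mapping like this exists, each new extract from the same source can be conformed to the standard with a single run.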
Lab Connection
Lab 5-1 has you transform transaction data to conform with the AICPA’s audit
data standard. Lab 5-2 has you create a dashboard with visualizations to
explore the transformed common data.
[Exhibit 5-4 mapping excerpt: Posted Date, MCC Description]
PROGRESS CHECK
3. What are the advantages of the use of homogeneous systems? Would a merger
target be more attractive if it used a similar financial reporting system as the
potential parent company?
4. How does the use of audit data standards facilitate data transfer between audi-
tors and companies? How does it save time for both parties?
• Procedures and specific tasks that the audit team will execute to collect and analyze
evidence. These typically include tests of controls and substantive tests of transaction
details.
• Formal evaluation by the auditor and supervisors.
Because audit plans are formalized and standardized, they lend themselves to the use of
Data Analytics and, consequently, automation. For example:
• The methodology may be framed by specific standards, such as the Public Company
Accounting Oversight Board’s (PCAOB) auditing standards, the Committee of
Sponsoring Organizations’ (COSO) Enterprise Risk Management framework, the
Institute of Internal Auditors’ (IIA) International Standards for the Professional
Practice of Internal Auditing (Standards), or the Information Systems Audit and
Control Association’s (ISACA) Control Objectives for Information and Related
Technologies (COBIT) framework. Data Analytics may be used to analyze the
standards and determine which requirements apply to the organization being audited.
• The scope of the audit defines parameters that will be used to filter the records or
transactions being evaluated.
• Simple-to-complex Data Analytics can be applied to a company’s data (for an internal
audit) or a client’s data (for an external audit) during the planning stage of the audit to
identify which areas the auditor should focus on. This may include outlier detection or
other substantive tests of suspicious or risky transactions.
• Audit procedures themselves typically identify data, locations, and attributes that the
auditors will evaluate. These are the variables that will provide the input for many of the
substantive analytical procedures discussed in Chapter 6.
• The evaluation of audit data may be distilled into a risk score. This may be a function
of the volume of exceptional records or level of exposure for the functional area. If
the judgment and decision making can be easily defined, a rule-based analytic could automatically assign a score for the auditor to review (see the sketch after this list). For more complex judgments,
the increasing prevalence of artificial intelligence and machine learning discussed in
Chapter 3 may be of assistance. Historical observations of the scores auditors assign to
specific cases and outcomes may assist in the creation of an automated scoring model.
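The rule-based scoring referenced in the list above can be as simple as adding weighted points for each rule a transaction trips. A minimal Python sketch follows; the rules, weights, column names, and sample entries are invented for illustration.

```python
import pandas as pd

def risk_score(txn: pd.Series) -> int:
    """Rule-based score for one journal entry; rules and weights are invented."""
    score = 0
    if txn["amount"] >= 10_000:                   # large exposure
        score += 3
    if txn["entered_by"] == txn["approved_by"]:   # no independent approval
        score += 2
    if txn["entry_date"].weekday() >= 5:          # recorded on a weekend
        score += 1
    return score

entries = pd.DataFrame({
    "amount": [12_500, 480, 9_900],
    "entered_by": ["jdoe", "asmith", "jdoe"],
    "approved_by": ["jdoe", "mlee", "mlee"],
    "entry_date": pd.to_datetime(["2023-01-07", "2023-01-09", "2023-01-10"]),
})
entries["risk_score"] = entries.apply(risk_score, axis=1)
print(entries.sort_values("risk_score", ascending=False))  # riskiest first for review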
Typically, internal audit organizations that have adopted Data Analytics to enhance their audit have done so after an individual on the team began tinkering with Data Analytics. That individual convinces the manager that there is value in using the data to direct the audit, and the manager may become a champion of the process. Once the value proposition of Data Analytics is demonstrated, the team is given more resources to build the program and adapt the existing audit program to include more data-centric evaluation where appropriate.
Because of the potential disruption to the organization, it is more likely that an internal
auditor will adapt an existing audit plan than develop a new system from scratch. Automat-
ing the audit plan and incorporating Data Analytics involve the following steps, which are
similar to the IMPACT model:
1. Identify the questions or requirements in the existing audit plan.
2. Master the data by identifying attributes and elements that are automatable.
3. Perform the test plan, in this case by developing analytics (in the form of rules or mod-
els) for those attributes identified in step 2.
4. Address and refine results. List expected exceptions to these analytics and expected
remedial action by the auditor, if any.
5. Communicate insight by testing the rules and comparing the output of the analytics to
manual audit procedures.
6. Track outcomes by following up on alarms and refining the models as needed.
Let’s assume that an internal auditor has been tasked with implementing Data Analytics
to automate the evaluation of a segregation of duties control within SAP. The auditor evalu-
ates the audit plan and identifies a procedure for testing this control. The audit plan identifies
which tables and fields contain relevant data, such as an authorization matrix, and the specific
roles or permissions that would be incompatible. The auditor would use that information to
build a model that would search for users with incompatible roles and notify the auditors.
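A minimal sketch of that search in Python, assuming a user-role extract and a list of incompatible role pairs; the table layout, role names, and users are hypothetical, and a production control would query the live authorization tables instead.

```python
import pandas as pd

# Hypothetical extract of user-role assignments (one row per assignment).
user_roles = pd.DataFrame({
    "user": ["jdoe", "jdoe", "asmith", "mlee"],
    "role": ["CREATE_VENDOR", "APPROVE_PAYMENT", "CREATE_VENDOR", "APPROVE_PAYMENT"],
})

# Incompatible pairs from a hypothetical authorization matrix.
INCOMPATIBLE = [("CREATE_VENDOR", "APPROVE_PAYMENT")]

roles_by_user = user_roles.groupby("user")["role"].apply(set)
violations = [
    (user, a, b)
    for user, roles in roles_by_user.items()
    for (a, b) in INCOMPATIBLE
    if a in roles and b in roles
]
print(violations)  # [('jdoe', 'CREATE_VENDOR', 'APPROVE_PAYMENT')] -> notify the auditors
```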
Lab Connection
Lab 5-4 has you review an audit plan and identify potential data tables and
attributes that would enable audit automation.
These platforms are hosted in the cloud, so members of the audit team can participate in the
various functions from any location. Smaller audit shops can build ad hoc workpaper reposi-
tories using OneDrive with Office 365, though there are fewer controls over the documents.
Lab Connection
Lab 5-3 has you upload audit data to the cloud and review changes.
PROGRESS CHECK
5. Continuous audit uses alarms to identify exceptions that might indicate an audit
issue and require additional investigation. If there are too many alarms and
exceptions based on the parameters of the continuous audit system, will continu-
ous auditing actually help or hurt the overall audit effectiveness?
6. PwC uses three systems to automate its audit process. Aura is used to direct the
audit by identifying which evidence to collect and analyze, Halo performs Data
Analytics on the collected evidence, and Connect provides the workflow pro-
cess that allows managers and partners to review and sign off on the work. How
does that line up with the steps of the IMPACT model we’ve discussed through-
out the text?
Summary
■ As accounting has evolved over the past few decades, automation has driven many of the changes, in turn enabling additional Data Analytics. Data Analytics has improved
management’s and auditors’ ability to understand their business, assess risk, inform their
opinions, and improve assurance over the processes and controls in their organizations.
(LO 5-1)
■ Enterprise data appear in many forms, and the adoption of a common data model makes it easier to analyze data from a variety of systems. (LO 5-2)
■ The main impact of automation and Data Analytics on the accounting profession comes
through optimization of the management dashboard and audit plan. (LO 5-3)
■ Data Analytics and automation allow management and internal auditors to continuously
monitor and audit the systems and processes within their companies. (LO 5-4)
■ As audit procedures become increasingly technical, documentation continues to be an
essential way for internal and external auditors to increase their reliance on automated
controls and procedures. (LO 5-5)
Key Words
audit data standards (ADS) (249) The audit data standards define common tables and fields that are
needed by auditors to perform common audit tasks. The AICPA developed these standards.
ISTUDY
common data model (249) A tool used to map existing database tables and fields from various systems
to a standardized set of tables and fields for use with analytics.
continuous auditing (253) A process that provides real-time assurance over business processes and
systems.
continuous monitoring (253) A process that constantly evaluates internal controls and transactions and
is the chief responsibility of management.
continuous reporting (253) A process that provides real-time access to the system status and account-
ing information.
data warehouse (248) A data warehouse is a repository of data accumulated from internal and external
data sources, including financial data, to help management decision making.
flat file (248) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to stor-
ing the data in multiple tables, such as in a relational database.
heterogeneous systems approach (248) Heterogeneous systems represent multiple installations or
instances of a system. It would be considered the opposite of a homogeneous system.
homogeneous systems approach (248) Homogeneous systems represent one single installation or
instance of a system. It would be considered the opposite of a heterogeneous system.
production or live systems (248) Production (or live) systems are those active systems that collect and report data and are directly affected by current transactions.
systems translator software (248) Systems translator software maps the various tables and fields from
varied ERP systems into a consistent format.
Multiple Choice Questions
1. (LO 5-1) Under the guidance of the chief audit executive (CAE) or another manager,
internal auditors build teams to develop and implement analytical techniques to aid all
of the following audits except:
a. process efficiency and effectiveness.
b. governance, risk, and compliance, including internal controls effectiveness.
c. tax compliance.
d. support for the financial statement audit.
2. (LO 5-2) Which audit data standards ledger defines product master data, location data,
inventory on hand data, and inventory movement?
a. Order to Cash Subledger
b. Procure to Pay Subledger
c. Inventory Subledger
d. Base Subledger
3. (LO 5-2) Which audit data standards ledger identifies data needed for purchase orders,
goods received, invoices, payments, and adjustments to accounts?
a. Order to Cash Subledger
b. Procure to Pay Subledger
c. Inventory Subledger
d. Base Subledger
4. (LO 5-2) A company has two divisions, one in the United States and the other in China.
One uses Oracle and the other uses SAP for its basic accounting system. What would
we call this?
a. Homogeneous systems
b. Heterogeneous systems
c. Dual data warehouse systems
d. Dual lingo accounting systems
5. (LO 5-3) Which of the following defines the time period, the level of materiality, and the
expected time for an audit?
a. Audit scope
b. Potential risk
c. Methodology
d. Procedures and specific tasks
6. (LO 5-3) All of the following may serve as standards for the audit methodology except:
a. PCAOB’s auditing standards.
b. COSO’s ERM framework.
c. ISACA’s COBIT framework.
d. FASB’s accounting standards.
7. (LO 5-4) When there is an alarm in a continuous audit, but it is associated with a normal
event, we would call that a:
a. false negative.
b. true negative.
c. true positive.
d. false positive.
8. (LO 5-4) When there is no alarm in a continuous audit, but there is an abnormal event,
we would call that a:
a. false negative.
b. true negative.
c. true positive.
d. false positive.
9. (LO 5-4) If purchase orders are monitored for unauthorized activity in real time while
month-end adjusting entries are evaluated once a month, those transactions monitored
in real time would be an example of a:
a. traditional audit.
b. periodic test of internal controls.
c. continuous audit.
d. continuous monitoring.
10. (LO 5-2) Who is most likely to have a working knowledge of the various enterprise
systems that are in use in the company?
a. Chief executive officer
b. External auditor
c. Internal auditor
d. IT staff
Discussion Questions
1. (LO 5-1) Why has most innovation in Data Analytics originated more in an internal audit
than an external audit? Or if not, why not?
2. (LO 5-2) Is it possible for a firm to have general journals from a product like JD Edwards
actually reconcile to the general ledger in SAP to generate financial reports or drill
down to see underlying transactions? Why or why not?
3. (LO 5-2) Is it possible for multinational firms to have many different financial reporting
systems and enterprise systems packages all in use at the same time?
4. (LO 5-2) How does systems translator software work? How does it store the merged data in a data warehouse?
5. (LO 5-2) Why is it better to extract data from a data warehouse than directly from a production or live system?
6. (LO 5-2) Would an auditor view heterogeneous systems as an audit risk? Why or
why not?
7. (LO 5-5) Why would audit firms prefer to use proprietary workpapers rather than simply storing working papers in the cloud?
Problems
1. (LO 5-2) Match the description of the data standard to each of the current audit data
standards:
• Base
• General Ledger
• Order-to-Cash Subledger
• Procure-to-Pay Subledger
Data Standard Description | Current Audit Data Standards
4. Chart of accounts
5. File formats
6. Shipments to customers
2. (LO 5-3) Accounting has a great number of standards-setting bodies, which are generally referred to by acronym. Match the relevant standards or the responsibility of each to the acronym of these standards-setting bodies.
• AICPA
• PCAOB
• FASB
• COBIT
• SEC
• COSO
• IIA
2. Standard-setting body for external auditing standards
3. Requires submission of 10-Ks and 10-Qs by publicly held companies
4. Articulates key concepts to enhance internal controls and deter fraud
5. Provides guidance for companies that use information technology
3. (LO 5-4) In each of the following situations, identify which situation exists with regard to
normal and abnormal events, and alarms or lack of alarms from a continuous monitoring
system with these indicators:
• True negative
• False negative
• False positive
• True positive
No Alarm | Alarm
1. Normal event
2. Abnormal event
4. (LO 5-1, 5-2, 5-4, 5-5) Match the definitions to these real-time terms:
• Continuous reporting
• Continuous auditing
• Continuous monitoring
1. Total sales recognized by the company today disclosed to shareholders
5. Real-time evaluation of the size and nature of each payroll transaction recorded
6. Real-time assurance over the recording of expenses
Feature | System: Homogeneous? Heterogeneous? Or Both?
2. Allows auditors to review data in a data warehouse
4. Has the same ERP systems in multiple locations
6. (LO 5-2) Analysis: What are the advantages of the use of homogeneous systems?
Would a merger target be more attractive if it used a similar financial reporting system
as the potential parent company?
7. (LO 5-2) Multiple Choice: Consider Exhibit 5-3. Looking at the audit data standards
order-to-cash process, which of the following describes the purpose of the AR Adjust-
ments table?
a. Shows the balance in accounts receivable
b. Shows manual changes to accounts receivables, such as credit and debit memos
c. Shows the sales transactions and cash collections that affect the accounts receiv-
able balance
d. Shows the customer who owes the company money
8. (LO 5-2) Analysis: Who developed the audit data standards? In your opinion, why is it
the right group to develop and maintain them rather than, say, the Big Four firms or a
small practitioner? What is the purpose of the data standards, and to whom are the standards applicable?
9. (LO 5-1) Multiple Choice: Auditors can apply simple to complex Data Analytics to a
client’s data. At which stage would DA be applied to identify which areas the auditor
should focus on?
a. Continuous
b. Planning
c. Remote
d. Reporting
e. Performance
10. (LO 5-4) Multiple Choice: What actions should be taken by either internal auditors or
management if a company’s continuous audit system has too many alarms that are false
positive?
a. Abandon the system.
b. Change the parameters to focus on lower-risk items.
c. Change the parameters to focus on higher-risk items.
d. Ignore the alarms.
11. (LO 5-4) Multiple Choice: What actions should be taken by either internal auditors or
management if a company’s continuous audit system has too many missed abnormal
events (such as false negatives)?
a. Ignore the alarms.
b. Abandon the system.
c. Change the parameters to focus on lower-risk items.
d. Change the parameters to focus on higher-risk items.
12. (LO 5-3) Analysis: Implementing continuous auditing procedures is similar to automat-
ing an audit plan with the additional step of scheduling the automated procedures to
match the timing and frequency of the data being evaluated and the notification to the
auditor when exceptions occur. In your opinion, will the traditional audit be replaced by
continuous auditing? Support your answer.
LABS
Microsoft Excel
LAB 5-1M Example of Transforming Data in Microsoft Power Query
Tableau | Prep
AMOUNT -> Purchase_Order_Amount_Local
MERCHANT -> Supplier_Account_Name
TRANSACTION_DATE -> Purchase_Order_Date
POST_DATE -> Entered_Date
MCC_DESCRIPTION -> Supplier_Group
ROWID -> Purchase_Order_ID
b. Take a screenshot (label it 5-1MA) of your data with the renamed Field
Names.
3. Now create new columns to transform the current data to match the audit data
standard:
a. Create custom columns to rename existing columns:
1. From the ribbon, click Add Column, then click Custom Column.
2. Refer to the table below to enter the New Column Name (e.g., Purchase_
Order_Fiscal_Year) and Custom Column Formula (e.g., = “2020”). If an
existing attribute is given, you can double-click the value from the Avail-
able Columns list to add it to the formula. If the value is given in quotes,
include the quotes to fill the column with that value.
3. Click OK to add the new custom column.
4. Repeat steps 1–3 for the remaining columns.
New Column Name (ADS Destination) | Custom Column Formula (From PCard Source) | Type
4. Finally, remove the original columns so that only your new ADS Columns remain:
a. Right-click the following fields, and choose Remove:
1. CALENDAR_YEAR
2. CALENDAR_MONTH
3. LAST_NAME
4. FIRST_NAME
b. Take a screenshot (label it 5-1MB) of your new columns formatted to
match the ADS.
5. From the ribbon, click Home, then click Close & Load.
6. When you are finished answering the lab questions, you may close Excel. Save your file as Lab 5-1 OK PCard ADS.xlsx.
Tableau | Prep
Lab Note: Tableau Prep takes extra time to process large datasets.
1. Open Tableau Prep Builder and connect to your data:
a. Click Connect to Data > To a File > Text File.
b. Locate the Lab 5-1 OK PCard FY2020.csv file on your computer and click
Open.
2. Now remove, rename, and create new columns to transform the current data to
match the audit data standard:
a. Uncheck the following unneeded fields:
1. CALENDAR_YEAR
2. CALENDAR_MONTH
b. Rename the following fields by double-clicking the Field Name:
AGENCYNBR -> Business_Unit_Code
AGENCYNAME -> Business_Unit_Description
ITEM_DESCR -> Purchase_Order_Line_Product_Description
AMOUNT -> Purchase_Order_Amount_Local
MERCHANT -> Supplier_Account_Name
TRANSACTION_DATE -> Purchase_Order_Date
POST_DATE -> Entered_Date
MCC_DESCRIPTION -> Supplier_Group
ROWID -> Purchase_Order_ID
c. Take a screenshot (label it 5-1TA) of your flow with the renamed list of
Field Names.
3. Click the + next to Lab 5-1 OK PCa. . . in the flow and choose Add Clean Step.
a. From the toolbar, click Create Calculated Field. . . .
b. Refer to the table below to enter the Field Name (e.g., Purchase_Order_
Fiscal_Year) and Formula (e.g., “2020”). If the value is given in quotes,
include the quotes to fill the column with that value.
c. Click Apply to add the new custom column.
d. Repeat steps a–c for the remaining columns.
6. In the flow pane, right-click Clean 1 and choose Rename and name the step
Add Columns.
7. Take a screenshot (label it 5-1TB) of your cleaned data file, showing the
columns.
8. Click the + next to your Add Columns task and choose Output.
9. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your output file Lab 5-1 OK PCard ADS.hyper.
c. Click Accept.
10. In the Output box next to Add Columns, click the arrow to Run Flow. When it
is finished processing, click Done. It will show you the total number of records.
11. When you are finished answering the lab questions, you may close Tableau
Prep. Save your flow process file as Lab 5-1 OK PCard ADS.tfl.
Lab 5-2 Example Output
By the end of this lab, you will transform data to fit a common data model. While your
results will include different data values, your work should look similar to this:
Tableau | Desktop
d. Add a stacked column chart with the Total Purchases by Day and resize
it to fit the bottom-left corner of the page:
1. Y-axis: Purchase_Order_Amount_Local
2. X-axis: Purchase_Order_Date > Date Hierarchy > Day
3. Legend: Supplier_Group
4. Format:
a. Visual > Y-axis > Title > Total Purchases
b. Visual > X-axis > Title > Day
c. General > Title > Text > Total Purchases by Day
e. Add a tree map of Total Purchases by Business Unit and resize it to fit
the bottom-right corner of the page:
1. Values: Purchase_Order_Amount_Local
2. Category: Business_Unit_Description
3. Format:
a. General > Title > Text > Total Purchases by Business Unit
6. Take a screenshot (label it 5-2MB).
7. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 5-2 OK PCard Analysis.pbix.
Tableau | Desktop
Lab 5-3 Set Up a Cloud Folder and Review Changes—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You have rotated into the internal audit department at Sláinte. Your
team is still using company email to send evidence back and forth, usually in the form of
documents and spreadsheets. There is a lot of duplication of these files, and no one is quite
sure which version is the latest. You see an opportunity to streamline this process using
OneDrive.
Data: Lab 5-3 Slainte Audit Files.zip - 294KB Zip / 360KB Files
Microsoft | OneDrive
Microsoft OneDrive
LAB 5-3M Example of Working Papers on Microsoft OneDrive
Microsoft | OneDrive
Lab 5-3 Part 2 Review Changes to Working Papers
The goal of a shared folder is that other members of the audit team can contribute and edit
the documents. Commercial software provides an approval workflow and additional inter-
nal controls over the documents to reduce manipulation of audit evidence, for example.
For consumer cloud platforms, one control appears in the versioning of documents. As
revisions are made, old versions of the documents are kept so that they can be reverted to,
if needed.
Microsoft | OneDrive
Lab 5-4 Identify Audit Data Requirements—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As the new member of the internal audit team at Sláinte, you have been
given access to the cloud folder for your team. The chief audit executive is interested in
using Data Analytics and automation to make the audit more efficient. Your internal audit
manager agrees and has tasked you with reviewing the audit plan. She has provided three
“audit action sheets” with procedures that they have been using for the past 3 years to evalu-
ate the procure-to-pay (purchasing) process and is interested in your thoughts for modern-
izing them.
Data: Refer to your shared OneDrive folder from Lab 5-3 or Lab 5-4 Slainte Audit Files.zip -
262KB Zip / 290KB Files
Microsoft | Excel
Microsoft Excel
LAB 5-4M Example of Summarized Audit Automation Data Requirements in Microsoft Excel
Microsoft | Excel
1. Open your Slainte Working Papers folder on OneDrive or the Lab 5-4 Slainte
Files Revised.zip file.
2. Evaluate your audit action sheets:
a. Look inside the Master File for the document titled Audit Action Sheets
and open it to edit it.
b. Use the Yellow highlighter to identify any master or transaction tables,
such as “Vendors” or “Purchase Orders.”
c. Use the Green highlighter to identify any fields or attributes, such as
“Name” or “Date.”
d. Use the Blue highlighter to identify any specific values or rules, such as
“TRUE,” “January 1st,” “Greater than . . .”
3. Summarize your highlighted data:
a. Create a new Excel workbook in your Master File to summarize your
highlighted data elements from the three audit action sheets. Use the
following headers:
You have shared how increasing the frequency of some of the tests would provide a better
control for the process and allow the auditor to respond quickly to the exceptions. Your inter-
nal audit manager has asked you to propose a new schedule for the three audit action sheets.
Microsoft | Excel
Auto/Manual | Frequency
b. For each element and rule you identified in Part 1, determine whether it
requires manual review or can be performed automatically and alert audi-
tors when exceptions occur. Add either Auto or Manual to that column.
c. Finally, determine how frequently the data should be evaluated. Indicate
Daily, Weekly, Monthly, Annually, or During Audit. Think about when
the data are being generated. For example, transactions occur every day,
but new employees are added every few months.
d. Take a screenshot (label it 5-4MC) of the All tables sheet.
3. When you are finished answering the lab questions, you may save and close
your file.
Lab 5-5 Example Output
By the end of this lab, you will create a dashboard to visualize exploratory data. While your
results will include different data values, your work should look similar to this:
Tableau | Desktop
Auditor:
As an auditor, you have been tasked with evaluating discounted transactions that occurred in
the first quarter of 2016 to see if they match the stated sales revenue on the financial statement.
Manager:
As a manager you would like to evaluate store performance in North Carolina during the
holiday shopping season of November and December 2015.
Financial Accountant:
As you are preparing your income statement, you need to calculate the total net sales for all
stores in 2015.
Tax Preparer:
Working in conjunction with state tax authorities, you need to calculate the sales tax liabil-
ity for your monthly tax remittance for March 2016.
c. If prompted to enter credentials, you can keep the default to “Use my
current credentials” and click Connect.
d. If prompted with an Encryption Support warning, click OK to move past it.
e. In the Navigator window, check the following tables and click Transform
Data:
1. CUSTOMER, DEPARTMENT, SKU, SKU_STORE, STORE,
TRANSACT
2. In Power Query, filter your data:
a. Click the STORE query and filter the STATE to show only NC.
b. Click the TRANSACT query and filter the TRAN_DATE to show only a
range of dates from November 1, 2015, to December 31, 2015. Note: Power
Query shows only a sample of the filtered data.
3. Take a screenshot (label it 5-5MA) of the TRANSACT query view.
4. Click Close & Apply.
5. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 5-5 Dillard’s Scope.pbix.
Tableau | Desktop
Lab 5-5 Part 2 Analysis Questions (LO 5-1, 5-2)
AQ1. Why would you want to set filters before you begin creating your analyses?
AQ2. How does limiting the range of values improve the performance of your model?
AQ3. What other filters might you include to satisfy the requirements of the auditor,
financial accountant, or tax preparer from Part 1?
Chapter 6
Audit Data Analytics
A Look Back
In Chapter 5, we introduced Data Analytics in auditing by considering how both internal and external auditors are
using technology in general, and audit analytics specifically, to evaluate firm data and generate support for manage-
ment assertions. We emphasized audit planning, audit data standards, continuous auditing, and audit working papers.
A Look Ahead
Chapter 7 explains how to apply Data Analytics to measure performance for management accountants. By measuring
past performance and comparing it to targeted goals, we are able to assess how well a company is working toward a
goal and recommend actions to correct unexpected patterns.
Internal auditors at Hewlett-Packard Co. (HP) understand how
Data Analytics can improve processes and controls. Manage-
ment identified abnormal behavior with manual journal entries,
and the internal audit department responded by working with
various governance and compliance teams to develop dash-
boards that would allow them to monitor accounting activity. The
dashboard made it easier for management and the auditors to
follow trends, identify spikes in activity, and drill down to iden-
tify the individuals posting entries. Leveraging accounting data
allows the internal audit function to focus on the risks facing HP
and act on data in real time by implementing better controls.
Audit data analytics provides an enhanced level of control that is
missing from a traditional periodic audit.
OBJECTIVES
After reading this chapter, you should be able to:
LO 6-1 Understand different types of analysis for auditing and when to use
them.
LO 6-2 Explain basic descriptive analytic techniques used in auditing.
LO 6-3 Define and describe diagnostic analytics that are used in auditing.
LO 6-4 Characterize the predictive and prescriptive analytics used in auditing.
1 J.-H. Lim, J. Park, G. F. Peters, and V. J. Richardson, "Examining the Potential Benefits of Audit Data Analytics," University of Arkansas working paper, 2021.
from the standards shown in Exhibit 6-1. An auditor interested in user activity would want
to focus on the Sales_Order_ID, Sales_Order_Date, Entered_By, Entered_Date, Entered_
Time, Approved_By, Approved_Date, Approved_Time, and Sales_Order_Amount_Local
attributes. These may give insight into transactions on unusual dates, such as weekends, or
unusually high volume by specific users.
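As an illustration of that kind of user-activity screen, here is a small Python sketch over a sales order table that uses the ADS attribute names above; the file name is hypothetical, and what counts as "unusually high" is a matter of auditor judgment.

```python
import pandas as pd

# Assumes a sales order extract using the ADS attribute names above;
# the file name is hypothetical.
orders = pd.read_csv("sales_orders.csv", parse_dates=["Entered_Date"])

# Entries recorded on weekends (Monday=0 ... Sunday=6).
weekend_entries = orders[orders["Entered_Date"].dt.weekday >= 5]

# Volume and value by user; unusually high figures may warrant follow-up.
by_user = orders.groupby("Entered_By").agg(
    order_count=("Sales_Order_ID", "count"),
    total_amount=("Sales_Order_Amount_Local", "sum"),
).sort_values("order_count", ascending=False)

print(weekend_entries.head())
print(by_user.head())
```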
There are also many pieces of data that have traditionally evaded scrutiny, including
handwritten logs, manuals, handbooks, and other paper or text-heavy documentation.
Essentially, manual tasks including observation and inspection are generally areas where
Data Analytics may not apply. While there have been significant advancements in artificial
intelligence, there is still a need for auditors to exercise their judgment, and data cannot
always supersede the auditor’s reading of human behavior or a sense that something may
not be quite right even when the data say it is. At least not yet.
Data may also be found in unlikely places. An auditor may be tasked with determining
whether the steps of a process are being followed. Traditional evaluation would involve the
auditor observing or interviewing the employee performing the work. Now that most pro-
cesses are handled through online systems, an auditor can perform Data Analytics on the
time stamps of the tasks and determine the sequence of approvals in a workflow along with
the amount of time spent on each task. This form of process mining enables insight into pro-
cesses used to diagnose problems and suggest improvements where greater efficiency may
be applied. Likewise, data stored in paper documents, such as invoices received from ven-
dors, can be scanned and converted to tabular data using specialized software. These new
pieces of data can be joined to other transactional data to enable new, thoughtful analytics.
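A toy sketch of that time-stamp analysis in Python; the workflow log, step names, and documents are invented for illustration.

```python
import pandas as pd

# Invented workflow log: one row per step, with a time stamp.
log = pd.DataFrame({
    "document": ["PO-1", "PO-1", "PO-1", "PO-2", "PO-2"],
    "step": ["entered", "reviewed", "approved", "entered", "approved"],
    "timestamp": pd.to_datetime([
        "2023-03-01 09:00", "2023-03-01 10:30", "2023-03-02 08:15",
        "2023-03-01 11:00", "2023-03-01 11:02",
    ]),
})

log = log.sort_values(["document", "timestamp"])
log["time_in_step"] = log.groupby("document")["timestamp"].diff()
print(log)
# The step sequence shows PO-2 skipped review, and its two-minute gap
# between entry and approval may indicate a rubber-stamp approval.
```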
There is an increasing promise of working with unstructured Big Data to provide addi-
tional insight into the economic events being evaluated by the auditors, such as surveillance
video or text from email, but those are still outside the scope of current Data Analytics that
an auditor would develop.
Finally, the auditor may generate prescriptive analytics that identify a course of action to
take based on the actions taken in similar situations in the past. These analytics can assist
future auditors who encounter similar behavior. Using artificial intelligence and machine
learning, these analytics become decision support tools for auditors who may lack experience
to find potential audit issues. For example, when a new sales order is created for a customer
who has been inactive for more than 12 months, a prescriptive analytic would allow an auditor
to ask questions about the transaction to learn whether this new account is potentially fake,
whether the employee is likely to create other fake accounts, and whether the account and/
or employee should be suspended or not. The auditor would take the output, apply judgment,
and proceed with what they felt was the appropriate action.
Most auditors will perform descriptive and diagnostic analytics as part of their audit
plan. On rare occasions, they may experiment with predictive and prescriptive analytics
directly. More likely, they may identify opportunities for the latter analytics and work with
data scientists to build those for future use.
Some examples of CAATs and audit procedures related to the descriptive, diagnostic,
predictive, and prescriptive analytics can be found in Exhibit 6-2.
EXHIBIT 6-2
Descriptive—summarize activity or master data based on certain attributes
Techniques: Age analysis—groups balances by date. Sorting—identifies largest or smallest values and helps identify patterns. Summary statistics—mean, median, min, max, count, sum. Sampling—random and monetary unit.
Examples: Analysis of new accounts opened and employee bonuses by employee and location. Count the number/dollar amount of transactions that occur outside normal business hours or at the end/beginning of the period.
Diagnostic—detect correlations and patterns of interest and compare them to a benchmark
Techniques: Z-score—outlier detection. t-Tests—a statistical test used to determine if there is a significant difference between the means of two groups, or two datasets. Benford's law—identifies transactions or users with non-typical activity based on the distribution of digits. Drill-down—explores the details behind the values. Exact and fuzzy matching—joins tables and identifies plausible relationships. Sequence check—detects gaps in records and duplicate entries. Stratification—groups data by categories. Clustering—groups records by non-obvious similarities.
Examples: Analysis of new accounts reveals that an agent has an unusual number of new accounts opened for customers who have been inactive for more than 12 months. An auditor assigns an expected Benford's value to purchase transactions, then averages them by employee to identify employees with unusually large purchases. An auditor filters out transactions that are below a materiality threshold.
Predictive—identify common attributes or patterns that may be used to identify similar activity
Techniques: Regression—predicts specific dependent values based on independent variable inputs. Classification—predicts a category for a record. Probability—uses a rank score to evaluate the strength of classification. Sentiment analysis—evaluates text for positive or negative sentiment to predict positive or negative outcomes.
Examples: Analysis of new accounts opened for customers who have been inactive for more than 12 months collects data that are common to new account opening, such as account type, demographics, and employee incentives. Predict the probability of bankruptcy and the ability to continue as a going concern for a client. Predict probability of fraudulent financial statements. Assess and predict management bias, given sentiment of conference call transcripts.
Prescriptive—recommend action based on previously observed actions
Techniques: What-if analysis—decision support systems. Applied statistics—predicts a specific outcome or class. Artificial intelligence—uses observations of past actions to predict future actions for similar events.
Examples: Analysis determines procedures to follow when new accounts are opened for inactive customers, such as requiring approval.
While many of these analyses can be performed using Excel, most CAATs are built on
generalized audit software (GAS), such as IDEA, ACL, or TeamMate Analytics. The GAS
software has two main advantages over traditional spreadsheet software. First, it enables
analysis of very large datasets. Second, it automates several common analytical routines, so
an auditor can click a few buttons to get to the results rather than writing a complex set of
formulas. GAS is also scriptable and enables auditors to record or program common analy-
ses that may be reused on future engagements.
Communicate Insights
Many analytics can be adapted to create an audit dashboard for measuring risk in transac-
tions or exceptions to control rules, particularly if the firm has adopted continuous auditing.
The primary output of CAATs is evidence that may be used to test management asser-
tions about the processes, controls, and data quality. This evidence is included in the audit
workpapers.
Track Outcomes
The detection and resolution of audit exceptions may be a valuable measure of the effi-
ciency and effectiveness of the internal audit function itself. Additional analytics may track
the number of exceptions over time and the time taken to report and resolve the issues.
For the CAATs involved, a periodic validation process should occur to ensure that they
continue to function as expected.
PROGRESS CHECK
1. Using Exhibit 6-2 as a guide, compare and contrast descriptive and diagnostic
analytics. How might these be used in an audit?
2. In a continuous audit, how would a dashboard help to communicate audit findings
and spur a response?
There are many ways to calculate aging in Excel, including the use of pivot tables. If you
have a simple list of accounts and balances, you can calculate a simple age of accounts in
Excel using the following procedure.
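The Excel steps are tool-specific; as a language-neutral illustration, here is a minimal age analysis sketch in Python with pandas. The sample balances, dates, and bucket boundaries are hypothetical.

```python
import pandas as pd

# Invented open receivables, one row per invoice.
ar = pd.DataFrame({
    "customer": ["Acme", "Birch", "Cedar"],
    "balance": [1200.00, 450.00, 3100.00],
    "invoice_date": pd.to_datetime(["2023-11-02", "2023-08-15", "2023-10-01"]),
})

as_of = pd.Timestamp("2023-12-31")
ar["days_outstanding"] = (as_of - ar["invoice_date"]).dt.days
ar["age_bucket"] = pd.cut(
    ar["days_outstanding"],
    bins=[0, 30, 60, 90, 10_000],
    labels=["0-30", "31-60", "61-90", "Over 90"],
)
print(ar.groupby("age_bucket", observed=False)["balance"].sum())
```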
Sorting
Sometimes, simply viewing the largest or smallest values can provide meaningful insight.
Sorting in ascending order shows the smallest number values first. Sorting in descending
order shows the largest values first. The type of data that lends itself to sorting is any numer-
ical, date, or text data of interest.
Summary Statistics
Summary statistics provide insight into the relative size of a number compared with the
population. The mean indicates the average value, while the median produces the middle
value when all the transactions are lined up in a row. The min shows the smallest value,
while the max shows the largest. Finally, a count tells how many records exist, where the
sum adds up the values to find a total. Once summary statistics are calculated, you have a
reference point for an individual record. Is the amount above or below average? What per-
centage of the total does a group of transactions make up? Excel’s Data Analysis Toolpak
offers an easy option to show summary statistics for underlying data.
The type of data that lends itself to calculating summary statistics is any numerical data, such as a dollar amount or quantity. Summary statistics are limited for categorical data;
however, you can still calculate proportions and counts of groups in categorical data.
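For instance, a minimal Python sketch of these summary statistics over a toy series of transaction amounts:

```python
import pandas as pd

amounts = pd.Series([120.00, 45.99, 310.25, 89.50, 1020.00, 45.99])

print({
    "mean": amounts.mean(),
    "median": amounts.median(),
    "min": amounts.min(),
    "max": amounts.max(),
    "count": int(amounts.count()),
    "sum": amounts.sum(),
})
# amounts.describe() produces most of these in a single call,
# much like Excel's Data Analysis Toolpak output.
```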
Lab Connection
Lab 6-1 has you identify summary statistics, trend lines, and clustering to
explore your audit data.
Sampling
Sampling is useful when you have manual audit procedures, such as testing transaction
details or evaluating source documents. The idea is that if the sample is an appropriate
size, the features of the sample can be confidently generalized to the population. So, if the
sample has no errors (misstatement), then the population is unlikely to have errors as well.
Of course, sampling has its limitations. The confidence level is not a guarantee that you
won’t miss something critical like fraud. But it does limit the scope of the work the auditor
must perform.
To determine the appropriate sample size, there are three determinants: desired confidence level, tolerable misstatement, and estimated misstatement.
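Once the sample size is set, drawing the sample is mechanical. A Python sketch, with a hypothetical file, column, and sample size; the weighted draw approximates selection proportional to monetary amount, in the spirit of monetary unit sampling rather than a full implementation of it.

```python
import pandas as pd

disbursements = pd.read_csv("disbursements.csv")  # hypothetical file

# Simple random sample of 60 records; random_state documents the seed
# so the selection can be reproduced in the workpapers.
random_sample = disbursements.sample(n=60, random_state=2023)

# Weighting by amount draws larger items more often, roughly in the
# spirit of monetary unit sampling (not a full MUS implementation).
weighted_sample = disbursements.sample(n=60, weights="amount", random_state=2023)
```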
Lab Connection
Lab 6-4 has you pull a random sample of a large dataset.
PROGRESS CHECK
3. What type of descriptive analytics would you use to find negative numbers that
were entered in error?
4. How does the computation of summary statistics (mean, mode, median, max,
min, etc.) meet the definition of descriptive analytics?
Z-score
The Z-score, introduced in Chapter 3, assigns a value to a number based on how many stan-
dard deviations it stands from the mean, shown in Exhibit 6-4. By setting the mean to 0, you
can see how far a point of interest is above or below it. For example, a point with a Z-score
of 2.5 is two-and-a-half standard deviations above the mean. Because most values that come
from a large population tend to be normally distributed (frequently skewed toward smaller
values in the case of financial transactions), nearly all (99 percent) of the values should
be within plus-or-minus three standard deviations. If a value has a Z-score of 3.9, it is very
likely an anomaly or outlier that warrants scrutiny and potentially additional analysis.
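A compact Python sketch of the computation; the data are invented, and the cutoff of 2 (rather than the 3 used for large populations) is chosen only so the toy example flags something.

```python
import pandas as pd

amounts = pd.Series([100, 105, 98, 102, 99, 101, 240])  # toy data

z = (amounts - amounts.mean()) / amounts.std()
print(z.round(2))
# Flag points far from the mean; with large populations a cutoff of 3
# (per the text) is typical. The toy set is tiny, so 2 is used here.
print(amounts[z.abs() > 2])
```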
t-Tests
A t-test is a statistical test used to determine if there is a significant difference between the means
of two groups, or two datasets. A t-test allows the auditor to compare the average values of
the two datasets and determine if they came from the same population. The t-test takes a
sample from each of the two datasets and establishes the problem statement by assuming
a null hypothesis that the two means are equal. As part of the test, certain values are calcu-
lated, t-statistics are computed and compared against the standard values, and the assumed
null hypothesis is either rejected or unable to be rejected. In an audit setting, a t-test may be
used to determine if there is a difference in the mean travel and expense reports of certain
executives as compared to other executives. A t-test may also be used to test a hypothesis
regarding related party transactions, by testing whether related party transactions are statisti-
cally different in magnitude than similar transactions from independent parties.
Benford’s Law
Benford’s law states that when you have a large set of naturally occurring numbers, the
leading digit(s) is (are) more likely to be small. The economic intuition behind it is that
people are more likely to make $10, $100, or $1,000 purchases than $90, $900, or $9,000
purchases. According to Benford’s law, more numbers in a population of numbers start
with 1 than any other digit, followed by those that begin with 2, then 3, and so on (as shown
in Exhibit 6-5). This law has been shown in many settings, such as the amount of electricity
bills, street addresses, and GDP figures from around the world.
[Exhibit 6-5 chart: frequency of leading digits 1 through 9 for Purchases, GDP 2016, and Benford's Predicted]
In auditing, we can use Benford’s law to identify transactions or users with nontypical
activity based on the distribution of the first digits of the number. For example, assume that
purchases over $500 require manager approval. A cunning employee might try to make large
purchases that are just under the approval limit to avoid suspicion. She will even be clever and
make the numbers look random: $495, $463, $488, and so on. What she doesn’t realize is that
the frequency of the leading digit 4 is going to be much higher than it should be, shown in
Exhibit 6-6. Benford’s law can also detect random computer-generated numbers because those
will have equally distributed first digits. Adding additional leading digits refines the analysis.
EXHIBIT 6-6 Using Benford's Law
Structured purchases may look normal, but they alter the distribution under Benford's law.
[Chart: frequency of leading digits 1 through 9 for Purchases and Benford's Predicted]
Data that lend themselves to applying Benford’s law tend to be large sets of numerical
data, such as monetary amounts or quantities.
Once an auditor uses Benford’s law to identify a potential issue, they can further ana-
lyze individual groupings of transactions, for example, by individual or location. An auditor
would append the expected probability of each transaction, for example, transactions start-
ing with 1 are assigned 0.301 or 30.1 percent. Then the auditor would calculate the average
estimated Benford’s law probability over the group of transactions. Given that the expected
average of all Benford’s law probabilities is 0.111 or 11.1 percent, individuals with an aver-
age estimated value over 11.1 percent would tend to have transactions with more leading
digits of 1, 2, 3, and so on. Conversely, individuals with an average estimated value below
11.1 percent would tend to have transactions with more leading digits of 7, 8, 9, and so on.
This analysis allows auditors to narrow down the individual employees, customers, depart-
ments, or locations that may be engaging in abnormal transactions.
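A minimal Python sketch of that comparison, using the expected Benford frequency log10(1 + 1/d) for leading digit d (0.301 for digit 1, as in the text); the input file and column name are hypothetical.

```python
import numpy as np
import pandas as pd

amounts = pd.read_csv("purchases.csv")["amount"]  # hypothetical file and column

digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)  # digit 1 -> 0.301 ... digit 9 -> 0.046

# Leading digit of each positive amount.
leading = amounts[amounts > 0].astype(str).str.lstrip("0.").str[0].astype(int)
actual = leading.value_counts(normalize=True).reindex(digits, fill_value=0)

print(pd.DataFrame({"benford": benford, "actual": actual.values}, index=digits))
# Digits whose actual frequency is far from the expectation (and the
# users or locations behind them) are candidates for follow-up.
```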
While Benford’s law can apply to large sets of transactional data, it doesn’t hold for
purposefully assigned numbers, such as sequential numbers on checks, account numbers,
or other categorical data. However, auditors have found similar patterns when applying
Benford’s law to text, looking at the first letter of words used in financial statement com-
munication or email messages.
Lab Connection
Lab 6-2 and Lab 6-5 have you perform a Benford’s Law analysis to identify
outlier data.
Drill-Down
Most modern Data Analytics software allows auditors to drill down into specific values
by simply double-clicking a value. This lets you see the underlying transactions that gave
you the summary amount. For example, you might click the total sales amount in an income
statement to see the sales general ledger summarizing the daily totals. Click a daily amount
to see the individual transactions from that day.
including matching and nonmatching ones. Fuzzy matching finds matches that may be
less than 100 percent matching by finding correspondences between portions of the text or
other entries.
Fuzzy matching data requires that you have two tables/sheets with a common attribute,
such as a primary key/foreign key, name, or address.
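As a small illustration, Python's standard difflib can score the similarity of two name fields; the names and the suggested 0.85 cutoff are invented, and dedicated audit tools implement more sophisticated fuzzy matching.

```python
import difflib

vendors = ["Acme Supply Co.", "Birch Industries", "Cedar Freight LLC"]
payees = ["ACME Supply Company", "Birch Industires", "Delta Partners"]

def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

for payee in payees:
    best = max(vendors, key=lambda v: similarity(payee, v))
    print(f"{payee!r} -> {best!r} (similarity {similarity(payee, best):.2f})")
# Pairs above a chosen cutoff (say 0.85) are plausible fuzzy matches;
# pairs below it are likely different entities.
```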
Lab Connection
Lab 6-3 has you filter data to identify duplicate payments based on matching values.
Sequence Check
Another substantive procedure is the sequence check. This is used to validate data integ-
rity and test the completeness assertion, making sure that all relevant transactions are
accounted for. Simply put, sequence checks are useful for finding gaps, such as a missing
check in the cash disbursements journal, or duplicate transactions, such as duplicate pay-
ments to vendors. This is a simple procedure that can be deployed quickly and easily.
Begin by sorting your data by identification number; gaps and duplicates then become easy to spot.
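A minimal sketch in Python, assuming a short, hypothetical list of check numbers:

# Hypothetical check numbers from a cash disbursements journal.
check_numbers = [1001, 1002, 1002, 1004, 1005, 1008]

sorted_ids = sorted(check_numbers)

# Equal neighbors are duplicates; neighbors that differ by more than one
# reveal a gap (possible missing checks).
for prev, curr in zip(sorted_ids, sorted_ids[1:]):
    if curr == prev:
        print(f"Duplicate: {curr}")
    elif curr - prev > 1:
        print(f"Gap: missing {list(range(prev + 1, curr))}")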
PROGRESS CHECK
5. A sequence check will help us to see if there is a duplicate payment to vendors.
Why is that important for the auditor to find?
6. Let’s say a company has nine divisions, and each division has a different check
number based on its division—so one starts with “1,” another with “2,” and so on.
Would Benford’s law work in this situation?
Regression
Regression allows an auditor to predict a specific dependent value based on various inde-
pendent variable inputs. In other words, based on the level of various independent variables,
we can establish an expectation for the dependent variable and see if the actual outcome is
different from that predicted. In auditing, for example, we could evaluate overtime booked
for workers against productivity or the value of inventory shrinkage given environmental
factors.
An auditor might also use regression to predict the level of the client's allowance for
doubtful accounts receivable, given various macroeconomic variables and borrower
characteristics that might affect the client's customers' ability to pay amounts owed.
The auditor can then assess if the client’s allowance is different from the predicted amount.
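A least-squares sketch of that expectation in Python follows. The predictor variables (unemployment rate and days sales outstanding), the history, and the client's recorded allowance are all hypothetical:

import numpy as np

# Hypothetical history: unemployment rate (%) and days sales outstanding,
# with the allowance for doubtful accounts (in $000s) as the outcome.
X = np.array([[4.1, 38], [5.0, 45], [6.2, 52], [4.8, 41], [7.1, 60]])
y = np.array([210, 265, 340, 248, 395])

# Add an intercept column and fit ordinary least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# The auditor's expectation given the current year's inputs.
expected = np.array([1.0, 5.5, 48]) @ coef
client_recorded = 240.0  # hypothetical client estimate
print(f"Expected allowance: {expected:.0f}; client recorded: {client_recorded:.0f}")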
Classification
Classification in auditing focuses mainly on risk assessment. The predicted classes may
be low risk or high risk, where an individual transaction is classified into either
group. In the case of known fraud, auditors would classify those cases or transactions as
fraud/not fraud and develop a classification model that could predict whether similar trans-
actions might also be potentially fraudulent.
There is a longstanding classification method used to predict whether a company is
expected to go bankrupt or not. Altman’s Z is a calculated score that helps predict bank-
ruptcy and might be useful for auditors to evaluate a company’s ability to continue as a
going concern.2 Beneish (1999) also developed a classification model that predicts firms
that have committed financial statement fraud, which auditors might use to help detect
fraud in their financial statement audits.3
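Altman's (1968) model is often quoted as Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5, with cutoffs near 1.81 (distress) and 2.99 (safe). A small Python sketch using hypothetical client figures (in $ millions):

def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, total_liabilities, sales, total_assets):
    # Ratios X1 through X5 from Altman's original discriminant model.
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

z = altman_z(working_capital=25, retained_earnings=40, ebit=18,
             market_value_equity=120, total_liabilities=80,
             sales=210, total_assets=150)
zone = "safe" if z > 2.99 else "distress" if z < 1.81 else "grey"
print(f"Z = {z:.2f} ({zone})")  # prints Z = 3.27 (safe) for these inputs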
When using classification models, it is important to remember that large training sets
are needed to generate relatively accurate models. Initially, this requires significant manual
classification by the auditors or business process owner so that the model can be useful for
the audit.
Probability
When talking about classification, the strength of the class can be important to the auditor,
especially when trying to limit the scope (e.g., evaluate only the 10 riskiest transactions).
Classifiers that use a rank score can identify the strength of classification by measuring the
distance from the mean. That rank order focuses the auditor’s efforts on the items of poten-
tially greatest significance, where additional substantive testing might be needed.
Sentiment Analysis
By evaluating text of key financial documents (e.g., 10-K or annual report), an auditor could
see if positive or negative sentiment in the text is predictive of positive or negative out-
comes. Such analysis may reveal a bias from management. Is management trying to influ-
ence current or potential investors by the sentiment expressed in their financial statements,
management discussion and analysis (as part of their 10-K filing), or in conference calls? Is
management too optimistic or pessimistic because they have stock options or bonuses on
the line?
2 Edward Altman, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy," The Journal of Finance 23, no. 4 (1968), pp. 589–609.
3 M. D. Beneish, "The Detection of Earnings Manipulation," Financial Analysts Journal 55, no. 5 (1999), pp. 24–36.
Such sentiment analysis may affect an auditor’s assessment of audit risk or alert them to
management’s aggressiveness. There is more discussion on sentiment analysis in Chapter 8.
Applied Statistics
Additional mixed distributions and nontraditional statistics may also provide insight to the
auditor. For example, an audit of inventory may reveal errors in the amount recorded in
the system. The difference between the error amounts and the actual amounts may provide
some valuable insight into how significant or material the problem may be. Auditors can
plot the frequency distribution of errors and use Z-scores to home in on the cause of the
most significant or outlier errors.
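A minimal sketch, assuming a hypothetical list of inventory error amounts in which one planted error is extreme enough to exceed a |z| > 3 screen:

import statistics

# Hypothetical differences between recorded and counted inventory amounts.
errors = [12.0, -8.5, 3.2, -5.1, 7.8, -2.4, 9.9, 1.5, -6.3, 4.4,
          0.8, -3.7, 6.1, -1.2, 5.5, -7.9, 2.2, 8.4, -4.6, 450.0]

mean = statistics.mean(errors)
stdev = statistics.stdev(errors)

# Flag errors more than 3 standard deviations from the mean.
for e in errors:
    z = (e - mean) / stdev
    if abs(z) > 3:
        print(f"Outlier error {e} (z = {z:.1f}) warrants follow-up")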
Artificial Intelligence
As the audit team generates more data and takes specific action, the action itself can be
modeled in a way that allows an algorithm to predict expected behavior. Artificial intel-
ligence is designed around the idea that computers can learn about action or behavior from
the past and predict the course of action for the future. Assume that an experienced auditor
questions management about the estimate of allowance for doubtful accounts. The human
auditor evaluates a number of inputs, such as the estimate calculation, market factors,
and the possibility of income smoothing by management. Given these inputs, the auditor
decides to challenge management’s estimate. If the auditor consistently takes this action
and it is recorded by the computer, the computer learns from this action and makes a rec-
ommendation when a new inexperienced auditor faces a similar situation.
Decision support systems that accountants have relied upon for years (e.g., TurboTax)
are based on a formal set of rules and then updated based on what the user decides given
several choices. Artificial intelligence can be used as a helpful assistant to auditors and may
potentially be called upon to make judgment decisions itself.
Additional Analyses
The list of Data Analytics techniques presented in this chapter is not exhaustive. There
are many other approaches to identifying interesting patterns and anomalies in enterprise
data. Many ingenious auditors have developed automated scripts that can simplify several
of the audit tasks presented here. Excel add-ins like TeamMate Analytics provide many
different techniques that apply specifically to the audit of fixed assets, inventory, sales and
purchase transactions, and so on. Auditors will combine these tools with other techniques,
such as periodically testing the effectiveness of automated tools by adding erroneous or
fraudulent transactions, to enhance their audit process.
PROGRESS CHECK
7. Why would a bankruptcy prediction be considered classification? And why would
it be useful to auditors?
8. If sentiment analysis is used on a product advertisement, would you guess the
overall sentiment would be positive or negative?
Summary
This chapter discusses a number of analytical techniques that auditors use to gather insights
about controls and transaction data. (LO 6-1)
These include:
■ Descriptive analytics that are used to summarize and gain insight into the data. For
example, by analyzing the aging of accounts receivable through descriptive analytics an
auditor can determine whether receivables are fairly stated. (LO 6-2)
■ Diagnostic analytics that identify patterns in the data that may not be immediately obvi-
ous. For example, Benford’s law allows you to look at a naturally occurring set of num-
bers and identify if there are potential anomalies. (LO 6-3)
■ Predictive analytics that look for common attributes of problematic data to help identify
similar events in the future. For example, using classification an auditor can better pre-
dict if a firm is more closely related to firms that have historically gone bankrupt versus
those that have not. This may influence the auditor’s decision of where to audit and
whether there are going concern issues. (LO 6-4)
■ Prescriptive analytics that provide decision support to auditors as they work to resolve
issues with the processes and controls. For example, decision support systems allow
auditors to employ rules to scenarios for more thorough and objective guidance in their
auditing. (LO 6-4)
Key Words
Benford’s law (292) The principle that in any large, randomly produced set of natural numbers, there is
an expected distribution of the first, or leading, digit with 1 being the most common, 2 the next most, and
down successively to the number 9.
computer-assisted audit techniques (CAATs) (286) Automated scripts that can be used to validate
data, test controls, and enable substantive testing of transaction details or account balances and generate
supporting evidence for the audit.
descriptive analytics (286) Procedures that summarize existing data to determine what has happened
in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distribu-
tions, and proportions.
diagnostic analytics (286) Procedures that explore the current data to determine why something has
happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to
drill down in the data and see how they compare to a budget, a competitor, or trend.
fuzzy matching (287) Process that finds matches that may be less than 100 percent matching by finding
correspondences between portions of the text or other entries.
predictive analytics (286) Procedures used to generate a model that can be used to determine what is
likely to happen in the future. Examples include regression analysis, forecasting, classification, and other
predictive modeling.
prescriptive analytics (287) Procedures that work to identify the best possible options given constraints
or changing conditions. These typically include developing more advanced machine learning and artificial
intelligence models to recommend a course of action, or optimize, based on constraints and/or changing
conditions.
process mining (286) A technique for analyzing business processes, used to diagnose problems and
suggest improvements where greater efficiency may be gained.
t-test (290) A statistical test used to determine if there is a significant difference between the means of
two groups, or two datasets.
ANSWERS TO PROGRESS CHECKS
1. Descriptive analytics summarize activity by computing basic descriptive statistics such
as means, medians, minimums, maximums, and standard deviations. Diagnostic analytics
compare variables or data items to each other and try to find co-occurrence or correla-
tion to find patterns of interest. Both of these approaches look at historic data. An auditor
might use descriptive analytics to understand what they are auditing and diagnostic ana-
lytics to determine whether there is risk of misstatement based on the expected value or
why the numbers are the way they are.
2. Use of a dashboard to highlight and communicate findings will help identify alarms for
issues that are occurring on a real-time basis. This will allow issues to be addressed
immediately.
3. By computing minimum values or by sorting, you can find the lowest reported value and,
thus, potential negative numbers that might have been entered erroneously into the sys-
tem and require further investigation.
4. Descriptive analytics address the questions of “What has happened?” or “What is happen-
ing?” Summary statistics (mean, mode, median, max, min, etc.) give the auditor a view of
what has occurred that may facilitate further analysis.
5. Duplicate payments to vendors suggest that there is a gap in the internal controls around
payments. After the first payment was made, why did the accounting system allow a sec-
ond payment? Were both transactions authorized? Who signed the checks or authorized
payments? How can we prevent this from happening in the future?
6. Benford’s law works best on naturally occurring numbers. If the company dictates the first
number of its check sequence, Benford’s law will not work the same way and thus would
not be effective in finding potential issues with the check numbers.
7. Bankruptcy prediction predicts two conditions for a company: bankrupt or not bankrupt.
Thus, it would be considered a classification activity. Auditors are required to assess a cli-
ent’s ability to continue as a going concern and the bankruptcy prediction helps with that.
8. Most product advertisements are very positive in nature and would have positive
sentiment.
1. (LO 6-1) Which items would be currently out of the scope of Data Analytics?
a. Direct observation of processes
b. Evaluation of time stamps to evaluate workflow
c. Evaluation of phantom vendors
d. Duplicate payment of invoices
2. (LO 6-2) Which audit technique is used to test completeness?
a. Benford’s law
b. Sequence check
c. Summary statistics
d. Drill-down
3. (LO 6-3) Benford’s law suggests that the first digit of naturally occurring numerical data-
sets follow an expected distribution where:
a. the leading digit of 4 is more common than 3.
b. the leading digit of 9 is more common than 2.
c. the leading digit of 8 is more common than 9.
d. the leading digit of 6 is more common than 5.
4. (LO 6-1) The determinants for sample size include all of the following except:
a. confidence level.
b. tolerable misstatement.
c. potential risk of account.
d. estimated misstatement.
5. (LO 6-1) CAATs are automated scripts that can be used to validate data, test controls,
and enable substantive testing of transaction details or account balances and generate
supporting evidence for the audit. What does CAAT stand for?
a. Computer-aided audit techniques
b. Computer-assisted audit techniques
c. Computerized audit and accounting techniques
d. Computerized audit aids and tests
6. (LO 6-2, 6-3, 6-4) Which type of audit analytics might be used to find hidden patterns or
variables linked to abnormal behavior?
a. Prescriptive analytics
b. Predictive analytics
c. Diagnostic analytics
d. Descriptive analytics
7. (LO 6-3) What describes finding correspondences between at least two types of text or
entries that may not match perfectly?
a. Incomplete linkages
b. Algorithmic matching
c. Fuzzy matching
d. Incomplete matching
8. (LO 6-4) Which testing approach would be used to predict whether certain cases should
be evaluated as having fraud or no fraud?
a. Classification
b. Probability
c. Sentiment analysis
d. Artificial intelligence
9. (LO 6-4) Which testing approach would be useful in assessing the value of inventory
shrinkage given multiple environmental factors?
a. Probability
b. Sentiment analysis
c. Regression
d. Applied statistics
10. (LO 6-3) What type of analysis would help auditors find missing checks?
a. Sequence check
b. Benford’s law analysis
c. Fuzzy matching
d. Decision support systems
Discussion and Analysis
1. (LO 6-1) How do nature, extent, and timing of audit procedures help us identify when to
apply Data Analytics to the audit process?
2. (LO 6-1) When do you believe Data Analytics will add value to the audit process? How
can it most help?
3. (LO 6-3) Using Table 6-2 as a guide, compare and contrast predictive and prescriptive
analytics. How might these be used in an audit? Or a continuous audit?
4. (LO 6-4) Prescriptive analytics rely on models based on past actions to suggest recom-
mended actions for new, similar situations. For example, auditors might review manag-
ers’ approval of new credit applications for inactive customers. If auditors know the
variables and values that were common among past approvals and denials, they could
compare the action recommended by the model with the response of the manager.
How else might this prescriptive analytics help auditors assess risk or test audit issues?
5. (LO 6-2) One type of descriptive analytics is simply sorting data. Why is seeing extreme
values helpful (minimums, maximums, counts, etc.) in evaluating accuracy and com-
pleteness and in potentially finding errors and fraud and the like?
Problems
1. (LO 6-1) Match the analytics type (descriptive, diagnostic, predictive, or prescriptive) to
each analytics technique.
2. (LO 6-1) Match the analytics type (descriptive, diagnostic, predictive, or prescriptive) to
each analytics technique.
3. (LO 6-3) Match the diagnostic analytics questions to the following diagnostic analytics
techniques.
• Z-score
• t-Test
• Benford’s law
• Drill-down
• Fuzzy matching
• Sequence check
4. (LO 6-3) Match the analytics question to the following predictive and prescriptive ana-
lytics techniques.
• Classification
• Regression
• Probability
• Sentiment analysis
• What-if analysis
• Artificial Intelligence
5. (LO 6-2) One type of descriptive analytics is age analysis, which shows how old open
accounts receivable and accounts payable are. How would age analysis be useful in the
following situations? Select whether these situations would facilitate Account manage-
ment, Continuous auditing, or Manual auditing.
Situation / Type
1. Auditors receive alerts when aging buckets go outside a set range.
2. Auditor tests the aging values and determines whether they are appropriate.
3. Owners determine whether certain accounts should be written off.
6. (LO 6-2) Analysis: Why are auditors particularly interested in the aging of accounts
receivable? How does this analysis help evaluate management judgment on collectabil-
ity of receivables? Would a dashboard item reflecting this aging be useful in a continu-
ous audit?
7. (LO 6-1) Analysis: One of the benefits of Data Analytics is the ability to see and test the
full population. In that case, why is sampling still used, and how is it useful?
8. (LO 6-3) Analysis: How is a Z-score greater than 3.0 (or –3.0) useful in finding extreme
values? What type of analysis should we do when we find extreme or outlier values?
9. (LO 6-3) What are some patterns that could be found using diagnostic analysis?
10. (LO 6-2, 6-3) In a certain company, one accountant records most of the adjusting jour-
nal entries at the end of the month. What type of analysis could be used to identify that
this happens, as well as the cumulative size of the transactions the accountant records?
11. (LO 6-3) Which distributions would you recommend be tested using Benford’s law?
12. (LO 6-3) Analysis: What would a Benford’s law evaluation of sales transaction amounts
potentially show? What would a test of vendor numbers or employee numbers show?
Anything different from a test of invoice or check numbers? Are there any cases where
Benford’s law wouldn’t work?
13. (LO 6-4) Which of the following methods illustrate the use of artificial intelligence to
evaluate the allowance for doubtful accounts?
14. (LO 6-4) Analysis: How could artificial intelligence be used to help with the evalua-
tion of the estimate for the allowance for doubtful accounts? Could past allowances be
tested for their predictive ability that might be able to help set allowances in the current
period?
15. (LO 6-4) Analysis: How do you think sentiment analysis of the 10-K might assess the
level of bias (positive or negative) of the annual reports? If management is too positive
about the results of the company, can that be viewed as being neutral or impartial?
LABS
Tableau | Desktop
1. Columns:
a. Purchase_Order_ID
b. Purchase_Order_Date > Click the drop-down and select
Purchase_Order_Date.
c. Entered_By
d. Supplier_Account_Name
e. Supplier_Group
f. Purchase_Order_Amount_Local
2. Finally, use the Format visual (paintbrush) icon to give each visual a
friendly Title, X-axis title, Y-axis title, and Legend Name.
3. Take a screenshot (label it 6-1MB).
4. When you are finished answering the lab questions, continue to the next part.
Save your file as Lab 6-1 OK PCard Audit Dashboard.pbix.
Tableau | Desktop
1. Open the Lab 6-1 OK PCard Audit Dashboard.pbix file from Part 1 if it’s not
already.
2. On your dashboard, resize the top two visualizations so you have space for
a new third visual on the top-right corner of your page (click each card and
drag the edges or bottom-right corner to resize).
a. Add a scatter chart to your report to show the total purchases and
orders by individual:
1. Click the Scatter Chart visualization to add it to your report.
2. In the Fields pane, drag the following values to the appropriate
fields, then click the drop-down menu next to each to set the
summary measure (e.g., Sum, Average, Count):
a. X-axis: Purchase_Order_ID > Count
b. Y-axis: Purchase_Order_Amount_Local > Sum
c. Values: Entered_By
Tableau | Desktop
1. Open the Lab 6-1 OK PCard Audit Dashboard.twb file from Part 1 if it’s not
already.
2. Create a new worksheet with a scatter plot to show the groupings of interest
rate by average loan amount and average debt-to-income ratio:
a. Rename the sheet Purchase Clusters.
b. Columns: Purchase_Order_ID > Measure > Count
c. Rows: Purchase Order Amount Local > Measure > Sum
d. Marks > Detail: Entered By
3. Finally, add clusters to your model:
a. Click the Analytics tab on the left side of the screen.
b. Drag Cluster onto your scatter plot.
c. Set the number of clusters to 6 and close the pane.
4. Return to your dashboard and add your new clusters page to the top-right
corner and resize the other two visualizations so there are now three across
the top. Tip: Change the dashboard size from Fixed Size to Automatic if it
feels cramped on your screen.
5. Click the Purchase Clusters visual on your dashboard and click the funnel
icon to Use as Filter.
6. Click on one of the individuals to filter the data on the remaining visuals.
7. Take a screenshot (label it 6-1TC) of your chart and data.
8. Answer the lab questions, and then close Tableau. Save your worksheet as 6-1
OK PCard Audit Dashboard.twb.
Lab 6-1 Part 2 Objective Questions (LO 6-1, 6-3)
OQ1. In the clusters chart, click the individual (entered by) with the highest purchase
amount. What is the individual’s name?
OQ2. In the clusters chart, click the individual with the highest purchase amount.
What category or supplier group received the most purchases?
OQ3. In the clusters chart, click the individual with the highest number of
transactions. What is the individual’s name?
OQ4. In the clusters chart, click the individual with the highest number of trans-
actions. What is the category or supplier group with the highest number of
transactions?
Microsoft | Power BI Desktop
Tableau | Desktop
Lab 6-2 Applying Benford’s Law to Purchase
Transactions and Individuals
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-2 [Your name] [Your email address].docx.
Start by pulling out the first digits and calculating the frequency distribution. Benford’s
law has an expected distribution of values as a percentage of the total. For example, values
starting with a 1 are estimated to occur about 30 percent of the time. We’ll compare the
frequency distribution of actual transactions to the expected distribution in this lab.
Taking Benford’s law a step further, we can calculate the average expected value by indi-
vidual to determine whether they are likely to have a lot of transactions that start with 7, 8,
and 9, or whether they meet the standard distribution. The overall average of the expected
Benford’s law values is 11.11 percent, assuming an even distribution of transactions start-
ing with 1 through 9. If an individual's average expected value is higher than 11 percent,
their transactions will tend to have more leading digits of 1, 2, and 3. If the average is lower
than 11 percent, they will tend to have more leading digits of 7, 8, and 9. In this dashboard we'll look
specifically at those individuals with low averages.
d. Set the correct data types for your new columns (Transform > Data Type):
1. Leading_Digit > Whole Number
2. Benford_Expected > Percentage
3. Purchase_Order_Amount_Local > Decimal
e. Click Home > Close & Apply to return to Power BI.
3. Go to Page 1 (rename it Dashboard) and create the following visualizations:
a. Add a slicer to act as a filter and move it to the right side of the screen:
1. Field: Business_Unit_Description
a. Click the menu in the top corner of the slicer and choose Dropdown.
b. Choose DEPARTMENT OF TRANSPORTATION from the Busi-
ness_Unit_Description list.
b. Add a table with the Purchase Details showing the transaction details and
move it to the bottom of your page so it fills the full width of the page:
1. Columns:
a. Purchase_Order_ID
b. Purchase_Order_Date > Purchase_Order_Date
c. Entered_By
d. Supplier_Account_Name
e. Supplier_Group
f. Purchase_Order_Amount_Local
c. Add a line and clustered column chart for your Benford’s law analysis
and have it fill the remaining space at the top of the page:
1. X-axis: Leading_Digit
2. Click the three dots on the card, then Sort by > Leading Digit, then
Sort Ascending.
3. Column Y-axis: Purchase_Order_Amount_Local > Count > Show
value as > Percent of Grand Total
4. Line Y-axis: Benford_Expected > Minimum
5. Click the Format visual (paintbrush) icon:
a. Visual > X-axis: Type > Categorical
b. Visual > Lines > Shapes Stepped > On
d. Take a screenshot (label it 6-2MA).
e. Resize the Benford’s analysis chart so it fills the top-left corner of the page.
f. Click in the blank space and add a new Stacked Bar Chart in the top-
right corner:
1. Y-axis: Entered_By
2. X-axis: Benford_Expected > Average
3. Tooltips: Purchase_Order_ID > Count
4. Sort > Ascending
g. Take a screenshot (label it 6-2MB) of your dashboard.
h. For each visual, click the Format visual (paintbrush) icon and add a
friendly title.
4. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 6-2 OK PCard Benford.pbix.
a. Value: MIN(Benford Expected)
b. Click OK.
f. Take a screenshot (label it 6-2TA).
4. Click Worksheet > New worksheet to add a second visual showing the aver-
age Benford Expected by individual. Name the sheet Benford Individual:
a. Columns: Benford Expected > Measure > Average
b. Rows: Entered By > Sort Ascending
c. Marks > Text: Purchase Order ID > Measure > Count
5. Create a final worksheet showing the transaction details. Name the sheet
Purchase Details:
a. Rows:
1. Purchase_Order_ID
2. Purchase_Order_Date > Attribute
3. Entered_By
4. Supplier_Account_Name
5. Supplier_Group
b. Marks > Text: Purchase_Order_Amount_Local
c. Click the Analytics tab, then drag Totals to your table as Column Grand
Totals.
6. Create a new Dashboard tab called Benfords Analysis.
a. Drag each of the three visualizations you created above into your
dashboard from the pane on the left. Place Benford in the top-left
corner, Benford Average in the top-right corner, and Purchase Details
along the entire bottom.
b. Click the Benford visual and click the Filter icon to set Use as filter.
c. Click the Benford Average visual and click the Filter icon to set Use as
filter.
7. Take a screenshot (label it 6-2TB) of your dashboard.
8. When you are finished answering the lab questions, close your workbook
and save your file as Lab 6-2 OK PCard Audit Dashboard.twb.
OQ5. Hover over the individuals in the Benford Average visualization. Click the bar
for the first individual with more than 10 transactions to filter the dashboard.
What digit do most of that individual’s transactions begin with?
Tableau | Desktop
d. In Power Query, change the data types of the following fields to Text
(click the column, then click Transform > Data Type > Text; if prompt-
ed, click Replace Current):
1. Payment ID
2. Payment Account
3. Invoice Reference
4. Prepared By
5. Approved By
e. While you’re still in Power Query, add two new columns to determine
whether a transaction is a duplicate or not:
1. Click Add Column > Custom Column to combine the invoice number
and payment amount into a single value:
a. New column name: Invoice_Payment_Combo
b. Custom column formula: = [Invoice_Reference] & Text.From([Payment_Amount])
c. Note: This formula converts the payment amount to text using
the Text.From function, then combines it with the Invoice
Reference to create a single value.
d. Click OK.
2. Click Transform > Group By to identify the count of duplicates:
a. Click Advanced.
b. Field grouping: Invoice_Payment_Combo
c. New column name: Duplicate_Count
d. Operation: Count Rows
e. Click Add aggregation.
f. New column name: Detail
g. Operation: All Rows
h. Click OK.
3. Finally, expand the Detail table. Click the expand (double arrow)
icon in the Detail column header.
4. Uncheck Use original column name as prefix and click OK.
f. Click Home > Close & Apply to return to Power BI.
2. Go to Page 1 (rename it Duplicates) and create a new table:
a. Columns:
1. Payment ID
2. Payment Date > Payment_Date
3. Prepared By
4. Invoice Reference
5. Payment Amount
6. Duplicate_Count
3. Take a screenshot (label it 6-3MA).
Tableau | Desktop
3. Prepared By
4. Invoice Reference
5. Duplicate?
b. Marks > Text: Payment Amount
4. Take a screenshot (label it 6-3TA).
5. Now filter your values to only show the duplicates:
a. Right-click the Duplicate? field and choose Show Filter.
b. Uncheck No in the filter pane.
6. Take a screenshot (label it 6-3TB).
7. Answer the lab questions and then close Tableau. Save your file as Lab 6-3
Slainte Duplicates.twb.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
Microsoft Excel
LAB 6-4M Example of a Random Sample in Microsoft Power BI Desktop
Microsoft | Power BI Desktop
4. In the Power Query Editor, you can view how your sample is distributed.
a. From the View tab in the ribbon, place a check mark next to Column
distribution and Column profile.
b. Click through the columns to see summary statistics for each column.
c. Click the TRAN_DATE column and take a screenshot (label it
6-4MA).
5. From the Home tab in the ribbon, click Close & Apply to create a report.
This step may also take several minutes to run.
6. Create a line chart showing the total sales revenue by day:
a. In the Visualizations pane, click Line chart.
b. Drag TRAN_DATE to the X-axis, then click the drop-down to change it
from the Date Hierarchy to TRAN_DATE.
c. Drag TRAN_AMT to the Y-axis.
7. Resize the visualization (or click Focus Mode) to see the line chart more
clearly.
8. Take a screenshot (label it 6-4MB).
9. Keep in mind that, because this is a random sample, no two samples will
be identical, so if you refresh your query, you will get a new batch of 10,000
records.
Lab 6-5 Example Output
By the end of this lab, you will create a sample of transactions from sales data. While your
results will include different data values, your work should look similar to this:
Microsoft Excel
LAB 6-5M Example Benford’s Analysis in Microsoft Excel + Power Query
Tableau | Desktop
Lab 6-5 Part 1 Compare Actual Leading Digits to
Expected Leading Digits
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-5 [Your name] [Your email address].docx.
In Part 1 of this lab you will connect to the Dillard’s data and load them into either Excel
or Tableau Desktop. Then you will create a calculated field to isolate the leading digit from
every transaction. Finally, you will construct a chart to compare the leading digit distribu-
tion to that of the expected Benford’s law distribution.
c. Drag and drop Leading Digit into the Rows, and then repeat that action
by dragging and dropping it into Values.
d. The values will likely default to a raw count of the number of transac-
tions associated with each leading digit. We’d rather view this as a
percentage of the grand total, so right-click one of the values in the
PivotTable, select Show Values As, and select % of Grand Total.
6. The next step is to compare these percentages to the expected values from
Benford’s law by first creating a new calculation column next to your
PivotTable, and then creating a Column Chart to visualize the comparison
between the expected and actual values.
a. In Cell C4 (the first free cell next to the count of the leading digits in
your PivotTable), create the logarithmic formula to calculate the Ben-
ford’s law value for each leading digit: =LOG(1+1/A4).
b. Copy the formula down the column to view the Benford’s law value for
each leading digit.
c. Copy and paste the values of the PivotTable and the new Benford’s values
to a different part of the spreadsheet. This will enable you to build a visu-
alization to compare the actual with the expected (Benford’s law) values.
d. Ensure that your cursor is in one of the cells in the new copied range of
data. From the Insert tab in the ribbon, select Recommended Charts and
then click the Clustered Column option.
7. After you answer the Objective and Analysis questions, continue to Lab 6-5
Part 2.
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND
TRAN_AMT > 0
e. Click OK.
3. Click Sheet 1 to create your calculated fields and visualize the comparison
between actual and expected values.
a. To isolate the leading digit, click Analysis > Create Calculated Field.
Tableau has text functions that help in isolating a certain number of
characters in a referenced cell. For our purposes, we need to return
the first (or the left-most) digit of each transaction amount. You will
use a LEFT function to reference the TRAN_AMT field. The LEFT
function has two arguments, the first of which is the attribute refer-
ence, and the second which indicates how many characters you want
returned. Because some of the transaction amounts are less than a
dollar and Benford’s law does not take into account the magnitude of
the number (e.g., 1 versus 100), we will also multiply each transaction
amount by 100.
1. Title: Leading Digit
2. Calculation: left(str([TRAN_AMT]*100),1)
3. Click OK.
b. To create the Expected Benford’s Law values, click Analysis > Create
Calculated Field again to create a new field.
1. Title: Benford’s Law
2. Calculation: LOG(INT([Leading Digit])+1)-LOG(INT([Leading Digit]))
3. Click OK.
c. To create the Column Chart:
1. Columns: Leading Digit and Benford’s Law
a. Right-click the SUM(Benford’s Law) pill and select Measure
(Sum) > Minimum.
2. Rows: Custom SQL Query (Count)
a. Right-click the CNT(Custom SQL Query) pill and select Quick
Table Calculation > Percent of Total.
3. From the Show Me tab, change the visualization to a Side by Side
Bar Chart.
4. Take a screenshot (label it 6-5TA).
5. Answer the lab questions, then continue to the next part.
Lab 6-5 Part 2 Construct Fictitious Data and Assess It for
Outliers
The assumption of Benford’s law is that falsified data would not conform to Benford’s law.
To test this, we can add a randomly generated dataset that is based on the range of transac-
tion amounts in your query results.
For the Microsoft Track, you will return to your Query output of the Excel workbook
to generate random values. For the Tableau Track, you cannot currently generate random
values in Tableau Prep or Tableau Desktop, so we have provided a spreadsheet of random
numbers to work with.
1. Return to your Query output in the Excel workbook (the spreadsheet will be
labeled Query1). The calculations in the next few steps may each take a few
minutes to run because you are working with a very large dataset.
2. In cell Q1, add a new column to your table by typing Random Digit and then
click Enter.
3. In cell Q2, enter =LEFT(RANDBETWEEN(1,100),1). This will generate a
random number between 1 and 100 for each row in the table and extract the
leading digit.
4. When you create a random number set in Excel, Excel will update the ran-
dom numbers every time you make a change to the spreadsheet. For this
reason, you need to replace the data in your new Random Number column
with values.
a. Select and Copy all of the values in your new Random Digit column.
b. From the Home tab, click on the bottom half of the Paste button and
select Paste Values, then click Enter. This replaces the formula in each
cell with only the resulting values.
5. Repeat steps 5 and 6 from Part 1 of this lab to compare the actual values from
your randomly generated number set to the expected Benford’s law values.
a. When you create your PivotTable, you may need to refresh the data
to see both of your new columns. In the PivotTable Analyze tab, click
Refresh, then click into the PivotTable placeholder to see your field list.
b. Even though you multiplied the random numbers by 100, some of the
randomly generated numbers were so small that they still have a lead-
ing digit of 0. When you create your logarithmic functions to view the
Benford’s law expected values, skip the row for the 0 values.
6. Take a screenshot of your range of data showing Expected and Actual val-
ues and the column chart (label it 6-5MB).
Tableau | Desktop
1. Open a new instance of Tableau Desktop and connect to the file Lab 6-5
Fictitious Data.xlsx.
2. Repeat step 3 in Part 1 of this lab to create the Benford’s Law Expected Val-
ues and compare them to the fictitious files in the Excel workbook.
a. The count variable from Part 1 (Custom SQL Query (Count)) will be
labeled Sheet 1 (Count) in this instance instead.
3. Take a screenshot (label it 6-5TB).
3. Take a screenshot (label it 6-5MC) of the table and the p-value.
4. Repeat steps 1 and 2 on your fictitious dataset to calculate the Chi-Squared
Test for that set of data.
5. Take a screenshot (label it 6-5MD) of the table and the p-value.
Tableau
There is no Tableau track for Part 3.
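The chi-squared comparison in Part 3 can also be sketched in Python with SciPy; the observed digit counts below are hypothetical:

import math
from scipy.stats import chisquare

# Hypothetical observed counts of leading digits 1-9 from 1,000 transactions.
observed = [292, 181, 124, 96, 81, 67, 58, 53, 48]
n = sum(observed)

# Benford's expected counts: n * log10(1 + 1/d) for each digit d.
expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A small p-value (say, below 0.05) suggests a departure from Benford's law.
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.3f}")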
Chapter 7
Managerial Analytics
A Look Back
In Chapter 6, we focused on substantive testing within the audit setting, highlighting the audit plan and how account
balances are checked. We also highlighted the use of statistical analysis to find errors or fraud in the audit setting,
the use of clustering to detect outliers, and the use of Benford's analysis.
A Look Ahead
In Chapter 8, we will focus on how to access and analyze financial statement data. Through analysis of ratios and
trends we identify how companies appear to stakeholders. We also discuss how to analyze financial performance, and
how visualizations help find insight into the data. Finally, we discuss the use of text mining to analyze the sentiment
in financial reporting data.
For years, Kenya Red Cross had attempted to refine its strategy and align its daily activities with its overall strategic goals.
It had annual strategic planning meetings with external consultants that always resulted in the consultants presenting a
new strategy that the Red Cross never fully bought into, leaving the organization unsure of what was developed or what
it would mean for its future. When Kenya Red Cross went through a Data
Analytics–backed Balanced Scorecard planning process for the first time, though, it immediately felt like its organization’s
mission and vision were involved in the strategic planning and that “strategy” was no longer so vague. The Balanced
Scorecard approach helped the Kenya Red Cross align its goals into measurable metrics. The organization prided itself
on being “first in and last out” but hadn’t actively measured its success in that goal, nor had the organization fully ana-
lyzed how being the first in and last out of disaster scenarios affected other goals and areas of its organization.
Using Data Analytics to refine its strategy and assign measurable performance metrics to its goals, Kenya Red
Cross felt confident that its everyday activities were linked to measurable goals that would help the organization
reach its goals and maintain a strong positive reputation and impact through its service. Exhibit 7-1 gives an illustra-
tion of the Balanced Scorecard at the Kenya Red Cross.
OBJECTIVES
After reading this chapter, you should be able to:
LO 7-1 Explain how the IMPACT model applies to management accounting problems.
LO 7-2 Explain typical descriptive and diagnostic analytics in management accounting.
LO 7-3 Evaluate the use of KPIs as part of a Balanced Scorecard.
LO 7-4 Assess the underlying quality of data used in dashboards as part of
management accounting analytics.
LO 7-5 Understand how to address and refine results to arrive at useful
information provided to management and other decision makers.
Source: “The 7 Data Science Skills That Will Change the Accounting Career,” Origin World,
March 27, 2020, https://www.originworld.com/2020/03/27/7-data-science-skills-that-will-
change-accounting-career/, (accessed January 2, 2021).
Master Data
The data used to address management accounting questions include both data from the financial
reporting system and data from internal operational systems, including manufacturing
(production), human resource, and supply chain data.
indirect or overhead costs that have been used in the past, and might be useful for allocating
costs in the future.
Prescriptive analytics might be employed to perform marginal what-if analysis determin-
ing whether to own or lease a building. Goal-seek analysis might be employed to determine
break-even levels. Cash-flow analysis (or capital budgeting) might be used to decide which
investments will pay off using net present value or internal rate of return calculations.
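As a small illustration of the goal-seek idea, here is a Python sketch of the break-even calculation; the fixed cost, unit price, and unit variable cost are hypothetical:

# Hypothetical monthly figures.
fixed_costs = 50_000
price_per_unit = 20.00
variable_cost_per_unit = 12.00

# Break-even units = fixed costs / contribution margin per unit.
contribution_margin = price_per_unit - variable_cost_per_unit
print(f"Break-even at {fixed_costs / contribution_margin:,.0f} units")

# Simple what-if: how does break-even shift if variable cost rises 10 percent?
cm_bumped = price_per_unit - variable_cost_per_unit * 1.10
print(f"With 10% higher variable cost: {fixed_costs / cm_bumped:,.0f} units")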
Sensitivity (what-if) analysis might also be used to test how the outputs might change when the underlying assumptions/estimates used
in the inputs vary. For example, if we assume the estimate on the cost of capital used in cash-
flow analysis changes, then we can see if the new results might affect the decision. After
addressing and refining results, we are ready to report the findings to management.
Lab Connection
Lab 7-3 and Lab 7-4 have you compare sales performance with benchmarks
over time.
Variance analysis allows managers to evaluate the KPIs and how far they vary from
the expected outcome. For example, managers compare actual results to budgeted results
to determine whether a variance is favorable or unfavorable, similar to that shown in
Exhibit 7-3. The ability to use these types of bullet charts to not only identify the benchmark
but also to see the relative distance from the goal helps managers identify root causes of
the variance (e.g., the price we pay for a raw material or the increased volume of sales) and
drill down to determine the good performance to replicate and the poor performance to
eliminate.
EXHIBIT 7-3
Variance Analysis
Identifies Favorable and
Unfavorable Variances
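A minimal Python sketch of the favorable/unfavorable tagging described above, using hypothetical budget and actual figures:

# Hypothetical budget vs. actual amounts for one month.
lines = {
    "Sales revenue":    {"budget": 500_000, "actual": 522_000},
    "Direct materials": {"budget": 120_000, "actual": 131_500},
    "Direct labor":     {"budget": 95_000,  "actual": 92_300},
}

REVENUE_LINES = {"Sales revenue"}

for name, v in lines.items():
    variance = v["actual"] - v["budget"]
    # Revenue above budget is favorable; cost above budget is unfavorable.
    favorable = variance >= 0 if name in REVENUE_LINES else variance <= 0
    print(f"{name}: {variance:+,} ({'F' if favorable else 'U'})")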
Lab Connection
Lab 7-1 has you calculate job costs and variances.
Cost Behavior
Managers must also understand what is driving the costs and profits to plan for the future
and apply to budgets or use as input for lean accounting processes. For example, they must
evaluate mixed costs to predict the portion of fixed and variable costs for a given period. Pre-
dictive analytics, such as regression analysis, might evaluate actual production volume and
total costs to estimate the mixed cost line equation, such as the one shown in Exhibit 7-4.
EXHIBIT 7-4
Regression Analysis of
Mixed Costs
This example was calculated using a scatter plot chart over a 12-month period in Excel.
The mixed costs can be interpreted as consisting of fixed costs of approximately $181,480
per month (the intercept) and variable costs of approximately $13.30 per unit produced.
The R² value of 0.84 tells us that this line fits the data well: production volume explains
about 84 percent of the variation in total costs.
Regression and other predictive techniques help managers identify outliers, anomalies,
and poor performers so they can act accordingly. They also rely on more observations so
the prediction is much more accurate than other rudimentary accounting calculations, such
as the high-low method. These same trend analyses inform the master budget from sales to
cash and can be combined with sensitivity or what-if analyses to predict a range of values.
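A sketch of that mixed-cost regression in Python follows; the 12 months of volume and cost data are invented, so the fitted numbers will differ from Exhibit 7-4:

import numpy as np

# Hypothetical monthly production volume (units) and total cost ($).
volume = np.array([10_000, 12_500, 9_000, 15_000, 11_000, 14_000,
                   13_000, 16_000, 10_500, 12_000, 15_500, 13_500])
total_cost = np.array([315_000, 348_000, 305_000, 382_000, 327_000, 368_000,
                       352_000, 395_000, 320_000, 340_000, 388_000, 361_000])

# Fit total cost = fixed + variable * volume by least squares.
slope, intercept = np.polyfit(volume, total_cost, 1)

# R-squared: the share of cost variation explained by volume.
predicted = intercept + slope * volume
r2 = 1 - ((total_cost - predicted) ** 2).sum() / ((total_cost - total_cost.mean()) ** 2).sum()

print(f"Fixed ≈ ${intercept:,.0f}/month; variable ≈ ${slope:.2f}/unit; R² = {r2:.2f}")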
PROGRESS CHECK
1. If a manager is trying to decide whether to discontinue a product or division, they
would look at the contribution margin of that object. What are some examples of
relevant data that would be useful in this calculation? Irrelevant data?
2. A bullet chart (as shown in Exhibit 7-3) uses a reference line to show actual per-
formance relative to a benchmark. What advantages does a bullet graph have
over a gauge, such as a fan with red, yellow, and green zones and a needle
pointing to the current value?
The public dashboard detailing student diversity at the Walton College can be used by
prospective students to learn more about the university and by the university itself to assess
how it is doing in meeting goals. If the university has a goal of increasing gender balance
in enrollment, for example, then monitoring the “Diverse Walton” metrics, pictured in
Exhibit 7-5, can help the university understand how it is doing at reaching that goal.
EXHIBIT 7-5
Walton College Digital
Dashboard—Diverse
Walton
Digital dashboards provide interesting information, but their value is maximized when
the metrics provided on the dashboard are used to affect decision making and action. One
iteration of a digital dashboard is the Balanced Scorecard. The Balanced Scorecard was
created by Robert S. Kaplan and David P. Norton in 1996 to help companies turn their
strategic goals into action by identifying the most important metrics to measure, as well as
identifying target goals to compare metrics against.
The Balanced Scorecard comprises four components: financial (or stewardship),
customer (or stakeholder), internal process, and organizational capacity (or learning and
growth). As depicted in Exhibit 7-6, the measures in each category affect other categories,
and all four should be directly related to the strategic objectives of an organization.
For each of the four components, objectives, measures, targets, and initiatives are iden-
tified. Objectives should be aligned with strategic goals of the organization, measures are
the KPIs that show how well the organization is doing at meeting its objective, and tar-
gets should be achievable goals toward which to move the metric. Initiatives should be the
actions that an organization can take to move its specified metrics in the direction of their
stated target goal. Exhibit 7-7 is an example of different objectives that an organization
EXHIBIT 7-6
Components of the
Balanced Scorecard
EXHIBIT 7-7
An Example of a
Balanced Scorecard
Reprinted with permission
from Balanced Scorecard
Institute, a Strategy
Management Group
Company. Copyright
2008–2017.
might identify for each component. You can see how certain objectives relate to other
objectives—for example, if the organization increases process efficiency (in the internal pro-
cess component row), that should help with the objective of lowering cost in the financial
component row.
Understanding how the four components interact to answer different types of questions
and meet different strategic goals is critical when it comes to identifying the right measures
to include in the dashboard, as well as using those measures to help with decision making.
Creating a Balanced Scorecard or any type of digital dashboard to present KPIs for decision
making follows the IMPACT model.
Lab Connection
Lab 7-2 has you create a Balanced Scorecard dashboard to evaluate four KPIs.
1 https://www.linkedin.com/pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know.
PROGRESS CHECK
3. To illustrate what KPIs emphasize in “what gets measured, gets done,” Walmart
has a goal of a “zero waste future.”2 How does reporting Walmart’s waste recy-
cling rate help the organization figure out if it is getting closer to its goal? Do you
believe it helps the organization accomplish its goals?
4. How can management identify useful KPIs? How could Data Analytics help
with that?
In addition to working through the same data request process that is detailed in Chapter 2,
there are two other questions to consider when obtaining data and evaluating their quality:
1. How often do the data get updated in the system? This will help you be aware of how
up-to-date your metrics are so that you interpret the changes over time appropriately.
2. Additionally, how often do you need to see updated data? If the data in the system are
updated on a near-real-time basis, it may not be necessary for you to have new updates
pushed to your scorecard as frequently. For example, if your team will assess their
progress only in a once-a-week meeting, there is no need to have a constantly updating
scorecard.
While the data for calculating KPIs are likely stored in the company’s enterprise system
or accounting information system, the digital dashboard containing the KPIs for data analy-
sis should be created in a data visualization tool, such as Power BI or Tableau. Loading the
data into these tools should be done with precision and should be validated to ensure the
data imported were complete and accurate.
Designing data visualizations and selecting the right way to express data (as whole
numbers, percentages, absolute values, etc.) was discussed in Chapter 4. Specifically for
digital dashboards, the format of your dashboard can follow the pattern of a Balanced
Scorecard with a strategy map, or it can take on a different format. Exhibit 7-9 shows a
template for building out the objectives, measures, targets, and initiatives into a Balanced
Scorecard format.
EXHIBIT 7-9
Balanced Scorecard Strategy Map Template
Business Objectives and Strategy Map: Financial, Customer, Internal Processes, and Organizational Capacity (in each circle, list the objectives associated with each component here)
Measures: (List 3–4 KPIs to support each different component)
Targets: (Use the arrows to express if the metric should increase or decrease to meet the goal, and then indicate by how much)
Initiatives: (List the initiatives that are in line with the Measures, each helping the organization meet the listed targets)
If the dashboard is not following the strategy map template, the most important KPIs
should be placed in the top-left corner, as our eyes are most naturally drawn to that part of
any page that we are reading.
Lab Connection
Lab 7-5 has you create a dashboard with advanced models to evaluate sales
performance.
PROGRESS CHECK
5. How often would you need to see the KPI of Waste Recycling Rate to know if you
are making progress? Any different for the KPI of ROA?
6. Why does the location of individual visuals on a dashboard matter?
PROGRESS CHECK
7. Why are digital dashboards for KPIs an effective way to address and refine
results, as well as communicate insights and track outcomes?
8. Consider the opening vignette of the Kenya Red Cross. How do KPIs help the
organization prepare and carry out its goal of being the “first in and last out”?
Summary
■ Management accountants must use descriptive analytics to understand and direct activ-
ity, diagnostic analytics to compare with a benchmark and control costs, predictive ana-
lytics to plan for the future, and prescriptive analytics to guide their decision process.
(LO 7-1)
■ Relevant costs and data help inform decisions, variance analysis and bullet graphs help
determine where the company is, and regression helps managers understand and predict
costs. (LO 7-2)
■ Because data are increasingly available and affordable for companies to access and
store, and because the growth in technology has created robust and affordable business
intelligence tools, data and information are becoming the key components for decision
making, replacing gut response. (LO 7-2)
■ Performance metrics are defined, compiled from the data, and used for decision mak-
ing. A specific type of performance metrics, key performance indicators (KPIs)—or
“key” metrics that influence decision making and strategy—are the most important.
(LO 7-3)
■ One of the most common ways to communicate a variety of KPIs is through a digi-
tal dashboard. A digital dashboard is an interactive report showing the most important
metrics to help users understand how a company or an organization is performing.
Its value is maximized when the metrics provided on the dashboard are used to affect
decision making and action. (LO 7-3)
■ One iteration of a digital dashboard is the Balanced Scorecard, which is used to help
companies turn their strategic goals into action by identifying the most important
metrics to measure, as well as identifying target goals to compare metrics against. The
Balanced Scorecard is comprised of four components: financial (or stewardship), cus-
tomer (or stakeholder), internal process, and organizational capacity (or learning and
growth). (LO 7-3, 7-4)
■ For each of the four components, objectives, measures, targets, and initiatives are iden-
tified. Objectives should be aligned with strategic goals of the organization, measures
are the KPIs that show how well the organization is doing at meeting its objective, and
targets should be achievable goals toward which to move the metric. Initiatives should be
the actions that an organization can take to move its specified metrics in the direction of
its stated target goal. (LO 7-3, 7-4)
■ Regardless of whether you are creating a Balanced Scorecard or another type of digital
dashboard to showcase performance metrics and KPIs, the IMPACT model should be
used to complete the project. (LO 7-3, 7-4, 7-5)
Key Words
Balanced Scorecard (342) A particular type of digital dashboard that is made up of strategic objectives,
as well as KPIs, target measures, and initiatives, to help the organization reach its target measures in line
with strategic goals.
digital dashboard (341) An interactive report showing the most important metrics to help users
understand how a company or an organization is performing. Often created using Excel or Tableau.
key performance indicator (KPI) (339) A particular type of performance metric that an organization
deems the most important and influential on decision making.
performance metric (339) Any calculation measuring how an organization is performing, particularly
when that measure is compared to a baseline.
ANSWERS TO PROGRESS CHECKS
1. The contribution margin includes the revenues and variable costs that are traceable to
that division or product. Those data would be relevant. Other relevant data may be the
types of customers and sentiment toward the product, products that are sold in conjunc-
tion with that product, or market size. Shared or allocated costs would not be relevant.
2. A bullet graph uses a small amount of space to evaluate a large number of metrics.
Gauges are more visually engaging and easier to understand, but waste a lot of space.
3. If waste reduction is an important goal for Walmart, having a KPI and, potentially, a digital
dashboard that reports how well the organization is doing will likely be useful in helping
accomplish its goal. Using a digital dashboard helps an organization to see if it is making
progress.
4. The KPIs that are the most helpful are those that are consistent with the company’s strategy
and measure how well the company is doing in meeting its goals. Data Analytics will help
gather and report the necessary data to report on the KPIs. The Data Analytics IMPACT
model introduced in Chapter 1—from identifying the question to tracking outcomes—will
be helpful in getting the necessary data.
5. How frequently to update KPIs is always a good question. One determinant is how often the data get updated in the system; the other is how often those data will be considered by the people looking at them. The longer of those two intervals is probably the correct frequency for updating KPIs.
6. The most important KPIs should be placed in the top-left corner because our eyes are most naturally drawn to that part of any page that we are reading.
7. By identifying the KPIs that are most important to corporate strategy and finding the nec-
essary data to support them and then reporting on them in a digital dashboard, deci-
sion makers will have the necessary information to make effective decisions and track
outcomes.
8. As noted in the opening vignette, by using Data Analytics to refine its strategy and assign measurable performance metrics to its goals, Kenya Red Cross felt confident that its everyday activities were linked to measurable goals, helping the organization achieve its mission and maintain a strong positive reputation and impact through its service.
1. (LO 7-1) Which of the following would not be considered a prescriptive analytics
technique?
a. Sensitivity Analysis (Evaluating Assumptions of Future Performance)
b. Crosstabulation (Analyzing Past Performance)
c. Breakeven Level in Sales
d. Capital Budgeting
2. (LO 7-3) What would you consider to be an operational KPI?
a. Inventory Shrinkage Rate
b. Brand Equity
c. CAPEX to Sales Ratio
d. Revenue per Employee
3. (LO 7-3) What does KPI stand for?
a. Key performance index
b. Key performance indicator
c. Key paired index
d. Key paired indicator
4. (LO 7-4) The most important KPIs should be placed in the _______ corner of the page even if we are not following a strategy map template.
a. bottom right
b. bottom left
c. top left
d. top right
5. (LO 7-4) According to the text, which of these is not helpful in refining a dashboard?
a. Which metric are you using most frequently to help you make decisions?
b. Are you downloading the data to do any additional analysis after working with the
dashboard, and if so, can the dashboard be improved to save those extra steps?
c. Are there any metrics that you do not use? If so, why aren’t they helpful?
d. Which data are the easiest to access or least costly to collect?
6. (LO 7-4) On a Balanced Scorecard, which is not included as a component?
a. Financial Performance
b. Customer/Stakeholder
c. Internal Process
d. Employee Capacity
7. (LO 7-4) Which of the following would be considered to be a diagnostic analytics tech-
nique in managerial accounting?
a. Summary Statistics
b. Computation of Job Order Costing
c. Price and Rate Variance Analysis
d. Sales Forecasts
8. (LO 7-3) What is defined as an interactive report showing the most important metrics to
help users understand how a company or an organization is performing?
a. KPI
b. Performance metric
c. Digital dashboard
d. Balanced Scorecard
9. (LO 7-1) What would you consider to be a prescriptive analytics technique in manage-
ment accounting?
a. Computation of KPIs
b. Capital Budgeting
c. Comparison of Actual Performance to Budgeted Performance
d. Cash Flow Forecasts
10. (LO 7-2) What would you consider to be a diagnostic analytics technique in manage-
ment accounting?
a. Computation of Rate Variances
b. Sales Forecasts based on Time Series Analysis
c. Breakeven Level in Sales
d. Computation of Product Sales in Prior Period
Discussion and Analysis
1. (LO 7-1) In the article “The 7 Data Science Skills That Will Change the Accounting
Career,” Robert Hernandez suggests that two critical skills are (1) revenue analytics and
(2) optimizing costs and revenues. Would these skills represent descriptive, diagnostic,
predictive, or prescriptive analytics? Why?
2. (LO 7-3, 7-4) We know that a Balanced Scorecard comprises four components: financial (or stewardship), customer (or stakeholder), internal process, and organizational capacity (or learning and growth). What would you include in a dashboard for the
financial and customer components?
3. (LO 7-3, 7-4) We know that a Balanced Scorecard comprises four components: financial (or stewardship), customer (or stakeholder), internal process, and organizational capacity (or learning and growth). What would you include in a dashboard for the
internal process and organizational capacity components? How do digital dashboards
make KPIs easier to track?
4. (LO 7-3) Amazon, in our opinion, has cared less about short-run profitability than about gaining market share. Arguably, Amazon gains market share by taking care of the customer. Given the 75 KPIs that every manager needs to know in Exhibit 7-8, what would be a natural KPI for the customer aspect for Amazon?
5. (LO 7-3) For an accounting firm like PwC, how would the Balanced Scorecard help bal-
ance the desire to be profitable for its partners with keeping the focus on its customers?
6. (LO 7-3) For a company like Walmart, how would the Balanced Scorecard help balance
the desire to be profitable for its shareholders with continuing to develop organizational
capacity to compete with Amazon (and other online retailers)?
7. (LO 7-3) Why is Customer Retention Rate a great KPI for understanding Tesla’s
customers?
8. (LO 7-5) Assuming you have access to data that are updated in real time, are there situ-
ations when you would not want to update your digital dashboard in real time? Why or
why not?
9. (LO 7-2) In which of the four components of a Balanced Scorecard would you put the
Walton College’s diversity initiative? Why do you think this is important for a public insti-
tution of higher learning?
Problems
1. (LO 7-1, 7-2) Match the description of the management accounting question to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Managerial Accounting Question (match each to a data analytics type):
1. How much did Job #318 cost per unit?
2. What is driving the rate variance on manufacturing?
3. How can revenues be maximized (or costs be minimized) if there is
an increase in VAT tax in Ireland?
4. What is the appropriate cost driver?
5. What is the forecasted cash balance at the end of September?
6. What is the level of fixed and variable costs for the manufacture of
Product 317?
2. (LO 7-1, 7-2) Match the description of the management accounting technique to the
data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
3. (LO 7-3) Match the following KPIs to one of the following KPI types:
• Financial Performance
• Operational
• Customer
• Employee Performance
• Marketing
• Environmental and Social Sustainability
4. (LO 7-3) Match the following KPIs to one of the following KPI types:
• Financial Performance
• Operational
• Customer
• Employee Performance
• Marketing
• Environmental and Social Sustainability
(Problem 4, continued) Match each of the following KPIs to a KPI type:
5. Return on Capital Employed
6. Quality Index
7. Cost per Lead
8. Energy Consumption
5. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered financial performance KPIs and which would not be.
6. (LO 7-3) Analysis: From Exhibit 7-8, choose five financial performance KPIs to answer the
following three questions. This URL (https://www.linkedin.com/pulse/20130905053105-
64875646-the-75-kpis-every-manager-needs-to-know) provides background informa-
tion for each individual KPI that may be helpful in understanding the individual KPIs and
answering the questions.
6A. Identify the equation/relationship/data needed to calculate the KPI. If you need
data, how frequently would the data need to be incorporated to be most useful?
6B. Describe a simple visualization that would help a manager track the KPI.
6C. Identify a benchmark for the KPI from the Internet. Choose an industry and find the
average, if possible. This is for context only.
7. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered employee performance KPIs and which would not be.
8. (LO 7-3) Analysis: From Exhibit 7-8, choose 10 employee performance KPIs
to answer the following three questions. This URL (https://www.linkedin.com/
pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know)
provides background information for each individual KPI that may be helpful in under-
standing the individual KPIs and answering the questions.
8A. Identify the equation/relationship/data needed to calculate the KPI. How frequently
would it need to be incorporated to be most useful?
8B. Describe a simple visualization that would help a manager track the KPI.
8C. Identify a benchmark for the KPI from the Internet. Choose an industry and find the
average, if possible. This is for context only.
9. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered marketing performance KPIs and which would not be.
10. (LO 7-3) Analysis: From Exhibit 7-8, choose 10 marketing KPIs to answer the follow-
ing three questions. This URL (https://www.linkedin.com/pulse/20130905053105-
64875646-the-75-kpis-every-manager-needs-to-know) provides background information
for each individual KPI that may be helpful in understanding the individual KPIs and
answering the questions.
10A. Identify the equation/relationship/data needed to calculate the KPI. How fre-
quently would it need to be incorporated to be most useful?
10B. Describe a simple visualization that would help a manager track the KPI.
10C. Identify a benchmark for the KPI from the Internet. Choose an industry and find
the average, if possible. This is for context only.
11. (LO 7-4) Analysis: How does Data Analytics help facilitate the use of the Balanced Scorecard and tracking KPIs? Does it make the data more timely? Does it let you access more information more easily or quickly, and what other capabilities does it provide?
12. (LO 7-3) Analysis: If ROA is considered a key KPI for a company, what would be an
appropriate benchmark? The industry’s ROA? The average ROA for the company for the
past five years? The competitors’ ROA?
12A. How will you know if the company is making progress?
12B. How might Data Analytics help with this?
12C. How often would you need a measure of ROA? Monthly? Quarterly? Annually?
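For reference, ROA is conventionally computed as follows (a standard ratio, not specific to this text):

\[ \text{ROA} = \frac{\text{Net income}}{\text{Average total assets}} \]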
13. (LO 7-3) Analysis: If Time to Market is considered a key KPI for a company, what would
be an appropriate benchmark? The industry’s time to market? The average time to mar-
ket for the company for the past five years? The competitors’ time to market?
13A. How will you know if the company is making progress?
13B. How might Data Analytics help with this?
13C. How often would you need a measure of Time to Market? Monthly? Quarterly?
Annually?
14. (LO 7-3) Analysis: Why is Order Fulfillment Cycle Time an appropriate KPI for a com-
pany like Wayfair (which sells furniture online)? How long does Wayfair think customers
will be willing to wait if Amazon Prime promises items delivered to its customers in two
business days? Might this be an important basis for competition?
Job_Orders and Job_Rates, click the relationship, then click Edit and
change the Cross filter direction to Both.
2. Rename Sheet 1 to Job Composition.
3. Next, create calculated fields for the following category subtotals that you will
include in your graphs and tables. Click the Job_Orders table in the fields
list to make that the home of your new measures, then click Modeling > New
Measure. Enter each of the formulas below as a new measure. Note: If the
measure shows up in an unintended table, click the measure and change the
Home Table in the ribbon to Job_Orders.
a. Actual Revenue = SUM(Job_Orders[Job_Revenue])
b. Actual DM Cost = SUM(Material_Requisition[Material_Cost])
c. Actual DL Cost = SUM(Time_Record[Hours])*SUM(Job_Rates[Direct_Labor_Rate])
d. Actual OH Cost = SUM(Time_Record[Hours])*MIN(Job_Rates[Overhead_Rate])
e. Actual Profit = [Actual Revenue]-[Actual DM Cost]-[Actual DL Cost]-[Actual OH Cost]
f. Actual Profit Margin = [Actual Profit]/[Actual Revenue]
4. To enable benchmarks for comparison, add the following measures for the
budgeted amounts:
a. Budgeted DM Cost = SUM(Job_Orders[Job_Budgeted_DM_Cost])
b. Budgeted DL Cost = SUM(Job_Rates[Direct_Labor_Rate])*SUM(Job_Orders[Job_Budgeted_Hours])
c. Budgeted OH Cost = SUM(Job_Rates[Overhead_Rate])*SUM(Job_Orders[Job_Budgeted_Hours])
d. Budgeted Profit = [Actual Revenue]-[Budgeted DM Cost]-[Budgeted DL Cost]-[Budgeted OH Cost]
5. To enable the use of color on your graphs to show favorable and unfavorable analyses, create some additional measures based on IF...THEN logic. To make unfavorable variances appear in orange, use the color hex value #F28E2B in the THEN part of the formula. Power BI will apply conditional formatting with that color. To make favorable variances appear in blue, use the color hex value #4E79A7 in the ELSE part of the formula. Remember: More cost is unfavorable and more profit is favorable, so pay attention to the signs. TIP: Straight quotation marks must be used as shown below. Curly quotation marks, also known as smart quotes, will not work in Power BI formulas.
a. Favorable DM = IF([Actual DM Cost]>[Budgeted DM Cost],"#F28E2B","#4E79A7")
b. Favorable DL = IF([Actual DL Cost]>[Budgeted DL Cost],"#F28E2B","#4E79A7")
c. Favorable OH = IF([Actual OH Cost]>[Budgeted OH Cost],"#F28E2B","#4E79A7")
d. Favorable Profit = IF([Actual Profit]<[Budgeted Profit],"#F28E2B","#4E79A7")
6. Finally, when evaluating the jobs individually, you should compare the profit
margin to a target. This requires two more measures:
a. Target Profit Margin = .20
1. Open your Lab 7-1 Slainte Job Costs.pbix file created in Part 1 and go to the
Job Composition tab.
2. Add a new Stacked Bar Chart to your page and resize it so it fills the entire
page. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
a. Y-axis: Job_Orders.Job_No, Job_Rates.Job_Type
b. X-axis (all from Job_Orders): Actual Profit, Actual DM Cost, Actual DL
Cost, Actual OH Cost
3. Note: At this point you will see the bar chart, but the values will be incorrect.
Does it make sense that half of the jobs have negative profit? Power BI will
try to add all of the values from linked tables, in this case the Direct_Labor_
Rate and Overhead_Rate, unless you add a field from that table and have it
appear on your visual. Fix the problem by clicking Expand all down one level
in the hierarchy (down arrow fork icon) in the top-right corner of your chart.
This will now show the Job_Type value next to the job number and use the
correct rates for your DL and OH calculations. You should now see that
most of the jobs show a positive profit.
4. Click the Format visual (paintbrush) icon to give your chart some friendly
titles:
a. General > Title > Text: Job Cost Composition
b. Visual > Y-axis > Title: Job Number and Type
c. Visual > X-axis > Title: Off
5. Take a screenshot (label it 7-1MB) of your Job Composition page.
6. Save your file as Lab 7-1 Slainte Job Costs.pbix. Answer the questions for
this part and then continue to the next part.
Tableau | Desktop
1. Open your Lab 7-1 Slainte Job Costs.twb file created in Part 1 and go to the
Job Composition tab.
2. Create a new Stacked Bar Chart on your page. Drag the following fields from the Job_Orders and Job_Rates tables to their respective shelves:
a. Rows: Job_Orders.Job_No, Job_Rates.Job_Type
b. Columns: Measure Values (at the very bottom of the field list)
c. Marks > Color: Measure Names
d. Filters: Measure Values
1. Right-click the Measure Values filter and choose Edit Filter.
2. Click None to uncheck all of the values in this list.
1. Open your Lab 7-1 Slainte Job Costs.pbix file from Part 2 and create a new
page called Job Cost Dashboard.
2. Add a new Matrix Table to your page and resize it so it fills the entire width
of your page and the top third.
a. Click the Build visual icon in the Visualizations pane and drag the fol-
lowing fields from the Job_Orders and Job_Rates tables:
1. Columns: Job_Orders.Job_No
2. Values: Job_Rates.Job_Type, Job_Orders.Actual Revenue, Job_Orders.Actual DM Cost, Job_Orders.Actual DL Cost, Job_Orders.Actual OH Cost, Job_Orders.Actual Profit, Job_Orders.Actual Profit Margin
b. Now change the format so that attributes appear on rows instead of as
headers. Click the Format visual (paintbrush) icon in the Visualizations
pane and adjust the following:
1. Visual > Values > Options > Switch values to rows: On
c. Click the Format visual (paintbrush) icon to change the text color to
show favorable (blue) and unfavorable (orange) profit margin values:
1. Visual > Cell elements > Apply settings to: Actual Profit Margin >
Font color: On
2. Click Conditional formatting (fx), enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Favorable Profit Margin
3. The table should now show orange profit margin values for any value
below the 20% you defined in Part 1.
d. Take a screenshot (label it 7-1MC) of your Job Cost table.
3. Click on the blank part of the page and add a new Clustered Bar Chart below
your table. Resize it so it fits the top-left quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
1. Y-axis: Job_Orders.Job_No
2. X-axis (all from Job_Orders): Actual DM Cost
3. Tooltips: Budgeted DM Cost
b. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Direct Material Cost
2. Visual > Y-axis > Title: Job Number
3. X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable DM
c. Take a screenshot (label it 7-1MD) of your dashboard with the table
and bar chart.
4. Click on the blank part of the page and add a new Clustered Bar Chart to
the right of your Direct Material Cost chart. Resize it so it fits the top-right
quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
c. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Profit
2. Visual > Y-axis > Title: Job Number
3. Visual > X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable Profit
d. Take a screenshot (label it 7-1ME) of your completed dashboard.
7. Save your file as Lab 7-1 Slainte Job Costs.pbix. Answer the questions for
this part and then close Power BI Desktop.
Tableau | Desktop
1. Open your Lab 7-1 Slainte Job Costs.twb file from Part 2 and create a new
Page called Job Cost.
2. Create a Summary Table to show your costs associated with each job:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Measure Names
2. Rows: Job_Orders.Job No, Job_Rates.Job Type
3. Marks > Color: Favorable Profit Margin
4. Marks > Text: Measure Values
5. Filters: Measure Values
a. Right-click the Measure Values filter and choose Edit Filter.
b. Click None to uncheck all of the values in this list.
c. Check the following and click OK:
i. Actual DL Cost
ii. Actual DM Cost
iii. Actual OH Cost
iv. Actual Profit
v. Actual Profit Margin
vi. Actual Revenue
6. To show the profit margin as a percentage, right-click the
AGG(Actual Profit Margin) pill in the Measure Values shelf and
choose Format number. Click Percentage and then click back onto
your sheet.
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
5. Create a new worksheet called Overhead Cost and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual OH Cost
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted OH Cost
4. Marks > Color: Favorable OH
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted cost as a benchmark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line onto your chart and drop it on Cell.
3. Set the line Value to Budgeted OH Cost.
6. Create a new worksheet called Profit and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual Profit
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted Profit
4. Marks > Color: Favorable Profit
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted profit as a benchmark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line onto your chart and drop it on Cell.
3. Set the line Value to Budgeted Profit.
7. Finally, create a new dashboard tab called Job Cost Dashboard and add your
charts from this part of the lab:
a. Change the size from Desktop Browser > Fixed Size to Automatic.
b. Drag the Direct Material Cost sheet to the dashboard.
c. Drag the Direct Labor Cost sheet to the right side of the dashboard.
d. Drag the Overhead Cost sheet to the bottom-left corner of the dash-
board.
e. Drag the Profit sheet to the bottom-right corner of the dashboard.
f. Drag the Job Cost sheet along the entire top of the dashboard and resize
it to remove extra space.
g. In the top-right corner of each sheet on your new dashboard, click Use
as Filter (funnel icon) to connect the visualizations so you can drill
down into the data.
h. Take a screenshot (label it 7-1TE) of your completed dashboard.
8. Save your file as Lab 7-1 Slainte Job Costs.twb. Answer the questions for this
part and then close Tableau Desktop.
Lab 7-2 Part 1 Identify KPI Targets and Colors
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-2 [Your name] [Your email address].docx.
The dashboard you will prepare in this lab requires a little preparation so you can define
key performance indicator targets and create some benchmarks for your evaluation. Once
you have these in place, you can create your visualizations. To simplify the process, here are
four KPIs that management has identified as high priorities:
Finance: Which products provide the highest amount of profit? The goal is 13 percent
return on sales. Use Profit ratio = Total profit/Total sales.
Process: How long does it take to ship our product to each state on average? Manage-
ment would like to see five days or less. Use Delivery time in days = Ship date − Order date.
Customers: Which customers spend the most on average? Management would like to
make sure those customers are satisfied. Average sales amount by average transaction count.
Employees: Who are our top-performing employees by sales each month? Rank the total
number of sales by employee.
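Written out as calculations, the four KPIs look roughly like the following pandas sketch. The DataFrame and column names (orders, sale_amount, profit, order_date, ship_date, customer, employee) are illustrative stand-ins, not the actual Slainte field names:

import pandas as pd

# Hypothetical order-level data; names and values are placeholders.
orders = pd.DataFrame({
    "employee": ["Ann", "Ann", "Ben", "Cal"],
    "customer": ["C1", "C2", "C1", "C3"],
    "order_date": pd.to_datetime(["2020-02-01", "2020-02-03", "2020-02-05", "2020-02-07"]),
    "ship_date": pd.to_datetime(["2020-02-04", "2020-02-10", "2020-02-08", "2020-02-11"]),
    "sale_amount": [500.0, 800.0, 300.0, 650.0],
    "profit": [70.0, 96.0, 45.0, 60.0],
})

# Finance KPI: profit ratio = total profit / total sales (target: 13 percent).
profit_ratio = orders["profit"].sum() / orders["sale_amount"].sum()

# Process KPI: delivery time in days = ship date - order date (target: 5 or fewer).
orders["delivery_days"] = (orders["ship_date"] - orders["order_date"]).dt.days

# Customer KPI: average sale amount per customer.
avg_sale_by_customer = orders.groupby("customer")["sale_amount"].mean()

# Employee KPI: rank employees by total sales (1 = top performer).
sales_rank = orders.groupby("employee")["sale_amount"].sum().rank(ascending=False)

print(profit_ratio, orders["delivery_days"].mean(), sep="\n")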
4. To enable KPI targets to use as benchmarks for comparison, add the fol-
lowing measures. This will set your expected return on sales to 13 percent,
delivery days to 5, top salespeople to show the first top-ranking individual,
and the target average revenue to $600. You can always edit these measures
later to change the benchmarks:
a. KPI Target Return on Sales = .13
b. KPI Target Delivery Days = 5
c. KPI Target Top Salespeople = 1
d. KPI Target Average Revenue = 600
5. To enable the use of color on your graphs to show KPIs that exceed the target, create some additional calculated fields based on IF...THEN logic. To make values that miss the target appear in orange, use the color hex value #F28E2B in the THEN part of the formula. Power BI will apply conditional formatting with that color. To make values that meet or exceed the target appear in blue, use the color hex value #4E79A7 in the ELSE part of the formula. Remember: Sometimes smaller values are better than larger values, so watch the signs.
a. Actual vs Target Return on Sales = IF([Profit Ratio]>=[KPI Target Return on Sales],"#4e79a7","#f28e2b")
b. Actual vs Target Delivery Days = IF([Delivery Time Days]<=[KPI Target Delivery Days],"#4e79a7","#f28e2b")
c. Actual vs Target Seller = IF([Rank]=[KPI Target Top Salespeople],"#4e79a7","#f28e2b")
d. Actual vs Target Average Sale = IF(AVERAGE(Sales_Order_Lines[Product_Line_Total_Revenue])>=[KPI Target Average Revenue],"#4e79a7","#f28e2b")
6. Scroll your field list to show your new calculated values and take a
screenshot (label it 7-2MA). Note: Your report should still be blank at this
point.
7. Save your file as Lab 7-2 Slainte Balanced Scorecard.pbix. Answer the ques-
tions for this part and then continue to the next part.
Tableau | Desktop
2. Rename Sheet 1 to Finance - Return on Sales.
3. Next, create calculated fields for the following category subtotals that you
will include in your graphs and tables. The Profit Ratio is simply profit
divided by sales. The Delivery Time Days field calculates the number of days
between the order date and the ship date for each order. The Rank identifies
the top salespeople based on sales. Click Analysis > Create Calculated Field
for each of the following (name: formula):
a. Profit Ratio: SUM([Product Line Profit])/SUM([Product Line Total Revenue])
b. Delivery Time Days: DATEDIFF('day', [Sales Order Date], [Ship Date])
c. Rank: INDEX()
4. To enable KPI targets to use as benchmarks for comparison, add the follow-
ing measures. Click the down arrow at the top of the data tab and choose
Create Parameter. Enter the name and the current value below and click
OK. This will set your expected return on sales to 13 percent, delivery days
to 5, top salespeople to show the first top-ranking individual, and the target
average revenue to $600. You can always edit these measures later to change
the benchmarks:
a. KPI Target Return on Sales = .13
b. KPI Target Delivery Days = 5
c. KPI Target Top Salespeople = 1
d. KPI Target Average Revenue = 600
5. To enable the use of color on your graphs to show favorable and unfavorable analyses, create some additional calculated fields based on IF...THEN logic. Remember: Sometimes smaller values are better than larger values, so watch the signs.
a. Actual vs Target Return on Sales = IF([Profit Ratio]>=[KPI Target Return on Sales]) THEN 'Favorable' ELSE 'Unfavorable' END
b. Actual vs Target Delivery Days = IF(AVG([Delivery Time Days])<=[KPI Target Delivery Days]) THEN 'Favorable' ELSE 'Unfavorable' END
c. Actual vs Target Seller = IF([Rank]=[KPI Target Top Salespeople]) THEN 'Favorable' ELSE 'Unfavorable' END
d. Actual vs Target Average Sale = IF(AVG([Product Line Total Revenue])>=[KPI Target Average Revenue]) THEN 'Favorable' ELSE 'Unfavorable' END
6. Scroll your field list to show your new calculated values and take a
screenshot (label it 7-2TA). Note: Your report should still be blank at this
point.
7. Save your file as Lab 7-2 Slainte Balanced Scorecard.twb. Answer the
questions for this part and then continue to the next part.
1. Open your Lab 7-2 Slainte Balanced Scorecard.pbix file created in Part 1 and
go to the Balanced Scorecard tab.
2. Add a new Slicer to your page and resize it so it fills a narrow column on the
far right side of the page.
a. Expand the Sales_Order table and check the Sales_Order_Date field to
add it to the slicer.
b. Check only 2020 > Qtr 1 > February 2020 in the slicer to filter the data.
3. Click on the blank part of the page and add a new Clustered Column Chart
to the page for your Finance visual. Resize it so that it fills the top-left quar-
ter of the remaining space on the page. Note: Be sure to click in the blank
space before adding a new element.
a. Drag the following fields from the Finished_Good_Products and Sales_
Order_Lines tables to their respective boxes:
1. X-axis: Finished_Good_Products.Product_Description
2. Y-axis: Profit Ratio
b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > X-axis > Title: Off
2. Visual > Y-axis > Title: Profit Ratio
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Return on Sales
4. General > Title > Text: Finance - Return on Sales
4. Click on the blank part of the page and add a new Filled Map to the page
for your Process visual. Resize it so that it fills the top-right quarter of the
remaining space on the page.
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order_Lines tables to their respective boxes:
1. Location: Customer_Master_Listing.Customer_State
2. Tooltips: Delivery Time Days
b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > Fill colors > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Delivery
Days
2. General > Title > Text: Process - Delivery Time
c. Take a screenshot (label it 7-2MB) of your finance and process
visuals.
5. Click on the blank part of the page and add a new Clustered Bar Chart to the
page for your Growth visual. Resize it so that it fills the bottom-left quarter
of the remaining space on the page.
a. Drag the following fields from the Employee_Listing and Sales_Order_
Lines tables to their respective boxes:
1. Y-axis: Employee_Listing.Employee_First_Name and Employee_List-
ing.Employee_Last_Name
2. X-axis: Sales_Order_Lines.Product_Line_Total_Revenue
b. To show both names, click Expand all down one level in the hierarchy
(down arrow fork icon at the top of the chart).
c. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > Y-axis > Title: Salesperson
2. Visual > X-axis > Title: Sales Revenue
3. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Seller
4. General > Title > Text: Growth - Top Salesperson
6. Click on the blank part of the page and add a new Scatter Chart to the page
for your Customer visual. Resize it so that it fills the bottom-right quarter of
the remaining space on the page.
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order_Lines tables to their respective boxes:
1. Values: Customer_Master_Listing.Business_Name
2. X-axis: Sales_Order_Lines.Product_Line_Total_Revenue > Average
3. Y-axis: Sales_Order_Lines.Sales_Order_Quantity_Sold > Average
b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Format > X-axis > Title: Average Order Revenue
2. Format > Y-axis > Title: Average Order Quantity
3. Format > Markers > Color > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Average Sale
4. General > Title > Text: Customer - Best Customers
c. Take a screenshot (label it 7-2MC) of your completed dashboard.
7. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-2 Slainte Balanced Scorecard.pbix.
Tableau | Desktop
1. Open your Lab 7-2 Slainte Balanced Scorecard.twb file from Part 1 and go to
the Finance - Return on Sales sheet. Add the following:
a. Drag the following fields from the Finished_Good_Products and Sales_
Order_Lines tables to their respective boxes:
1. Columns: Finished_Good_Products.Product Description
2. Rows: Profit Ratio
3. Marks > Color: Actual vs Target Return on Sales
4. Marks > Detail: KPI Target Return on Sales
5. Sort by: Profit Ratio > Descending
6. Filters: Sales Order Date > Month / Year > February 2020 > OK
a. Right-click > Show filter.
b. Right-click > Apply to Worksheets > All Using This Data Source.
b. Click the drop-down menu in the top-right corner of the Favorable/
Unfavorable legend on the right side of the screen and choose Edit
Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the KPI target as a benchmark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line onto your chart and drop it on Table.
3. Set the line Value to KPI Target Return on Sales.
d. Take a screenshot (label it 7-2TB) of your Finance - Return on Sales worksheet.
3. Marks > Detail: Customer_Master_Listing.Business Name
4. Marks > Color: Actual vs Target Average Sale
b. To center your chart, right-click both axes and choose Edit Axis, then
uncheck Include zero.
c. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
5. Finally, create a new dashboard tab called Balanced Scorecard and add your
charts from this part of the lab:
a. Change the size from Desktop Browser > Fixed Size to Automatic.
b. Drag the Finance sheet to the dashboard.
c. Drag the Process sheet to the right side of the dashboard.
d. Drag the Growth sheet to the bottom-left corner of the dashboard.
e. Drag the Customer sheet to the bottom-right corner of the dashboard.
f. In the top-right corner of each sheet on your new dashboard, click Use
as Filter (funnel icon) to connect the visualizations so you can drill
down into the data.
6. Take a screenshot (label it 7-2TC) of your completed dashboard.
7. When you are finished answering the lab questions you may close Tableau
Desktop. Save your file as Lab 7-2 Slainte Balanced Scorecard.twb.
Lab 7-3 Comprehensive Case: Analyze Time Series Data—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Time series analysis is a form of predictive analysis to forecast sales.
The goal is to identify trends, seasonality, or cycles in the data so that you can better plan
production.
In this lab, you will create two dashboards to identify trends and seasonality in Dillard’s
Sales data. The first dashboard will focus on monthly seasonality and the second dashboard
will focus on day of the week seasonality. Both dashboards should show the same general
trend.
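Conceptually, both dashboards reduce to pivoting total sales by a date part. A minimal pandas sketch of the monthly view (the rows here are placeholders standing in for the query results you will load below):

import pandas as pd

# Placeholder rows with the same TRAN_DATE and AMOUNT columns as the query.
sales = pd.DataFrame({
    "TRAN_DATE": pd.to_datetime(["2014-01-15", "2014-02-15", "2015-01-15", "2015-02-15"]),
    "AMOUNT": [120000.0, 95000.0, 130000.0, 99000.0],
})

# Years across the columns, months down the rows: the same layout as the
# highlight table. Scan down a column for seasonality, across a row for trend.
monthly = sales.pivot_table(
    index=sales["TRAN_DATE"].dt.month.rename("month"),
    columns=sales["TRAN_DATE"].dt.year.rename("year"),
    values="AMOUNT",
    aggfunc="sum",
)
print(monthly)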
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
LAB 7-3M Example Time Series Dashboard in Microsoft Excel
Microsoft | Excel
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT TRAN_DATE, STATE, STORE.STORE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE TRAN_TYPE = 'P'
GROUP BY TRAN_DATE, STATE, STORE.STORE
3. Click OK, then Load to load the data directly into Excel. This may take a few
minutes because the dataset you are loading is large.
4. Once the data have loaded, you will add the data to the Power Pivot data model. Power Pivot is an Excel add-in that lets us supercharge our data models and PivotTables by creating relationships in Excel (as we can in Power BI), improving our Conditional Formatting by creating KPIs with base and target measures, and creating date tables that work better with time series analysis, among many other things.
a. Enable Power Pivot by navigating to File > Options > Add-ins.
1. Change the dropdown for Manage: to COM Add-ins.
2. Click Go. . .
3. Place a check mark next to Microsoft Power Pivot for Excel and
click OK.
5. From the new Power Pivot tab on the ribbon, select Add to Data Model
(ensure your active cell is somewhere in the dataset you just loaded into
Excel). In the Power Pivot window, you will create a date table and then load
your data to a PivotTable:
a. Create the date table: Design tab > Date Table > New.
b. Load to PivotTable: Home tab > PivotTable.
c. Click OK in the Create PivotTable pop-up window to add the PivotTable
to a new worksheet.
6. Take a few moments to explore the new PivotTable field list. There are two
tables to expand.
a. Calendar is your new date table. It has a date hierarchy and also many
different date parts in the More Fields section. We will work with the
Date Hierarchy as well as the Year, Month, and Day of Week fields from
the More Fields section.
b. Query1 is the data you loaded from SQL Server. We will primarily work
with the AMOUNT, STATE, and STORE fields.
7. Expand the Calendar and More Fields section as well as the Query1 fields
and take a screenshot (label it 7-3MA).
8. Create a highlight table (PivotTable with conditional formatting) to visualize month-over-month and year-over-year comparisons:
a. Create the PivotTable:
1. Columns: Calendar.Year (Expand More Fields to find Year)
2. Rows: Calendar.Month (Expand More Fields to find Month)
3. Values: Query1.AMOUNT
4. You will likely be prompted that relationships between tables need to
be created. Click Auto-Detect, then Close the Auto-Detect Relation-
ships window.
5. Remove the Grand Totals: PivotTable Design tab > Grand Totals >
Off for Rows and Columns.
b. Add in Conditional Formatting:
1. Select all of the numerical data in your PivotTable (not the year
labels or month labels).
2. From the Home tab in the ribbon, select Conditional Formatting >
Color Scales > Green-Yellow-Red Color Scale (the first option
provided).
9. The gradient fill across months and across years provides you with a means to quickly analyze how months have performed year-over-year and how each month compares to the months before and after it.
10. Add Sparklines to your highlight table. Sparklines add a small line chart next to the highlight table to show movement over time so that you and your audience do not have to rely only on the gradient colors.
a. Similar to your action for Conditional Formatting, select all of the
numerical data in your PivotTable (not the year labels or month labels).
b. From the Insert tab in the ribbon, select Line (from the Sparkline
section).
1. Data Range: auto-populated from your selection
2. Location Range: F5:F16 (the cells to the immediate right of your
PivotTable values)
3. Click OK.
11. Take a screenshot of your highlight table with sparklines (label it 7-3MB).
12. Answer the lab questions, then continue to Part 2.
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRAN_DATE, STATE, STORE.STORE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE TRAN_TYPE = 'P'
GROUP BY TRAN_DATE, STATE, STORE.STORE
e. Click OK.
3. Click Sheet 1 to create your highlight table. First, rename Sheet 1 to
Highlight Table.
a. Columns: Tran_Date
1. It should default to Years.
b. Rows: Tran_Date
1. Right-click the Tran_Date pill from the Rows shelf and select Month.
c. Drag Amount to Text (Marks shelf).
d. Select Highlight Table from the Show Me tab.
1. Tableau likely moved MONTH (TRAN_DATE) to the Columns
when you changed the visualization. Just drag it back down to Rows.
2. Tableau defaults the color scheme to Blue gradient. Change this
gradient to stoplight colors by clicking Color on the Marks shelf >
Edit Colors > from the drop-down select Red-Green-Gold Diverging
and click OK.
e. The gradient fill across months and across years provides you with a means to quickly analyze how months have performed year-over-year and how each month compares to the months before and after it.
4. Take a screenshot (label it 7-3TA).
5. Create a Cycle Plot. Cycle plots communicate information similar to a highlight table, but in a more visual way that does not rely on the gradient color fill. Cycle plots make it easy to see the month-over-month and year-over-year distributions at once so that you can spot seasonality and trends.
a. Right-click the Highlight Table sheet tab and select Duplicate. Rename
the new duplicate sheet Cycle Plot.
b. Adjust the visualization to a lines (discrete) chart from the Show Me tab.
c. Swap the order of the two pills in columns so that MONTH(Tran_Date)
is first and YEAR(tran_date) is second.
d. Right-click the amount Y-axis and select Add reference line.
e. Leave all of the defaults except for Label. Change this default to None
and click OK.
6. Take a screenshot (label it 7-3TB).
7. Answer the lab questions, then continue to Part 2.
Lab 7-3 Part 1 Analysis Questions (LO 7-1, 7-5)
AQ1. Why do you think October 2016 has such a comparatively low sales amount?
AQ2. Which trends stand out the most to you? What next steps would be interesting
to take?
Microsoft | Excel
c. Reposition both slicers so that they are to the right of the PivotTable.
d. For now, the slicers are only connected to the PivotTable or PivotChart
that you created them for. Adjust the setting on each slicer so that they
filter every part of your dashboard.
1. Right-click the heading of each slicer and select Report Connections. . .
2. Place a check mark next to each dashboard component and click OK.
3. Adjust the Year slicer so that you can select multiple years by pressing
the multi-select button (looks like a checklist next to the Filter icon).
4. Rename your worksheet by right-clicking the Sheet tab and selecting Rename.
Name it Months Dashboard.
5. Take a screenshot (label it 7-3MC) that includes all of your charts and the
two slicers.
6. Answer the lab questions, then continue to Part 3.
Tableau | Desktop
Lab 7-3 Part 2 Analysis Questions (LO 7-1, 7-5)
AQ1. Which month strikes you as the most interesting and why?
AQ2. What would you recommend that Dillard’s do regarding the past performance
exhibited in the Monthly dashboard?
AQ3. How do you think each different chart (Highlight Table, Sparklines/Cycle Plot, Multi-Line Chart, or Single Line Chart) provides different value to your analysis?
Tableau | Desktop
2. In each of your new sheets, locate the MONTH(tran_date) pill and replace it with Weekday by right-clicking MONTH(tran_date) and selecting More > Weekday.
3. Take a screenshot (label it 7-3TD) of the All tables sheet.
4. When you are finished answering the lab questions, you may close Tableau
and save your file as Lab 7-3 Time Series Dashboard.twb.
3. Create a KPI to signal performance of the measure in comparison to the baseline, and
determine the range of values that indicate poor performance, good performance, and
great performance.
To calculate the sum of each year’s sales transactions, we need to create three new measures.
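The stoplight logic the Power Pivot KPI applies can be expressed directly. A minimal sketch, assuming the 98/102 percent thresholds used later in this lab (the function name is illustrative):

def kpi_status(current: float, target: float) -> str:
    # Below 98% of target is red, 98-102% is yellow, above 102% is green,
    # mirroring the thresholds set in the steps below.
    ratio = current / target
    if ratio < 0.98:
        return "red"
    if ratio <= 1.02:
        return "yellow"
    return "green"

print(kpi_status(current=970_000, target=1_000_000))    # red
print(kpi_status(current=1_010_000, target=1_000_000))  # yellow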
Microsoft | Excel
1. From the Insert tab in the ribbon, create a new PivotTable. This time, do
not place the PivotTable on the same worksheet; select New Worksheet and
click OK.
2. In the Power Pivot tab in the Excel ribbon, create two new measures:
a. If you have closed and re-opened your Excel workbook, you likely will
need to add the Power Pivot add-in back. Do so if you don’t see the
Power Pivot tab in the ribbon.
b. Click Measures > New Measure. . .
1. Table name: Query1 (this is the default)
2. Measure name: Current Year
3. Formula: =SUM([Amount])
4. Category: General
c. Click OK.
d. Click Measures > New Measure. . .
1. Table name: Query1
2. Measure name: Last Year
3. Formula: =CALCULATE(sum([Amount]), SAMEPERIODLASTYEAR('Calendar'[Date]))
3. In the Power Pivot tab in the Excel ribbon, create a new KPI:
a. Click KPIs > New KPI. . .
1. KPI base field (value): Current Year
2. Measure: Last Year
3. Adjust the status thresholds so that:
a. Anything below 98 percent of last year’s sales (the target) is red.
b. Anything between 98 percent and 102 percent of the target is yellow.
c. Anything above 102 percent of the target is green.
b. Click OK.
4. Notice that the PivotTable has automatically placed your new measures and
your new KPI into the Values section. The Current Year and Last Year val-
ues will make more sense when we pull in the Date Hierarchy. The field
labeled “Current Year Status” is your KPI field, although it has defaulted to
showing a number (1) instead of a stoplight.
a. From your PivotTable field list, remove Current Year Status from the Values.
b. From your PivotTable field list, scroll down until you see the stoplight
Current Year values (they are within the Query1 fields), and place
Status back in the Values.
5. This KPI will function only with the Date Hierarchy (not with the date parts).
a. Drag Date Hierarchy to the Rows of your PivotTable field list.
6. Take a screenshot (label it 7-3ME) of your PivotTable.
7. Currently, our KPI is comparing years to prior year and months to prior
month (such as September 2016 to September 2015). To compare months to
previous months (September 2016 to August 2016, for example), we will cre-
ate another pair of measures and a new KPI. (Even though the calculation
for current month is the same as the calculation for current year, we have to
create a new measure to use as the KPI’s base; each base measure can have
only one KPI assigned to it.)
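The difference between the two comparisons is just an alignment choice, which a small pandas sketch makes concrete (the monthly totals are placeholders, not Dillard's data):

import pandas as pd

# Placeholder monthly sales totals, one value per month starting January 2015.
monthly = pd.Series(
    [100, 110, 105, 120, 98, 115, 108, 125, 112, 118, 109, 130, 102, 114],
    index=pd.date_range("2015-01-01", periods=14, freq="MS"),
    dtype=float,
)

prev_month = monthly.shift(1)              # e.g., February 2016 vs. January 2016
same_month_last_year = monthly.shift(12)   # e.g., February 2016 vs. February 2015

comparison = pd.DataFrame({
    "current": monthly,
    "vs_prev_month": monthly / prev_month,
    "vs_last_year": monthly / same_month_last_year,
})
print(comparison.tail(2))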
8. In the Power Pivot tab in the Excel ribbon, create two new measures and a
new KPI:
a. If you have closed and re-opened your Excel workbook, you likely will
need to add the Power Pivot add-in back. Do so if you don’t see the
Power Pivot tab in the ribbon.
b. Click Measures > New Measure. . .
1. Table name: Query1 (this is the default)
2. Measure name: Current Month
3. Formula: =SUM([Amount])
4. Category: General
c. Click OK.
d. Click Measures > New Measure. . .
1. Table name: Query1
2. Measure name: Previous Month
3. Formula: =CALCULATE(sum([Amount]), PREVIOUSMONTH('Calendar'[Date]))
9. In the Power Pivot tab in the Excel ribbon, create a new KPI:
a. Click KPIs > New KPI. . .
1. KPI base field (value): Current Month
2. Measure: Previous Month
3. Adjust the status thresholds so that:
a. Anything below 98 percent of the previous month's sales (the target) is red.
b. Anything between 98 percent and 102 percent of the target is yellow.
c. Anything above 102 percent of the target is green.
b. Click OK.
10. Add this KPI status to your PivotTable (if it was added automatically, remove it along with the measures, then add the KPI Status back in to view the stoplights instead of 0, 1, and –1).
11. Take a screenshot (label it 7-3MF) of your PivotTable with the new Cur-
rent Month Status KPI included.
This report may be useful at a very high level, but it is too aggregated for state-level and store-level analysis. Next, we will add two slicers to help filter the data by state and store.
1. Open your Lab 7-4 Dillard’s Prior Period.pbix from Part 1 if you closed it and
go to Page 1.
2. Name your page: Prior Period.
3. Create a Clustered Bar Chart showing store performance compared to the
KPI target average sales and resize it so it fills the left side of your page, then
drag the fields to their corresponding place:
a. Y-axis: Location
b. X-axis: TRANSACT.TRAN_AMOUNT > Average
c. Tooltips: KPI Target Avg Tran Amount
d. Click the Format visual (paintbrush) icon and adjust the styles:
1. Visual > Y-axis > Title: Location
2. Visual > X-axis > Title: Average Transaction Amount
3. Visual > Bar > Color > Conditional formatting (fx button) or click
the vertical three dots next to Default color. Choose the following
and click OK:
a. Format style: Field value
b. What field should we base this on?: Current vs Target Avg Tran
Amount
4. General > Title > Text: KPI Average Tran Amount by Location
e. Take a screenshot (label it 7-4MC).
4. Now adjust your sheet to show filters and drill down into values:
a. In the right-hand corner of the clustered bar chart you just made, click
Expand all down one level in the hierarchy (forked arrow icon) to show
the store.
b. In the visualization click More Options (three dots) and choose Sort By
> STATE STORE. Then choose Sort Ascending.
5. Take a screenshot (label it 7-4MD) of the page.
6. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-4 Dillard’s Prior Period.pbix and continue to
Part 3.
1. Open your Lab 7-4 Dillard’s Prior Period.twb from Part 1 if you closed it and
go to Sheet 1.
2. Name your sheet: KPI Average Amount.
3. Create a bullet chart showing store performance compared to the KPI target
average sales. Note: A bullet chart shows the actual results as a bar and the
benchmark as a reference line.
a. Drag your Location hierarchy to the Rows shelf.
b. Drag Tran Amt Current Year to the Columns shelf. Change it to Average,
and sort your chart in descending order by average transaction amount.
c. Click the Analytics tab and drag Reference Line to your Table and
click OK.
1. Scope: Entire Table
2. Value: KPI Target Avg Tran Amount
d. Return to the Data tab and drag KPI Avg Current > Target to the Color
button in the Marks pane.
e. Take a screenshot (label it 7-4TC).
4. Now adjust your sheet to show filters and drill down into values:
a. Right-click the Select a Year parameter and click Show Parameter.
b. Drag Tran Date to the Filters pane and choose Months, then click Next.
Choose Select from List and check all months, then click OK.
c. In the Rows shelf, click the + next to State to expand the values to show
the average transaction amount by store.
d. Edit the colors in the panel on the right:
1. True: Blue
2. False: Orange
3. Null: Red
5. Take a screenshot (label it 7-4TD) of your expanded sheet.
6. After you answer the lab questions, save your work as Lab 7-4 Dillard’s Prior
Period.twb and continue to Part 3.
AQ3. What do you notice about the automatic color selections for the graph? What
colors would you choose to make the chart more readable?
1. Open your Lab 7-4 Dillard’s Prior Period.pbix from Part 2 if you closed it.
2. Click the blank area on your Prior Period page and create a new Clustered
Bar Chart showing store performance compared to the previous year perfor-
mance. Resize it so it fills the right side of the page between the first chart
and the slicers:
a. Y-axis: Location
b. X-axis: TRANSACT.TRAN_AMOUNT > Average
c. TRANSACT.TRAN_AMOUNT_PREV_YEAR
d. Click the Format visual (paintbrush) icon and adjust the styles:
1. Visual > Y-axis > Title: Location
2. Visual > X-axis > Title: Average Transaction Amount
3. General > Title > Text: Average Tran Amount by Location Prior Period
e. Take a screenshot (label it 7-4ME).
3. Now adjust your sheet to show filters and drill down into values:
a. In the visualization, click Expand all down one level in the hierarchy
(forked arrow icon) to show the store.
b. In the visualization click More Options (three dots) and choose Sort By
> STATE STORE. Then choose Sort Ascending.
4. Take a screenshot (label it 7-4MF) of the page.
5. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-4 Dillard’s Prior Period.pbix.
Tableau | Desktop
1. Open your Lab 7-4 Dillard’s Prior Period.twb from Part 2 if you closed it and
create a new sheet. Go to Worksheet > New Worksheet.
2. Name your sheet: PP Average Amount.
Lab 7-4 Part 3 Analysis Questions (LO 7-2, 7-3)
AQ1. In Tableau, the benchmark colors show below 60 percent as red, below
80 percent as yellow, and below 100 percent as green. Why might a manager
consider 80–100 percent of the previous year’s results acceptable?
AQ2. What additional available measures or calculated fields might you use to evalu-
ate store performance?
Lab 7-5 Comprehensive Case: Advanced Performance Models—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Management at Dillard’s corporate office is looking at new ways to
incentivize managers at the various store locations both to boost low-performing stores and
to reward high-performing stores. The retail manager has asked you to evaluate store perfor-
mance over a two-week period and identify those stores.
Data: Dillard’s sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.
Tableau | Desktop
1. Load the following data from SQL Server (see Lab 1-5 for instructions):
a. Database: WCOB_DILLARDS
b. Tables: TRANSACT, STORE
c. Date range: 3/1/2015 to 3/14/2015
2. Create a new worksheet called Store Clusters:
a. Columns: TRANSACT.Transaction ID > Measure > Count
b. Rows: TRANSACT.Tran Amt > Measure > Average
c. Marks:
1. TRANSACT.Store > Color
d. Let the query run at this point.
e. You’ll notice an outlier in the top-right corner. In this case, it is the
online division of Dillard’s. Because we’re evaluating brick-and-mortar
stores, we want to exclude this one.
f. Right-click on the outlier (Store 698) and click Exclude.
g. To create clusters, click the Analytics tab and drag Cluster to the scatter plot (a code sketch of the k-means idea behind this feature appears after this list).
1. Number of clusters: 12
h. Right-click your scatter plot and choose View Data... Then drag the data window down so you can see both the scatter plot and the data.
3. Take a screenshot (label it 7-5TA).
4. Answer the questions and continue to Part 2.
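Tableau's Cluster feature is built on k-means. For intuition, here is a minimal scikit-learn sketch of the same idea on made-up store-level features (average transaction amount and transaction count); it is an illustration, not the lab's required tooling:

import numpy as np
from sklearn.cluster import KMeans

# Per-store features: [average transaction amount, transaction count].
# Values are invented placeholders, not Dillard's data; the lab itself
# uses 12 clusters across all stores.
store_stats = np.array([
    [42.0, 1800.0], [45.5, 2100.0], [39.0, 1500.0],
    [88.0, 9000.0], [91.5, 8700.0], [40.0, 1700.0],
])

# Standardize so the count column does not dominate the distance measure.
standardized = (store_stats - store_stats.mean(axis=0)) / store_stats.std(axis=0)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(standardized)
print(kmeans.labels_)   # cluster assignment for each store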
Tableau | Desktop
1. Type: Bar
2. TRANSACT.Tran Date > Color > Day
a. Discrete
b. Day (there are two Day options in the drop-down; choose the top one, without a year)
3. TRANSACT.Tran Date > Label > Day
a. Sort: Descending
d. Let the query run at this point.
e. Now filter your results. Right-click outside the work area and click
Filters > Store.
f. Now let’s narrow in on the high-performing stores we identified in our
cluster analysis.
1. Uncheck All in the Store filter list.
2. Check the stores from the cluster you identified in Part 2.
g. Hover over Tran Amt in the chart and click the Sort Descending icon.
2. Take a screenshot (label it 7-5TB).
3. Answer the questions and continue to Part 3.
Tableau | Desktop
1. No columns or rows
2. Marks:
A. TRANSACT.Tran Amt > Size > SUM
B. TRANSACT.Tran Amt > Color > SUM
C. STORE.State > Label
3. To drill down to subcategories, we can create a hierarchy in Tableau.
a. In the attributes list, drag STORE.City onto STORE.State to create a
hierarchy and click OK.
b. Now in the Marks list, click the + next to State to show the cities in each
state.
4. Create a new dashboard and drag the three sheets onto the page.
5. Take a screenshot (label it 7-5TC).
6. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 7-5 Dillard’s Advanced Models.twb.
A Look Back
Chapter 7 focused on generating and evaluating key performance metrics that are used primarily in managerial
accounting. By measuring past performance and comparing it to targeted goals, we are able to assess how well a com-
pany is working toward a goal. We can also determine what adjustments, if any, are needed in how decisions are made or how business processes are run.
A Look Ahead
In Chapter 8, we highlight the use of Data Analytics in financial statement analysis. We discuss how XBRL allows quick, computer-readable access to financial statement data and how ratios are used to analyze financial performance. We also discuss the use of sparklines and other visualizations to help users spot trends, and we consider how text mining can be used to gauge the sentiment in financial reporting data.
Sometimes the future is now. The StockSnips app uses sentiment analysis,
machine learning, and artificial intelligence to aggregate and analyze news
related to publicly traded companies on Nasdaq and the New York Stock
Exchange to “gain stock insights and track a company’s financial and business
operations.” The use of Data Analytics helps classify the news to predict revenue, earnings, and cash flows, which are in turn used to predict the stock's performance. What will Data Analytics do next?
OBJECTIVES
After reading this chapter, you should be able to:
EXHIBIT 8-1
Financial Statement Analysis Questions and Data Analytics Techniques by Analytics Type
Descriptive—summarize activity or master data based on certain attributes to address questions of this type: What happened? What is happening?
Example questions: What is the firm's debt relative to its assets in the past year? What is the company's cash flow from operating activities for each of the past three years? What are the profit margins during the current quarter? What are the component parts of return on equity?
Techniques: Summary statistics (Sums, Totals, Averages, Medians, Bar Charts, Histograms, etc.); ratio analysis; vertical, horizontal analysis.

Diagnostic—detect correlations and patterns of interest and compare them to a benchmark to address questions of this type: Why did it happen? What are the reasons for past results? Can we explain why it happened?
Example questions: How leveraged is the company compared to the industry or its competitors? Are the R&D investments last quarter associated with increased sales and earnings this quarter? Why did the collectability of receivables (amount our customers owe to the company) fall in the current quarter as compared to the prior quarters?
Techniques: Performance comparisons to past, competitor, industry, stock market, overall economy; drill-down analytics to determine relations/patterns/linkages between variables; regression and correlation analysis.

Predictive—identify common attributes or patterns that may be used to forecast similar activity to address the following questions: Will it happen in the future? What is the probability something will happen? Is it forecastable?
Example questions: What are the expected sales, earnings, and cash flows over the next five years? How much would the company have earned if there had not been a business interruption (due to fire or natural disaster)?
Techniques: Sales, earnings, and cash-flow forecasting using time series, analyst forecasts, competitor and industry performance, and macroeconomic forecasts; drill-down analytics to determine relations/patterns/linkages between variables; regression and correlation analysis.

Prescriptive—recommend action based on previously observed actions to address questions of this type: What should we do based on what we expect will happen? How do we optimize our performance based on potential constraints?
Example questions: What is the intrinsic value of the stock based on forecasted financial performance and uncertainty? How sensitive is the stock's intrinsic value to assumptions and estimates made?
Techniques: Cash-flow analysis; calculation of net present value of future cash flows; sensitivity analysis.
1. PCAOB, AS 2305, https://pcaobus.org/Standards/Auditing/Pages/AS2305.aspx.
EXHIBIT 8-2
Vertical Analysis of a Common Size Financial Statement
Microsoft Excel
Lab Connection
Lab 8-1 and Lab 8-2 have you perform vertical and horizontal analyses.
Ratio Analysis
For other indicators of financial health, there are four main types of ratios: liquidity, activity,
solvency (or financing), and profitability. In practice, these ratios may vary slightly depending
on which accounts the user decides to include or exclude.
Liquidity is the ability to satisfy the company’s short-term obligations using assets that
can be most readily converted into cash. Liquidity ratios help determine a company’s ability
to meet short-term obligations. Here are some common liquidity ratios:
When comparing income statement (duration) accounts with balance sheet (point in time) accounts, you need to average the balance sheet accounts to match the period. Also, for turnover ratios, analysts may use 365 days or round down to 360 days depending on preference.
We use solvency (sometimes called financing) ratios to help assess a company’s ability
to pay its debts and stay in business. In other words, we assess the company’s financial
risk—that is, the risk resulting from a company’s choice of financing the business using debt
or equity. Debt-to-equity, long-term debt-to-equity, and times interest earned ratios are also
useful in assessing the level of solvency.
Profitability ratios are a common calculation when assessing a company. They are used
to provide information on the profitability of a company and its prospects for the future.
Profitability ratios are commonly associated with the DuPont ratio. The DuPont ratio
was developed by the DuPont Corporation to measure performance as a decomposition of
the return on equity ratio in this way:

ROE = Profit margin × Asset turnover × Equity multiplier
    = (Net income ÷ Revenues) × (Revenues ÷ Total assets) × (Total assets ÷ Total equity)
The DuPont ratio decomposes return on equity into three different types of ratios: prof-
itability (profit margin), activity (asset turnover), and solvency (equity multiplier) ratios.
This allows a more complete evaluation of performance.
Lab Connection
Lab 8-3 has you analyze financial ratios.
EXHIBIT 8-3
Comparison of Ratios among Microsoft (MSFT), Apple (AAPL), and Facebook (FB)
Microsoft Excel
EXHIBIT 8-4
Horizontal Analysis of a Common Size Financial Statement
Microsoft Excel
A horizontal analysis is an analysis that shows the change of a value from one period to
the next. This is sometimes called a trend analysis. When you have two or more periods,
you calculate the proportional change in value from one to the next similar to a ratio analy-
sis. In Exhibit 8-4, we take Revenue in 2020 and divide it by Revenue in 2019 to show a
113.65 percent change or a 13.65 percent increase from one year to the next for Microsoft.
Other horizontal analyses are provided for Apple as well. The key question is whether the trends from the past can be expected to persist into the future; persistent trends would allow the analyst to forecast future performance.
Horizontal analysis can be used to calculate trends from one period to the next or
over time.
When you calculate the trend over a large period of time relative to a single base year,
you create an index. An index is a metric that shows how much any given subsequent year
has changed relative to the base year. The formula is the same as above, but we lock the
base year value when creating our formula, shown in Exhibit 8-5.
Using these trends and indices, we can better understand how a company performs over
time, calculate the average amount of change, and predict what the value is likely to be in
the next period.
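As a quick illustration of both calculations, the Python sketch below computes year-over-year trends and a base-year index from Microsoft's reported fiscal-year revenues (in $ millions); dividing the 2020 figure by the 2019 figure reproduces the 113.65 percent change discussed above.

# Horizontal (trend) analysis and a base-year index.
revenue = {2017: 89_950, 2018: 110_360, 2019: 125_843, 2020: 143_015}

years = sorted(revenue)
base_year = years[0]

for year in years[1:]:
    trend = revenue[year] / revenue[year - 1]   # change versus the prior year
    index = revenue[year] / revenue[base_year]  # change versus the locked base year
    print(f"{year}: trend {trend:.2%}, index {index:.2%}")

Note how the trend compares each year to the immediately preceding year, while the index locks the denominator on the base year, just as an Excel formula locks the base-year cell with an absolute reference.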
EXHIBIT 8-5
Index Showing Change in Value Relative to Base Year
Microsoft Excel
Source: https://seekingalpha.com/article/4392734-sharing-economy-come-home-ipo-of-airbnb and https://www.reuters.com/article/airbnb-ipo/airbnb-valuation-surges-past-100-billion-in-biggest-u-s-ipo-of-2020-idUSKBN28K261 (accessed April 7, 2021).
PROGRESS CHECK
1. Which ratios would a financial institution be most interested in when determining
whether to grant a loan to a business?
2. What would a horizontal trend tell you about a firm’s performance?
EXHIBIT 8-6
Visualizing Financial Data with Heat Maps and Sparklines
Microsoft Excel
Visualizing Hierarchy
A balance sheet, on the other hand, has an inherent hierarchy of accounts that is a good
candidate for a sunburst diagram. As shown in Exhibit 8-7, the center of the ring shows the
main sections of the balance sheet and their proportional size. As you move out, you see the
subgroups and individual accounts that make up the balance sheet.
EXHIBIT 8-7
Sunburst Diagram Showing Composition of a Balance Sheet
[Sunburst chart: the inner ring shows Assets, Liabilities, and Equity; the outer rings show subgroups (e.g., Current Assets, Noncurrent Liabilities) and individual accounts (e.g., Cash and Equivalents, Short-Term Investments, Other Current Assets, Goodwill, Common Stock, Additional Paid-in Capital, Retained Earnings).]
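If you want to reproduce a chart like Exhibit 8-7 programmatically, here is a minimal Python sketch using the plotly library (not one of the tools used in this chapter's labs). The account names and balances are illustrative, not from a real filing.

import plotly.express as px

# Hierarchy: section (inner ring) -> account (outer ring), sized by balance.
accounts = {
    "section": ["Assets", "Assets", "Assets",
                "Liabilities", "Liabilities", "Equity"],
    "account": ["Cash", "Receivables", "Property, Plant & Equipment",
                "Accounts Payable", "Long-Term Debt", "Retained Earnings"],
    "balance": [38_016, 16_120, 36_766, 42_296, 33_640, 14_966],  # $ millions
}

fig = px.sunburst(accounts, path=["section", "account"], values="balance")
fig.show()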
For some additional examples of visualizations that show financial data, including tree
diagrams, geographic maps, chord diagrams, and heat maps for word frequency in man-
agement discussion and analysis, explore the following website: rankandfiled.com, then
Explore Filers and click through to drill down to available visualizations.
PROGRESS CHECK
3. How might sparklines be used to enhance the DuPont analysis? Would you show
the sparklines for each component of the DuPont ROE disaggregation, or would
you propose it be shown only for the total?
2. Tim Loughran and Bill McDonald, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance 66, no. 1 (2011), pp. 35–65.
EXHIBIT 8-8
Stock Market Reaction (Excess Return) of Companies Sorted by Proportion of Negative Words
The lines represent the words from a financial dictionary (Fin-Neg) and a standard English dictionary (H4N-inf).
EXHIBIT 8-9
Frequency of Words Included in Microsoft's Annual Report
Microsoft Excel
Lab Connection
Lab 8-4 has you analyze the sentiment of Forms 10-K for various companies
and create a visualization. Specifically, you will be looking for the differences
between the positive and negative words over time.
EXHIBIT 8-10
Visual Representation of Positive and Negative Words for Microsoft by Year with a Line Chart Showing the Change in Positive-to-Negative Ratio for the Same Period
The bar chart represents the number of positive and negative words, and the line represents the positive-to-negative ratio from a financial dictionary (Fin-Neg).
Microsoft Excel
PROGRESS CHECK
4. Which do you predict would have more positive sentiment in a 10-K, the foot-
notes to the financial statements or the MD&A (management discussion and
analysis) of the financial statements?
5. Why would you guess the results between the proportion of negative words and
the stock market reaction to the 10-K issuance differ between the Fin-Neg and
the H4N-Inf dictionary?
EXHIBIT 8-11
Creating an XBRL Instance Document
[Flowchart: (1) the taxonomy (e.g., us-gaap-2020) and (2) an extension schema (e.g., abc-company), together with extension calculations, extension presentation, and extension labels, are combined with (3) the financial statements, the SEC Central Index Key, and the reporting periods to produce (4) the XBRL instance document (interactive financial statements), which goes through (5) document validation before the EDGAR filing with the SEC.]
The XBRL tag for the cash and cash equivalents footnote disclosure is labeled “CashAndCashEquivalentsDisclosureTextBlock” and is defined as follows:
3. https://xbrl.us/xbrl-taxonomy/2017-us-gaap/.
The entire disclosure for cash and cash equivalent footnotes, which may include the
types of deposits and money market instruments, applicable carrying amounts, restricted
amounts and compensating balance arrangements. Cash and equivalents include: (1)
currency on hand; (2) demand deposits with banks or financial institutions; (3) other
kinds of accounts that have the general characteristics of demand deposits; (4) short-
term, highly liquid investments that are both readily convertible to known amounts of
cash and so near their maturity that they present insignificant risk of changes in value
because of changes in interest rates. Generally, only investments maturing within
three months from the date of acquisition qualify.4
The use of tags allows data to be quickly transmitted and received, and the tags serve as an input for financial analysts valuing a company, auditors finding areas where an error might occur, and regulators (like the SEC or IRS) checking whether firms are in compliance with various regulations and laws.
Preparers of the XBRL instance document compare the financial statement figures with
the tags in the taxonomy. When a tag does not exist, the preparer can extend the taxonomy
with their own custom tags. The taxonomy and extension schema are combined with the
financial data to generate the XBRL instance document, which is then validated for errors
and submitted to the regulatory authority.
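To make the idea of computer-readable tags concrete, the short Python sketch below uses only the standard library to read one fact out of a hand-made XBRL instance fragment. The fragment, context identifier, and dollar amount are illustrative; the us-gaap namespace and the Revenues tag come from the standard taxonomy, but this is not an excerpt from a real filing.

import xml.etree.ElementTree as ET

instance = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2020-01-31">
  <us-gaap:Revenues contextRef="FY2020" decimals="-6" unitRef="usd">
    143015000000
  </us-gaap:Revenues>
</xbrl>"""

root = ET.fromstring(instance)
ns = {"us-gaap": "http://fasb.org/us-gaap/2020-01-31"}

# Each fact carries its value plus machine-readable context (period, units).
for fact in root.findall("us-gaap:Revenues", ns):
    value = int(fact.text.strip())
    print(fact.get("contextRef"), f"${value / 1e9:.1f}B")  # FY2020 $143.0B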
Investors and analysts have been reluctant to use the data because of concerns about
its accuracy, consistency, and reliability. Inconsistent or incorrect data tagging,
including the use of custom tags in lieu of standard tags and input mistakes, causes
translation errors, which make automated analysis of the data unduly difficult.5
Part of the problem is that most companies outsource the preparation of XBRL financial
statements to other companies, such as Fujitsu and R.R. Donnelley, and they don’t validate
the data themselves. Another problem is that ambiguity in the taxonomy leads SEC filing
companies to select incorrect tags or use extension tags where a standard tag exists. Because
these statements are not audited, there is little incentive to improve data quality unless
stricter validation measures are put in place.
Despite these limitations, there is still value in analyzing XBRL data, and some provid-
ers have additional solutions to make XBRL data comparable. Outside data vendors create standardized metrics to make company-reported XBRL data more comparable. For
example, Calcbench, a data vendor that eases financial analysis for XBRL users, makes
standardized metrics, noting:
4. https://xbrl.us/xbrl-taxonomy/2017-us-gaap/.
5. https://xbrl.us/data-quality/.
IBM labels revenue as “Total revenue” and uses the tag “Revenues”, whereas Apple labels its revenue as “Net sales” and uses the tag “SalesRevenueNet”. This is a
relatively simple case, because both companies used tags from the FASB taxonomy.
Users are typically not interested in the subtle differences of how companies tag or
label information. In the previous example, most users would want Apple and IBM’s
revenue, regardless of how it was tagged. To that end, we create standardized metrics.6
Data vendors such as XBRLAnalyst and Calcbench provide a trace function that allows you to trace a standardized metric back to the original source to see which XBRL tags are referenced or used to make up the standardized metric.7
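Conceptually, a standardized metric is just a mapping from company-specific tags onto one common name, as in the toy Python sketch below. The mapping mirrors the Calcbench example above; the revenue values are illustrative figures in $ millions, and this is not Calcbench's actual implementation.

# Map company-specific revenue tags onto a single standardized metric.
TAG_TO_METRIC = {
    "Revenues": "Revenue",         # e.g., IBM's "Total revenue"
    "SalesRevenueNet": "Revenue",  # e.g., Apple's "Net sales"
}

facts = [
    {"company": "IBM", "tag": "Revenues", "value": 77_147},
    {"company": "Apple", "tag": "SalesRevenueNet", "value": 274_515},
]

standardized = [
    {"company": f["company"], "metric": TAG_TO_METRIC[f["tag"]], "value": f["value"]}
    for f in facts
]
print(standardized)  # both rows now share the metric name "Revenue"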
Exhibit 8-13 shows what a report using standardized metrics looks like for Boeing’s
balance sheet. Note the standardized XBRL tags used for Boeing could also be used to
access the financial statements for other SEC registrants.
6. https://knowledge.calcbench.com/hc/en-us/articles/230017408-what-is-a-standardized-metric (accessed August 2017).
7. https://knowledge.calcbench.com/hc/en-us/articles/230017408-What-is-a-standardized-metric.
EXHIBIT 8-13
Balance Sheet from XBRL Data
Note the XBRL tag names in the far left column.
Source: https://www.calcbench.com/xbrl_to_excel
EXHIBIT 8-14
DuPont Ratios Using XBRL Data
Source: https://www.calcbench.com/xbrl_to_excel
You'll note for the Quarter 2 analysis in 2009, for DuPont (ticker symbol DD), that if you take its profit margin of 29.4 percent, multiply it by operating leverage of 20.1 percent, and multiply that by financial leverage of 471.7 percent, you get a return on equity of 27.8 percent.
PROGRESS CHECK
6. How does XBRL facilitate Data Analytics by analysts?
7. How might standardized XBRL metrics be useful in comparing the financial state-
ments of General Motors, Alphabet, and Alibaba?
8. Assuming XBRL-GL is able to disseminate real-time financial reports, which real-
time financial elements (account names) might be most useful to decision makers?
And which information might not be useful?
9. Using Exhibit 8-14 as the source of data and using the raw accounts, show the
components of profit margin, operating leverage, and financial leverage and
how they are combined to equal ROE for Q1 2014 for DuPont (Ticker = DD).
Summary
Data Analytics extends to the financial accounting and financial reporting space.
■ Financial statement analytics include descriptive analytics, such as financial ratios and
vertical analysis; diagnostic analytics, where we compare those to benchmarks from
prior periods or competitors; predictive analytics, including horizontal trend analysis;
and prescriptive analytics. (LO 8-1)
■ Sparklines and trendlines are efficient and effective tools to visualize firm performance,
and sunburst diagrams and heat maps help highlight values of interest. (LO 8-2)
■ Sentiment analysis could be used with financial statements, other financial reports, and
other financially related information to gauge positive and negative meaning from other-
wise text-heavy notes. (LO 8-3)
■ The XBRL taxonomy provides tags for more than 19,000 financial elements and allows
for the use of company-defined tags when the normal XBRL tags are not suitable. (LO 8-4)
■ By tagging financial elements in a computer-readable manner, XBRL facilitates the accu-
rate and timely transmission of financial reporting to all interested stakeholders. (LO 8-4)
■ XBRL and Data Analytics allow timely analysis of the financial statements and the com-
putation of financial ratios. We illustrated their use by showing the DuPont ratio frame-
work. (LO 8-4)
Key Words
common size financial statement (407) A type of financial statement that contains only basic
accounts that are common across companies.
DuPont ratio (409) Developed by the DuPont Corporation to decompose performance (particularly
return on equity [ROE]) into its component parts.
financial statement analysis (406) Used by investors, analysts, auditors, and other interested
stakeholders to review and evaluate a company’s financial statements and financial performance.
heat map (414) A visualization that shows the relative size of values by applying a color scale to the data.
horizontal analysis (411) An analysis that shows the change of a value from one period to the next.
index (411) A metric that shows how much any given subsequent year has changed relative to the base year.
ratio analysis (407) A tool used to evaluate relationships among different financial statement items to
help understand a company’s financial and operating performance.
sparkline (413) A small visual trendline or bar chart that efficiently summarizes numbers or statistics
in a single spreadsheet cell.
standardized metrics (419) Metrics used by data vendors to allow easier comparison of company-
reported XBRL data.
sunburst diagram (414) A visualization that shows inherent hierarchy.
vertical analysis (407) An analysis that shows the proportional value of accounts to a primary
account, such as Revenue.
XBRL (eXtensible Business Reporting Language) (417) A global standard for exchanging
financial reporting information that uses XML.
XBRL-GL (420) Stands for XBRL–General Ledger; relates to the ability of an enterprise system to tag
financial elements within the firm’s financial reporting system.
XBRL taxonomy (417) Defines and describes each key data element (like cash or accounts payable).
The taxonomy also defines the relationships between each element (like inventory is a component of current
assets and current assets is a component of total assets).
Answers to Progress Checks
4. The MD&A section of the 10-K has management reporting on what happened in the most
recent period and what they expect will happen in the coming year. They are usually
upbeat and generally optimistic about the future. The footnotes are generally backward
looking in time and would be much more fact-based, careful, and conservative. We would
expect the MD&A section to be much more optimistic than the footnotes.
5. Accounting has its own lingo. Words that might seem negative for the English language
are not necessarily negative for financial reports. For this reason, the results differ based
on whether the standard English usage dictionary (H4N-Inf) or the financial dictionary
(Fin-Neg) is used. The relationship between the excess stock market return and the finan-
cial dictionary is what we would expect.
6. By each company providing tags for each piece of its financial data as computer readable,
XBRL allows immediate access to all types of financial statement users, be they financial
analysts, investors, or lenders, for their own specific use.
7. Standardized metrics are useful for comparing companies because they allow for similar accounts to have the same title regardless of the account names used by the various companies. They allow for ease of comparison across multiple companies.
8. When journal entries and transactions are made in an XBRL-GL system, there is the possibility of real-time financial reporting. In our opinion, income statement information (including sales, cost of goods sold, and SG&A expenditures) would be useful to financial users on a real-time basis. Any information that does not change frequently would not be as useful; examples include goodwill; long-term debt; and property, plant, and equipment.
9. Profit margin = (Revenues – Cost of revenue)/Revenues = ($10.145B – $6.000B)/
$10.145B = 40.9%
Operating leverage (or Asset turnover) = Sales/Assets = ($10.145B/$47.800B) = 21.2%
Equity multiplier (or Financial leverage) = Assets/Equity = $47.800B/$16.442B = 290.7%
ROE = Profit margin × Operating leverage (or Asset turnover) × Financial leverage =
0.409 × 0.212 × 2.907 = 0.252
Multiple Choice Questions
1. (LO 8-1) The DuPont analysis of return on equity (ROE) includes all of the following component ratios except:
a. asset turnover.
b. inventory turnover.
c. financial leverage.
d. profit margin.
2. (LO 8-1) What type of ratios measure a firm’s operating efficiency?
a. DuPont ratios
b. Liquidity ratios
c. Activity ratios
d. Solvency ratios
3. (LO 8-1) Performance comparisons to a company’s own past or to its competition would
be considered _____ analytics.
a. prescriptive
b. descriptive
c. predictive
d. diagnostic
4. (LO 8-2) In which stage of the IMPACT model (introduced in Chapter 1) would the use of
sparklines fit?
a. Track outcomes
b. Communicate insights
c. Address and refine results
d. Perform test plan
5. (LO 8-3) What computerized technique would be used to perform sentiment analysis on
an annual accounting report?
a. Text mining
b. Sentiment mining
c. Textual analysis
d. Decision trees
6. (LO 8-2) Determining how sensitive a stock's intrinsic value is to assumptions and estimates made would be an example of _____ analytics.
a. diagnostic
b. predictive
c. descriptive
d. prescriptive
7. (LO 8-4) XBRL stands for:
a. Extensible Business Reporting Language.
b. Extensive Business Reporting Language.
c. XML Business Reporting Language.
d. Excel Business Reporting Language.
8. (LO 8-4) Which term defines and describes each XBRL financial element?
a. Data dictionary
b. Descriptive statistics
c. XBRL-GL
d. Taxonomy
9. (LO 8-4) What is the term used to describe the process of assigning XBRL tags inter-
nally within a financial reporting/enterprise system?
a. XBRL tagging
b. XBRL taxonomy
c. XBRL-GL
d. XBRL dictionary
10. (LO 8-4) What is the name of the output from data vendors to help compare companies
using different XBRL tags for revenue?
a. XBRL taxonomy
b. Data assimilation
c. Consonant tagging
d. Standardized metrics
3. (LO 8-1) Why do audit firms perform analytical procedures to identify risk? Which type
of ratios (liquidity, solvency, activity, and profitability ratios) would you use to evaluate
the company’s ability to continue as a going concern?
4. (LO 8-4) Go to https://xbrl.us/data-rule/dqc_0015-lepr/ and find the XBRL element
name for Interest Expense and Sales, General, and Administrative Expense.
5. (LO 8-4) Go to https://xbrl.us/data-rule/dqc_0015-lepr/ and find the XBRL element name
for Other Nonoperating Income and indicate whether XBRL says that should normally
be a debit or credit entry.
6. (LO 8-1) Go to finance.yahoo.com and type in the ticker symbol for Apple (AAPL) and click
on the statistics tab. Which of those variables would be useful in assessing profitability?
7. (LO 8-4) Can you think of any other settings, besides financial reports, where tagged
data might be useful for fast, accurate analysis generally completed by computers?
How could it be used in a hospital setting? Or at your university?
8. (LO 8-3) Can you think of how sentiment analysis might be used in a marketing setting?
How could it be used in a hospital setting? Or at your university? When would it be
especially good to measure the sentiment?
Problems
1. (LO 8-1) Match the description of the financial statement analysis question to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
1. What is the firm’s debt relative to its assets in the past year?
2. What is the intrinsic value of the stock?
3. How much would the company have earned if there had not been a business interruption (due to fire or natural disaster)?
4. What are the company's cash flows from operating activities over the past three years?
5. Why did the collectability of receivables (amount our customers owe to the company) fall in the current quarter as compared to the prior quarters?
6. Should we buy, sell, or hold the stock?
2. (LO 8-1) Match the description of the financial statement analysis technique to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
3. (LO 8-1) Match the following descriptions or equations to one of the following compo-
nents of the DuPont ratio:
• Return on stockholders’ equity
• Profit margin
• Asset turnover
• Equity multiplier
• Return on assets
4. (LO 8-2) Match the following descriptions to one of the following visualization types:
• Heat map
• Sparklines
• Sunburst diagram
5. (LO 8-3) Analysis: Can you think of situations where sentiment analysis might be helpful in analyzing press releases or earnings announcements? What additional information might it provide that is not directly in the overall announcement? Would it be useful to automate sentiment analysis to get a basic sentiment measure and compare it to the base level of sentiment expected in a press release or earnings announcement?
6. (LO 8-3) Analysis: We noted in the text that negative words in the financial diction-
ary include words such as loss, claims, impairment, adverse, restructuring, and litiga-
tion. What other negative words might you add to that list? What are your thoughts on
positive words that would be included in the financial dictionary, particularly those that
might be different than standard English dictionary usage?
7. (LO 8-1) You’re asked to figure out how the stock market responded to Amazon’s
announcement on June 16, 2017, that it would purchase Whole Foods—arguably a
transformational change for Amazon, Walmart, and the whole retail industry.
7A. Go to finance.yahoo.com, type in the ticker symbol for Amazon (AMZN), click on
historical data, and find the closing stock price on June 15, 2017. How much did
the stock price change to the closing stock price on June 16, 2017? What was the
percentage of increase or decrease?
7B. Do the same analysis for Walmart (WMT) over the same dates. How much did the
stock price change on June 16, 2017? What was the percentage of increase or
decrease? Which company was impacted the most and had the largest percentage
of change?
8. (LO 8-1) The preceding question asked you to figure out how the stock market
responded to Amazon’s announcement that it would purchase Whole Foods. The
question now is if the stock market for Amazon had higher trade volume on that day
than the average of the month before.
8A. Go to finance.yahoo.com, type in the ticker symbol for Amazon (AMZN), click on
historical data, and input the dates from May 15, 2017, to June 16, 2017. Down-
load the data, calculate the average volume for the month prior to June 16, and
compare it to the trading volume on June 16. What impact did the Whole Foods
announcement have on Amazon trading volume?
8B. Do the same analysis for Walmart (WMT) over the same dates. What impact did the
Whole Foods announcement have on Walmart trading volume?
Company | Average Volume (5/15/17 to 6/15/17) | Volume on 6/16/17 | Volume Change %
A) Amazon | | |
B) Walmart | | |
Word Category
1. ADVERSE
2. AGAINST
3. CLAIMS
4. IMPAIRMENT
5. LIMIT
6. LITIGATION
7. LOSS
8. PERMITTED
9. PREVENT
10. REQUIRED
11. BOUND
12. UNAVAILABLE
13. CONFINE
10. (LO 8-3) Go to Loughran and McDonald’s sentiment word lists at https://sraf.nd.edu/
textual-analysis/resources/ and download the Master Dictionary. These lists are what
they’ve used to assess sentiment in financial statements and related financial reports.
Select the appropriate category (Litigious, Positive, Both, or None) for each given word
below, according to the authors’ word list.
Word Category
1. BENEFIT
2. BREACH
3. CLAIM
4. CONTRACT
5. EFFECT
6. ENABLE
7. SETTLEMENT
8. SHALL
9. STRONG
10. SUCCESSFUL
11. ANTICORRUPTION
12. DEFENDABLE
13. BOOSTED
LABS
Lab 8-1 Create a Horizontal and Vertical Analysis Using XBRL Data—S&P100
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: This lab will pull in XBRL data from S&P100 companies listed with the
Securities and Exchange Commission. You have the option to analyze a pair of companies
of your choice based on your own interest level. This lab will have you compare other com-
panies as well.
Data: Lab 8-1 SP100 Facts and Script.zip - 601KB Zip / 607KB Excel / 18KB Text
LAB 8-1M Example of Dynamic Analysis in Microsoft Excel and Google Sheets
Microsoft Excel
Lab 8-1 Part 1 Connect to Your Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-1 [Your name] [Your email address].docx.
Financial statement analysis frequently involves identifying relationships between spe-
cific pieces of data. We may want to see how financial data have changed over time or how
the composition has changed.
To create a dynamic spreadsheet, you must first create a PivotTable (Excel) or connect
your sheet to a data source on the Internet (Google Sheets). Because Google Sheets is
hosted online, you will add the iXBRLAnalyst script to connect it to FinDynamics so you
can use formulas to query financial statement elements.
Because companies frequently use different tags to represent similar concepts (such as
the tags ProfitLoss or NetIncomeLoss to identify Net Income), it is important to make sure
you’re using the correct values. FinDynamics attempts to coordinate the diversity of tags by
using normalized tags that use formulas and relationships instead of direct tags. Normalized
tags must be contained within brackets []. Some examples are given in Lab Table 8-1.
LAB TABLE 8-1
Normalized Tags

Balance sheet: [Cash, Cash Equivalents and Short-Term Investments] [Short-Term Investments] [Accounts Receivable, Current] [Inventory] [Other Current Assets] [Current Assets] [Net of Property, Plant & Equipment] [Long-Term Investments] [Intangible Assets, Net] [Goodwill] [Other Noncurrent Assets] [Noncurrent Assets] [Assets] [Accounts Payable and Accrued Liabilities, Current] [Short-Term Borrowing] [Long-Term Debt, Current] [Other Current Liabilities] [Current Liabilities] [Other Noncurrent Liabilities] [Noncurrent Liabilities] [Liabilities] [Preferred Stock] [Common Stock] [Additional Paid-in Capital] [Retained Earnings (Accumulated Deficit)] [Equity Attributable to Parent] [Equity Attributable to Noncontrolling Interest] [Stockholders’ Equity] [Liabilities & Equity]

Income statement: [Revenue] [Cost of Revenue] [Gross Profit] [Selling, General & Administrative Expense] [Research & Development Expense] [Depreciation (&Amortization), IS] [Non-Interest Expense] [Other Operating Expenses] [Operating Expenses] [Operating Income] [Other Operating Income] [Non-Operating Income (Expense)] [Interest Expense] [Costs and Expenses] [Earnings Before Taxes] [Income Taxes] [Income from Continuing Operations] [Income from Discontinued Operations, Net of Taxes] [Extraordinary Items, Gain (Loss)] [Net Income] [Net Income Attributable to Parent] [Net Income Attributable to Noncontrolling Interest] [Preferred Stock Dividends and Other Adjustments] [Comprehensive Income (Loss)] [Other Comprehensive Income (Loss)] [Comprehensive Income (Loss) Attributable to Parent] [Comprehensive Income (Loss) Attributable to Noncontrolling Interest]

Statement of cash flows: [Cash From Operations (CFO)] [Changes in Working Capital] [Changes in Accounts Receivables] [Changes in Liabilities] [Changes in Inventories] [Adjustments of Non-Cash Items, CF] [Provision For Doubtful Accounts] [Depreciation (&Amortization), CF] [Stock-Based Compensation] [Pension and Other Retirement Benefits] [Interest Paid] [Other CFO] [Cash from Investing (CFI)] [Capital Expenditures] [Payments to Acquire Investments] [Proceeds from Investments] [Other CFI] [Cash From Financing (CFF)] [Payment of Dividends] [Proceeds from Sale of Equity] [Repurchase of Equity] [Net Borrowing] [Other CFF] [Effect of Exchange Rate Changes] [Total Cash, Change] [Net Cash, Continuing Operations] [Net CFO, Continuing Operations] [Net CFI, Continuing Operations] [Net CFF, Continuing Operations] [Net Cash, DO] [Net CFO, DO] [Net CFI, DO] [Net CFF, DO]
Microsoft | Excel
Note: For each of the items listed in the formula (e.g., AAPL, 2020, or [Net
income]), you can use cell references containing those values (e.g., B$1, $B$2,
and $A7) as inputs for your formula instead of hardcoding those values. These
references are used to look up or match the values in the PivotTable.
5. Using the formulas given above, answer the lab questions and continue to
Part 2.
Microsoft | Excel
1. In your Excel document, add a new sheet and rename it Part2. Then enter
the values, as shown:
A B
1 Company AAPL
2 Year 2020
3 Period Y
4 Scale 1000000
2. Next, set up your financial statement using the following normalized tags and periods. Note: Because we already identified the most current year in cell B2, we'll use a formula to find the three most recent years.
A B C D
6 = $B2 = B6-1 = C6-1
7 [Revenue]
8 [Cost of Revenue]
9 [Gross Profit]
10 [Selling, General &
Administrative Expense]
A B C D
11 [Research & Development
Expense]
12 [Other Operating Expenses]
13 [Operating Expenses]
14 [Operating Income]
15 [Earnings before Taxes]
16 [Income Taxes]
17 [Net Income]
3. Now enter the = GETPIVOTDATA() formula to pull in the correct values, using relative or absolute references (e.g., $A7, $B$1, etc.) as necessary. For example, the formula in B7 should be = GETPIVOTDATA("Value", PivotFacts!$A$3, "Ticker", $B$1, "Year", B$6, "Account", $A7)/$B$4.
4. If you’ve used relative references correctly, you can either drag the formula
down and across columns B, C, and D, or copy and paste the cell containing
the formula (don’t select the formula itself) into the rest of the table.
5. Use the formatting tools to clean up your spreadsheet.
6. Take a screenshot of your table (label it 8-1MB).
7. Next, you can begin editing your dynamic data and expanding your analysis,
identifying trends and ratios.
8. In your worksheet, add a sparkline to show the change in income statement
accounts over the three-year period:
A. In cell E7, go to Insert > Sparklines > Line in the ribbon.
B. Data Range: B7:D7
C. Location Range: E7
D. Click OK to add the line.
E. Next, copy the sparkline down the column. Note: The line is trending toward
the left as accounting reports typically show the most recent results on
the left.
9. Now perform a vertical analysis in the columns to the right showing each
value as a percentage of revenue:
a. Copy cells B6:D6 into F6:H6.
b. In F7, type = B7/B$7.
c. Drag the formula to fill in F7:H18.
d. Format the numbers as a percentage with one decimal place.
e. Add a sparkline in Column I.
10. Take a screenshot of your income statement (label it 8-1MC).
11. Now that you have a common size income statement, replace the company
ticker in cell B1 with any SP 100 company’s ticker (e.g., DD, XOM, or PG)
and press Enter. The data on the spreadsheet will update.
12. Using the formulas given above, answer the questions for this part. When you
are finished answering the lab questions, you may close Excel. Save your file
as Lab 8-1 SP100 Analysis.xlsx.
Google | Sheets
A B
1 Company AAPL
2 Year 2020
3 Period Y
4 Scale 1000000
2. Then set up your financial statement using the following normalized tags and periods. Note: Because we already identified the most current year in cell B2, we'll use a formula to find the three most recent years.
A B C D
6 = $B2 = B6-1 = C6-1
7 [Revenue]
8 [Cost of Revenue]
9 [Gross Profit]
10 [Selling, General & Administrative Expense]
11 [Research & Development Expense]
12 [Other Operating Expenses]
13 [Operating Expenses]
14 [Operating Income]
15 [Earnings before Taxes]
16 [Income Taxes]
17 [Net Income]
3. Now enter the = XBRLFact() formula to pull in the correct values, using rel-
ative or absolute references (e.g., $A7, $B$1, etc.) as necessary. For example,
the formula in B7 should be = XBRLFact($B$1,$A7,B$6,$B$3)/$B$4.
4. If you’ve used relative references correctly, you can either drag the formula
down and across columns B, C, and D, or copy and paste the cell (not the
formula itself) into the rest of the table.
5. Use the formatting tools to clean up your spreadsheet.
6. Take a screenshot of your table (label it 8-1GB).
7. Next, you can begin editing your dynamic data and expanding your analysis,
identifying trends and ratios.
8. In your Google Sheet, use a sparkline to show the change in income state-
ment accounts:
a. In cell E7, type: = SPARKLINE(B7:D7). Next, copy the sparkline down
the column. Note: The line is trending toward the left.
9. Now perform a vertical analysis in the columns to the right showing each
value as a percentage of revenue:
a. Copy cells B6:D6 into F6:H6.
b. In F7, type = B7/B$7.
c. Drag the formula to fill in F7:H18.
d. Format the numbers as a percentage.
e. Add a sparkline in Column I.
10. Take a screenshot of your income statement (label it 8-1GC).
11. Now that you have a common size income statement, replace the company
ticker in cell B1 with your selected company’s ticker and press Enter. The
data on the spreadsheet will update.
12. Using the formulas given above, answer the questions for this part. When you
are finished answering the lab questions, you may close Google Sheets. Your
work will be saved automatically.
Lab 8-2 Create Dynamic Common Size Financial Statements—S&P100
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: This lab will pull in XBRL data from S&P100 companies listed with the
Securities and Exchange Commission. This lab will have you attempt to identify companies
based on their financial ratios.
Data: Lab 8-2 SP100 Facts Pivot.zip - 933KB Zip / 950KB Excel
Lab 8-2 Example Output
By the end of this lab, you will create a dashboard that will let you explore common size
financial statement figures from annual reports for S&P100 companies. While your results
will include different data values, your work should look similar to this:
Microsoft Excel
LAB 8-2M Example of Dynamic Common Size Financial Statements in Microsoft Excel
and Google Sheets
LAB EXHIBIT 8-2A
Mystery Companies (Revenue and Assets in millions, FY2020)
• 3M (MMM) is a multinational conglomerate that specializes in consumer goods and health care. Revenue: $32,184; Assets: $47,344.
• Merck (MRK) provides various health solutions through its prescription medicines, vaccines, biologic therapies, animal health, and consumer care products worldwide. Revenue: $47,994; Assets: $91,588.
• Visa (V) is a financial services firm providing a payment processing network. Revenue: $21,846; Assets: $80,919.
• Walmart (WMT) operates retail stores in various formats worldwide. The company operates in three segments: Walmart U.S., Walmart International, and Sam's Club. Revenue: $523,964; Assets: $236,495.
In Lab Exhibit 8-2B, you’ll find the common size ratios for each Lab Exhibit 8-2A company’s
income statement (as a percentage of revenue) and balance sheet (as a percentage of assets).
LAB EXHIBIT 8-2B
Mystery Ratios for Common Size Income Statement and Balance Sheet
A B C D E F G H I J
As a Percentage of Sales
Revenue 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Cost of Goods Sold 40.7 51.6 17.3 60.4 109.8 68.5 75.3 60.7 35.7 32.3
Gross Profit 59.3 48.4 82.7 39.6 9.8 31.5 24.7 39.3 64.3 67.7
Selling, General, and Administrative Expenses 29.5 21.5 9.5 7.4 8.3 16.8 20.8 22.9 22.5 21.8
Research and Development 0.0 5.8 0.0 0.0 4.3 0.0 0.0 0.0 12.9 28.2
Other Operating Expenses 2.6 50.4 22.6 86.6 0.3 68.5 0.0 1.1 1.0 1.2
Total Operating Expenses 32.1 77.7 35.5 94.1 12.2 85.3 20.8 24.8 36.6 51.3
Operating Income/Loss 27.3 22.3 64.5 5.9 22.0 14.7 3.9 14.5 27.6 16.5
Interest Expense 4.4 0.0 2.4 0.4 3.7 0.1 0.4 0.0 1.2 0.0
Income before Tax 29.5 20.9 63.1 6.3 24.9 15.3 3.8 12.7 28.3 18.3
Income Tax Expense 6.0 4.1 13.4 0.7 4.4 3.6 0.9 4.6 5.6 3.6
Net Income 23.5 16.7 49.7 5.5 20.4 12.3 2.9 13.4 22.7 14.8
As a Percentage of Assets
Current Assets 22.0 31.6 34.2 41.3 80.0 47.9 26.1 14.7 45.9 30.3
Receivables 0.6 9.9 2.0 7.6 1.3 0.0 2.7 3.4 5.8 8.6
Inventory 3.7 9.0 0.0 7.4 53.7 0.0 18.8 3.9 1.4 6.9
Other Current Assets 9.9 3.0 12.0 13.2 24.9 25.2 0.7 2.1 26.4 6.0
Total Current Assets 22.0 31.6 34.2 41.3 80.0 47.9 26.1 14.7 45.9 30.3
Long-Term Investments 22.1 0.0 0.3 0.0 0.7 0.9 0.0 8.9 6.0 0.9
Property, Plant, and Equipment 12.3 19.9 3.4 31.1 7.8 4.2 44.5 13.3 2.6 19.6
Goodwill 20.1 29.2 19.7 4.7 5.3 20.8 13.1 32.3 35.6 22.1
Intangible Assets 11.9 12.3 34.4 0.0 1.9 0.0 0.0 27.3 1.7 15.9
Total Long-Term Assets 78.0 68.4 65.8 58.7 20.0 52.1 73.9 85.3 54.1 69.7
Total Assets 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Liabilities 75.6 72.7 55.3 70.9 111.9 52.8 65.5 59.2 60.0 72.3
Current Liabilities 16.7 16.8 17.9 39.3 57.4 34.2 32.9 22.4 26.7 29.8
Accounts Payable 12.8 7.6 3.5 36.3 23.1 20.4 29.4 13.5 6.5 8.6
Total Non-Current Liabilities 58.9 55.9 37.3 15.2 54.5 18.7 27.1 36.9 33.3 42.4
Total Liabilities 75.6 72.7 55.3 70.9 111.9 52.8 65.5 59.2 60.0 72.3
Total Liabilities and Stockholders’ Equity 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Google | Sheets
1. Create a new Google Sheet called Lab 8-2 Mystery Ratios and connect it to
the iXBRLAnalyst add-in (or open the spreadsheet you created in Lab 8-1):
a. Click Extensions > Apps Script from the menu.
b. In a new browser window or tab, go to findynamics.com/gsheets/
ixbrlanalyst.gs (alternatively, open the ixbrlanalyst.txt file included in
the lab files).
c. Copy and paste the entire script from the FinDynamics page into the
Script Editor window, replacing any existing text.
d. Click Save and name the project XBRL, then click OK.
e. Close the Script Editor window and return to your Google Sheet.
f. Reload/refresh the page. If you see a new iXBRLAnalyst menu appear,
you are now connected to the XBRL data.
2. Use the = XBRLFact() formula as well as the normalized accounts in Lab 8-1 to recreate the ratios above.
= XBRLFact($B$1,$A7,B$6,$B$3)/$B$4
Hint: Remember that the formula structure is = XBRLFact(company, tag, year,
period, member, scale). Use relative references (e.g., B6/B$6) to save yourself
some time. Fix the denominator row to match the Revenue and Assets values.
3. Take a screenshot (label it 8-2GA).
4. Answer the lab questions, then you may exit Google Sheets. Save your file as
Lab 8-2 Mystery Ratios.
Lab 8-3 Analyze Financial Ratios—S&P100
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Financial analysts, investors, lenders, auditors, and many others perform
ratio analysis to help review and evaluate a company’s financial statements and financial
performance. This analysis allows the stakeholder to gain an understanding of the financial
health of the company and gives insights to allow more insightful and, hopefully, more effec-
tive decision making.
In this lab, you will access XBRL data to complete data analysis and generate finan-
cial ratios to compare the financial performance of several companies. Financial ratios can
more easily be calculated using spreadsheets and XBRL. For this lab, a template is provided
that contains the basic ratios. You will (1) select an industry to analyze, (2) create a copy
of a spreadsheet template, (3) input ticker symbols from three U.S. public companies, and
(4) calculate financial ratios and make observations about the state of the companies using
these financial ratios.
Data: Lab 8-3 SP100 Ratios.zip
Microsoft | Excel & Google | Sheets
LAB 8-3M Example of Financial Ratio Analysis in Microsoft Excel and Google Sheets
Microsoft Excel
Microsoft | Excel
c. Period: FY for a fiscal year
d. Round to: 1000000 will round to millions of dollars.
e. Comparable 1 Ticker: Company 2’s ticker
f. Comparable 2 Ticker: Company 3’s ticker
3. Take a screenshot (label it 8-3MA) of your figure with the financial state-
ments of your chosen companies.
4. Review the Facts sheet (or tab) to determine whether there are any values
missing for the companies you are analyzing. Click through the sheets at the
bottom to review the various ratios. To aid in this analysis, the template also
includes sparklines that provide a mini-graph to help you quickly visualize
any significant values or trends.
5. Take a screenshot (label it 8-3MB) of the DuPont ratios worksheet.
6. Answer the lab questions and then close Excel. Save your work as Lab 8-3
Financial Ratios with XBRL Analysis.xlsx.
Lab 8-3 Objective Questions (LO 8-1)
OQ1. Which industry did you analyze?
OQ2. For your Company 1, which ratio has seen the biggest change from 2018 to
2020? Use sparklines or calculate the percentage change.
OQ3. Which of the three companies is most liquid in 2020, according to the quick
ratio?
OQ4. How well has your Company 1 managed short-term liabilities over the last
three years?
Lab 8-4 Part 1 Filters and Calculations
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-4 [Your name] [Your email address].docx.
The dashboard you will prepare in this lab requires a little preparation so you can define
some calculations and key performance indicator targets and benchmarks for your evalua-
tion. Once you have these in place, you can create your visualizations. There are a number
of sentiment measures based on the Loughran-McDonald financial statement dictionary
(you will use the first three in this analysis). Each of these represents the count or number
of times a word from each dictionary appears in the financial statements:
• Lm.Dictionary.Count: Words that appear in the LM Dictionary, excluding filler words
and articles.
• Lm.Negative.Count: Words with negative financial sentiment, such as loss or impair-
ment. A high rate of negative words is associated with conservative reporting or nega-
tive news.
• Lm.Positive.Count: Words with positive financial sentiment, such as gain or asset. A high rate of positive words is associated with optimistic reports.
• Lm.Litigious.Count: The total number of words with legal connotation, such as settlement or jurisdiction. A high rate of litigious words indicates legal entanglement.
• Lm.Weak.Modal.Count: Words that indicate vacillation, such as might, possibly, or
could. A high rate of weak modal words is associated with volatility.
• Lm.Moderate.Modal.Count: Words such as probable or should. A high rate of moderate
modal words may reflect intention.
• Lm.Strong.Modal.Count: Words such as always or never. A high rate of strong modal
words is associated with certainty.
• Lm.Uncertainty.Count: Words such as doubt or unclear. A high rate of uncertain words
may reflect nebulous market or company conditions.
As an alternative, the Hv.Negative.Count value uses the Harvard General Inquirer off-the-shelf dictionary, which does not take financial context into account and is therefore useful as a point of comparison.
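Before building the dashboard, it may help to see the arithmetic behind the lab's positive-to-negative ratio. The Python sketch below assumes a hypothetical CSV export containing the company.name, lm.positive.count, and lm.negative.count columns described above; it mirrors the Pos Neg Ratio measure and the 0.7 KPI benchmark you will create in this lab.

import pandas as pd

filings = pd.read_csv("sp100_sentiment.csv")  # hypothetical export of the lab data

# Total the dictionary word counts per company across all filings.
by_company = filings.groupby("company.name")[
    ["lm.positive.count", "lm.negative.count"]
].sum()

# Same logic as the Pos Neg Ratio measure defined later in this lab.
by_company["pos_neg_ratio"] = (
    by_company["lm.positive.count"] / by_company["lm.negative.count"]
)

# Flag companies meeting the KPI benchmark of 0.7 used in the dashboard.
by_company["meets_kpi"] = by_company["pos_neg_ratio"] > 0.7
print(by_company.sort_values("pos_neg_ratio", ascending=False).head())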
aggregates, such as year or company. Click Modeling > New Measure, then
enter the formula below as a new measure:
a. Pos Neg Ratio = SUM(‘SP100 Sentiment’[lm.positive.count])/
SUM(‘SP100 Sentiment’[lm.negative.count])
5. To enable KPI targets to use as benchmarks for comparison, add the following new measure; companies with the highest positive-to-negative word ratios have a PNR above 0.7. Click Modeling > New Measure, then enter the formula below as a new measure. You can always edit this value later to change the benchmark:
a. KPI Most Positive = 0.7
6. To enable the use of color on your graphs to show KPIs that exceed the target, create an additional measure based on IF...THEN... logic. Power BI will apply conditional formatting with that color. Click Modeling > New Measure, then enter the formula below as a new measure:
a. Most Positive = IF([Pos Neg Ratio]<[KPI Most Positive],"#f28e2b","#4e79a7")
7. Scroll your field list to show your new calculated values and take a screen-
shot (label it 8-4MA). Note: Your report should still be blank at this point.
8. Save your file as Lab 8-4 SP100 Sentiment Analysis.pbix. Answer the questions for this part and then continue to the next part.
Tableau | Desktop
c. Current value: 0.7
d. Value when workbook opens: Current value
e. Display format: Automatic
f. Allowable values: All
g. Click OK.
5. To enable the use of color on your graphs to show positive and negative com-
panies in orange and blue, respectively, create the following calculated field
then click OK.
a. Most Positive = [Pos Neg Ratio]<[KPI Most Positive]
6. Scroll your field list to show your new calculated values and take a screen-
shot (label it 8-4TA). Note: Your report should still be blank at this point.
7. Save your file as Lab 8-4 SP100 Sentiment Analysis.twb. Answer the questions for this part and then continue to the next part.
Microsoft | Power BI Desktop
1. Open your Lab 8-4 SP100 Sentiment Analysis.pbix file created in Part 1 and go to the SP100 Sentiment tab.
2. Add a new Slicer to your page and resize it so it fits the top-right corner of
the page.
a. Field: Exchange
b. Filter: NYSE
3. Add a new Slicer to your page and resize it so it fits just below your first slicer.
a. Field: date.filed > Date Hierarchy > Year
b. Filter: 2016 to 2019
4. Add a new Slicer to your page and resize it so it fits the bottom-right corner
of the page.
a. Field: company.name > Right-click and choose Rename for this visual and
rename SP100 Company.
b. Filter: all
5. Click on the blank part of the page and add a new Line and Stacked Col-
umn Chart to the page to show the proportion of positive to negative words.
Resize it so that it fills the top-left quarter of the remaining space on the page.
a. Drag the following fields to their respective field boxes:
1. X-axis: date.filed
2. Column Y-axis: lm.positive.count > Rename to Positive
3. Column Y-axis: Negative
4. Line Y-axis: Pos Neg Ratio > Rename to Positive-to-Negative Ratio
b. Click the Format visual (paintbrush) icon to add a title to your chart:
1. General > Title > Title text: Proportion of Positive and Negative Words
6. Click on the blank part of the page and add a new Table. Resize it so that it
fills the top-right quarter of the remaining space on the page.
a. Columns:
1. date.filed > Date Hierarchy > Year
2. word.count > Rename to Word Count
3. lm.positive.count > Rename to Positive WC
4. lm.negative.count > Rename to Negative WC
5. Pos Neg Ratio > Rename to PN Ratio
b. Click the Format visual (paintbrush) icon to add backgrounds to the data
values:
1. Go to Visual > Cell elements > Apply settings to and click Conditional
formatting (fx) below data bars. Use the Series field drop-downs to en-
able data bars for Word Count, Positive WC, Negative WC, and PN Ratio.
c. Take a screenshot (label it 8-4MB) of your first two visuals.
7. Click on the blank part of the page and add a new Stacked Column Chart.
Resize it so that it fills the bottom-left third of the remaining space on the page.
a. X-axis: company.name
b. Y-axis: Pos Neg Ratio
c. Click the Format visual (paintbrush) icon:
1. Visual > Y-axis > Title: PN Ratio
2. Visual > X-axis > Title: Company
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. Based on field: Most Positive
4. Title > Title text: Most Positive Companies
8. Click on the blank part of the page and add a new Stacked Column Chart. Resize
it so that it fills the bottom-middle third of the remaining space on the page.
a. X-axis: company.name
b. Y-axis: word.count > Average
c. Click the Format visual (paintbrush) icon:
1. Visual > Y-axis > Title: Average Word Count
2. Visual > X-axis > Title: Company
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Most Positive
4. General > Title > Text: Companies with the Most to Say
9. Click on the blank part of the page and add a new Scatter Chart. Resize it so
that it fills the bottom-right third of the remaining space on the page.
a. Drag the following fields to their respective boxes:
1. Values: company.name
2. X-axis: word.count > Average
3. Y-axis: Pos Neg Ratio
b. Click the Analytics (magnifying glass) icon to add a trend line to your scatter chart:
1. Trend line > On > Combine Series > Off
c. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > X-axis > Title: Average Word Count
2. Visual > Y-axis > Title: PN Ratio
3. General > Title > Text: PNR by Avg Word Count
d. Click the more options (three dots) in the top-right corner of the scatter
chart and choose Automatically find clusters. Set the number of clusters
to 6 and click OK.
e. Take a screenshot (label it 8-4MC) of your completed dashboard.
10. When you are finished answering the lab questions, you may close Power BI
Desktop and save your file.
Tableau | Desktop
1. Open your Lab 8-4 SP100 Sentiment Analysis.twb file from Part 1. Then add
the following:
a. Drag the following fields to their respective shelves:
1. Columns: Date.Filed > Year
2. Rows: Measure Values
3. Measure Values: Remove all values except:
4. Create a new worksheet called Companies with the Most to Say and add the
following:
a. Columns: Company Name
b. Rows: Word Count > Average
c. Marks > Color: Most Positive
d. Sort by Avg. Word Count descending.
5. Create a new worksheet called PNR by Avg Word Count and add the following:
a. Columns: Word Count > Average
b. Rows: Pos Neg Ratio
c. Marks > Detail: Company Name
d. Analytics > Cluster: Number of Clusters: 6
e. Analytics > Trend Line > Linear
6. Finally, create a new dashboard tab called SP100 Sentiment and add your
charts from this part of the lab. Note: There will be two visuals on the top row
and three visuals on the bottom row.
a. In the Dashboard tab, change the size from Desktop Browser > Fixed
Size to Automatic.
b. Drag the Proportion of Positive and Negative Words sheet to the dash-
board. This will occupy the top-left corner of the dashboard.
c. Drag the Word Count sheet to the right side of the dashboard.
d. Drag the Most Positive Companies sheet to the bottom-left corner of the
dashboard.
e. Drag the Companies with the Most to Say sheet to the bottom-middle of
the dashboard.
f. Drag the PNR by Avg Word Count sheet to the bottom-right corner of the
dashboard.
g. In the top-right corner of each sheet on your new dashboard, click Use as
Filter (funnel icon) to connect the visualizations so you can drill down
into the data.
7. Take a screenshot (label it 8-4TC) of your completed dashboard.
8. When you are finished answering the lab questions, you may close Tableau
Desktop and save your file.
OQ6. According to the visual showing NYSE companies’ total word count, do
the majority of the companies with a high total word count appear to have a
positive-to-negative ratio higher or lower than the benchmark?
Chapter 9
Tax Analytics
A Look Back
In Chapter 8, we focused on how to access and analyze financial statement data. We highlighted the use of XBRL
to quickly and efficiently gain computer access to financial statement data. Next, we explained how ratios are used
to analyze financial performance. We also discussed the use of sparklines to help users visualize trends in the data.
Finally, we discussed the use of text mining to analyze the sentiment in financial reporting data.
A Look Forward
In Chapter 10, we bring together all of the accounting Data Analytics concepts with a set of exercises that walk all the way
through the IMPACT model. The chapter serves as a great way to combine all of the elements learned in the course.
Knowing the tax liability for a move to a new jurisdiction is important for corporations
and individuals alike. For example, a tax accountant might have advised
LeBron James not to sign with the Los Angeles Lakers in the summer of 2018
because the move was expected to cost him $21 million in extra state income
taxes, since California has higher taxes than Ohio.
Tax data analytics for this type of “what-if scenario analysis” could help frame
LeBron’s decision from a tax planning perspective. Such what-if scenario analysis
has wide application when contemplating new legislation, a merger possibility,
a shift in product mix, or a plan to set up operations in a new low-tax (or high-
tax) jurisdiction. Tesla recently applied tax planning concepts when considering
the tax incentives available for locating its Cybertruck factory. Tesla ultimately
decided to build its factory in Texas.
Sources: https://www.forbes.com/sites/seanpackard/2018/07/02/lebrons-move-could-cost-him-21-million-in-extra-state-taxes/#6517d3156280 (accessed August 2, 2018); https://techcrunch.com/2020/07/14/tesla-lands-another-tax-break-to-locate-cybertruck-factory-in-texas (accessed April 20, 2021).
OBJECTIVES
After reading this chapter, you should be able to:
EXHIBIT 9-1 (Continued)

Analytics Type: Predictive—identify common attributes or patterns that may be used to forecast similar activity to address the following questions: Will it happen in the future? What is the probability something will happen? Is it forecastable?
Potential Tax Questions Addressed: What is the expected taxable income over the next 5 years? How much tax personnel turnover will we expect to have in the next few years?
Data Analytics Techniques Used: Sales, earnings, and cash-flow forecasting using:
• Time series
• Analyst forecasts
• Competitor and industry performance
• Macroeconomic forecasts

Analytics Type: Prescriptive—recommend action based on previously observed actions to address questions of this type: What should we do based on what we expect will happen? How do we optimize our performance based on potential constraints?
Potential Tax Questions Addressed: Based on expected future income and expected future tax rates, how can we minimize future taxes? How can we take maximum advantage of R&D tax credits over the next few years?
Data Analytics Techniques Used: Tax planning based on what-if analysis and what-if scenario analysis; sensitivity analysis
Descriptive Analytics
Descriptive tax analytics provide insight into the current processes, policies, and calcula-
tions related to determining tax liability. These analytics involve summarizing transactions
by jurisdiction or category to more accurately calculate tax liability. They also track how
well the tax function is performing using tax key performance indicators (KPIs). We pro-
vide an example of these KPIs later in the chapter.
Diagnostic Analytics
Diagnostic tax analytics might help identify items of interest, such as high tax areas or
excluded transactions. For example, creating a trend analysis for sales and use tax paid
in different locations would help identify seasonal patterns or abnormal transaction vol-
ume that warrant further investigation. Diagnostic analytics also look for differences from
expectations to help determine if adequate taxes are being paid. One way for tax regula-
tors to assess if companies are paying sufficient tax is to look at the differences between
the amount of income reported for financial reporting purposes (like Form 10-Q or 10-K
submitted to the SEC) and the amount reported to the IRS (or other tax authorities) for
income tax purposes. Increasingly, tax software and analytics (such as Hyperion or Corptax) are used to help with the reconciliation to find both permanent and temporary differences between the two methods of computing income and also to provide needed support
for IRS Schedule M-3 (Form 1120).
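To make the book-tax comparison concrete, the sketch below shows the kind of query such reconciliation software might run. It is a minimal illustration only: the BookIncome and TaxIncome tables and their columns are hypothetical stand-ins, not tables from any lab dataset or named product.

-- Hypothetical book-versus-tax comparison by entity and year.
-- Rows with a nonzero difference are flagged for investigation; classifying
-- each difference as permanent or temporary still requires account-level detail.
SELECT b.ENTITY, b.FISCAL_YEAR,
       b.PRETAX_BOOK_INCOME,
       t.TAXABLE_INCOME,
       b.PRETAX_BOOK_INCOME - t.TAXABLE_INCOME AS BOOK_TAX_DIFFERENCE
FROM BookIncome b
INNER JOIN TaxIncome t
    ON t.ENTITY = b.ENTITY AND t.FISCAL_YEAR = b.FISCAL_YEAR
WHERE b.PRETAX_BOOK_INCOME <> t.TAXABLE_INCOME
ORDER BY ABS(b.PRETAX_BOOK_INCOME - t.TAXABLE_INCOME) DESC;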
Predictive Analytics
Predictive tax analytics use historical data and new information to identify future tax liabili-
ties and may be used to forecast future performance, such as the expected amount of taxes
to be paid over the next 5 years. At a basic level, this includes regression and what-if analyses and requires a specific dependent variable (or target, such as the value of a tax credit or deferred tax asset). The addition of ancillary data, including growth rates, trends, and other identified patterns, adds to the usefulness of these analyses. Additionally, tax ana-
lytics rely on tax calculation logic and tax determination, such as proportional deductions,
to determine the potential tax liability.
Another example of predictive analytics in tax would be determining the amount of R&D
tax credit the company may qualify to take over the next 5 years. The R&D tax credit is a tax
credit under Internal Revenue Code section 41 for companies that incur research and devel-
opment (R&D) costs. To receive this credit, firms must document their qualifying activities at an appropriate level of detail. For example, companies have to link an employee’s time directly to a research activity or to a specific project to qualify for the tax credit. Let’s suppose that a firm spent money on qualifying R&D expenditures but simply did not keep the detail needed as supporting evidence to receive the credit. Analytics could be used not only to predict the amount of R&D tax credit the firm may qualify for, but also to find the needed detail (timesheets, calendars, project timelines, documented meetings between various employees, time needed for management review, etc.) to qualify for the R&D tax credit.
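As a concrete illustration of finding that supporting detail, the sketch below pulls employee hours linked to qualifying research projects. The Timesheets and Projects tables, their columns, and the QUALIFIES_AS_RESEARCH flag are hypothetical stand-ins for whatever time-tracking data a firm actually keeps.

-- Hypothetical supporting-detail query: employee time tied to specific research
-- projects, the kind of documentation needed to support an R&D credit claim.
SELECT p.PROJECT_ID, p.PROJECT_NAME, t.EMPLOYEE_ID,
       SUM(t.HOURS) AS QUALIFYING_HOURS
FROM Timesheets t
INNER JOIN Projects p
    ON p.PROJECT_ID = t.PROJECT_ID
WHERE p.QUALIFIES_AS_RESEARCH = 1   -- flag maintained by the tax and R&D teams
GROUP BY p.PROJECT_ID, p.PROJECT_NAME, t.EMPLOYEE_ID;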
Prescriptive Analytics
Building upon predictive analytics, prescriptive analytics use the forecasts of future perfor-
mance to recommend actions that should be taken based on opportunities and risks facing
the company. Such prescriptive analytics may be helpful in tax planning to minimize the taxes a company pays. If a trade war with China occurred, it would be important to assess the impact of this event on the company’s tax liability. Or if there is the possibility of
new tax legislation, prescriptive analytics might be used to justify whether lobbying efforts
or payments to industry coalitions are worth the cost. Tax planning may also be employed
to help a company determine the structure of a transaction, such as a merger or acquisition.
To do so, prescriptive analytics techniques such as what-if scenarios, goal-seek analysis, and
scenario analysis would all be useful in tax planning to lower potential tax liability.
PROGRESS CHECK
1. Which types of analytics are useful in tax planning—descriptive, diagnostic,
predictive, or prescriptive—and why?
2. How can tax analytics support and potentially increase the amount of R&D tax
credit taken by a company?
Traditionally, data have been collected and used for financial accounting purposes, where transactions that have an economic
impact are recorded as an input for the financial statements and other financial reporting
purposes. In addition, these financial reporting systems along with other data have also
been used for management accounting purposes to allow management to calculate the cost
of a product or to optimize a product mix that would maximize profits for the firm. There is
generally not a completely separate information system solely collecting tax data needed for
tax compliance and tax planning.
With little integration between the financial reporting system and the needs of the tax
function, tax departments would manually collect and extract data from their financial
reporting system and generalized data warehouse. After gathering data from these general-
ized data warehouses, tax departments would use Excel spreadsheets to capture and store
the detail needed to support tax calculations. Such lack of integration hampered efforts of
tax accountants to have the needed information to comply with tax law, to minimize current
taxes, and to allow for tax planning for future transactions.
With recent advances in technology, there are increasing opportunities for tax
departments to have greater control of their data, which allows them to work more
effectively and efficiently. Specifically, instead of using a generalized data warehouse,
enterprise systems increasingly use specific data marts for their tax function. Data
marts are defined as being a subset of the data warehouse oriented toward a specific
need. Such a tax data mart is used to extract past and real-time data from the financial
reporting system that are most applicable to the tax function. Tax departments are able
to specify which data might affect their tax calculations for their tax data mart and have
a continuous feed of those data. This tax data mart allows tax departments to more
completely “own” the data because no other group has the rights to modify them. They
can add to that tax data mart other relevant information that might come from other
sources.
They are also able to keep the tax data mart as a centralized repository so that different
users of the tax function can have access to the data. Exhibit 9-2 provides a good illustration
of how data are accumulated and subsequently dedicated for the tax function. Consistent
with the IMPACT model, tax data warehouses and tax data marts help tax departments to
“master the data” to address tax questions and issues inside the company.
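As a concrete (and deliberately simplified) illustration, a tax data mart feed is often just a scheduled extract of tax-relevant records from the warehouse. In the sketch below, the Warehouse.GL_Transactions source, the TaxDataMart.TaxDetail target, and the account-range filter are all hypothetical stand-ins, not systems described in the chapter.

-- Hypothetical nightly feed: copy tax-relevant general ledger activity
-- from the enterprise data warehouse into the tax data mart.
INSERT INTO TaxDataMart.TaxDetail (TRAN_DATE, STATE, ACCOUNT, AMOUNT)
SELECT TRAN_DATE, STATE, ACCOUNT, AMOUNT
FROM Warehouse.GL_Transactions
WHERE ACCOUNT BETWEEN '2200' AND '2299'   -- tax-related accounts (illustrative range)
  AND TRAN_DATE >= DATEADD(DAY, -1, CAST(GETDATE() AS DATE));   -- activity since yesterday

Because only the tax group writes to the mart, the department effectively owns these data while the warehouse remains the system of record.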
EXHIBIT 9-2 Tax Data in a Data Warehouse
With so much data available, there is a need for accountants to “bring order” to the
data and add value by presenting the data “in a way that people trust it.”
Indeed, tax accountants use the available data to:
• Comply with tax law.
• Get every possible deduction to minimize tax cost.
• Perform tax planning to minimize taxes in the future.
They do this by acquiring, managing, storing, and analyzing the needed data to
perform these tasks.
Source: https://nasba.org/blog/2021/03/24/why-are-cpas-necessary-in-todays-world/ (accessed
April 20, 2021).
PROGRESS CHECK
3. Why do tax departments need to extract data for tax calculation from a financial
reporting system?
4. How is a tax data mart specifically able to target the needs of the tax department?
3 “Defining Success: What KPIs Are Driving the Tax Function Today,” PwC, September 2017, https://www.pwc.com/gx/en/tax/publications/assets/pwc_tax_function_of_the_future_tax_function_KPI_sept17.pdf (accessed August 14, 2018).
Tax risk: With increased regulator and stakeholder scrutiny, firms bear the financial
and reputational risk of misreporting or tax provision adjustments. Example KPIs
include:
• Frequency and magnitude of tax audit adjustments.
• Frequency of concerns pertaining to the organization’s tax position.
• Levels of late filing or error penalties and fines.
• Number of resubmitted tax returns due to errors.
Tax efficiency and effectiveness: This includes the efficiency and effectiveness of technology,
processes, and people in carrying out the tax function. Example KPIs include:
• Levels of technology/tax training.
• Amount of time spent on compliance versus strategic activities.
• Level of job satisfaction of the tax personnel.
• Employee turnover of the tax personnel.
• Improved operational efficiency.
Tax sustainability: Refers to the ability to sustain similar tax performance over time.
Example KPIs include:
• Number of company tax audits closed and significance of assessment over time.
• The effective tax rate (ETR) over time.
Tax permanent differences: Additionally, tax managers should track permanent differ-
ences between book and tax revenue and expenses to ensure compliance and dispute
overpayments of taxes. These include:
• Penalties and fines (excluded from taxable income).
• Meals and entertainment (100 percent books, 50 percent tax).
• Interest on municipal bonds (nontaxed income).
• Life insurance proceeds (nontaxed income).
• Dividends received deduction (taxed based on percentage of ownership).
• Excess depreciation.
These tax-focused KPIs appear on dashboards or cockpits, consistent with the “C”
(communicate insights) and the “T” (track outcomes) of the IMPACT model. Cockpits are similar to dashboards but are much narrower in scope and focus. This narrow focus allows the tax function to highlight a single high-impact area of concern, such as reconciliation. We also note that the tax sustainability KPIs,
in particular, measure performance over time and are consistent with the “T” (track
outcomes) of the IMPACT model.
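For instance, the ETR-over-time KPI could be fed to such a cockpit with a simple query. The sketch below is illustrative only; it assumes a hypothetical TaxProvision table holding income tax expense and pretax book income by year.

-- Hypothetical KPI feed: effective tax rate (ETR) by year for a tax cockpit.
-- ETR = total income tax expense divided by pretax book income.
SELECT FISCAL_YEAR,
       SUM(INCOME_TAX_EXPENSE) AS TOTAL_TAX_EXPENSE,
       SUM(PRETAX_BOOK_INCOME) AS TOTAL_PRETAX_INCOME,
       SUM(INCOME_TAX_EXPENSE) * 1.0 / SUM(PRETAX_BOOK_INCOME) AS ETR   -- * 1.0 avoids integer division
FROM TaxProvision
GROUP BY FISCAL_YEAR
ORDER BY FISCAL_YEAR;

A stable ETR series over time supports the tax sustainability KPI; a series that bounces around from year to year flags performance that may not be sustainable.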
PROGRESS CHECK
5. Why is ETR (effective tax rate) a good example of a tax cost KPI? Why is ETR over
time considered to be a good tax sustainability KPI?
6. Why would a company want to track the levels of late filing or error penalties as
a tax risk KPI?
What-If Scenarios
What-if scenario analysis tests the impact of various input data on an expected output. In
tax, this means the manipulation of inputs—such as multiple tax rates, a series of transac-
tions, and varying profit margins—to estimate the future outputs, including estimated book
income, cash taxes paid, and effective tax rates. These analyses attempt to optimize the
inputs to reach a desired goal, such as minimizing the effective tax rate or generating a port-
folio of possible outputs given the inputs. In these cases, we need to estimate the possible
inputs and outputs as well as determine the expected probabilities of those items.
For example, assume the Pennsylvania General Assembly is debating a reduction in
the statutory corporate income tax rate from 10 percent to either 8 percent or 7 percent
with a positive (+5 percent), neutral, or negative (−5 percent) change in corporate income.
A company with expected earnings before tax of $1,000,000 might see potential tax savings
shown in Exhibit 9-3.
By itself, this analysis may indicate the path to minimizing tax would be the lower tax
rate with negative growth. An estimate of the joint probabilities of each of the nine scenar-
ios determines the expected value of each, or the most likely impact of a change (as shown
in Exhibit 9-4) and the dollar impact of the expected change in value (in Exhibit 9-5). For
example, there is a 0.05 probability (as shown in Exhibit 9-4) that there will be +5 percent
change in taxable income but no change in tax rate. This would result in a $250 increase
in taxes (as shown in Exhibit 9-5). In this case, the total expected value of the proposed
decrease in taxes is $15,575, which is the sum of the individual expected values as shown
in Exhibit 9-5.
EXHIBIT 9-4 Joint Probabilities of Changes in Tax Rate and Change in Income

Change in Taxable Income / Change in Tax Rate    10%     8%      7%      Sum
Positive change (+5%)                            0.05    0.10    0.10    0.25
Neutral change (+0%)                             0.20    0.20    0.10    0.50
Negative change (−5%)                            0.10    0.10    0.05    0.25
Sum                                              0.35    0.40    0.25    1.00
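As a worked check of one cell using the figures above: baseline tax at the current 10 percent rate is 0.10 × $1,000,000 = $100,000. If income rises 5 percent while the rate stays at 10 percent, the tax becomes 0.10 × $1,050,000 = $105,000, a change of +$5,000; weighting by the 0.05 joint probability gives 0.05 × $5,000 = $250 of expected value for that cell. Summing the probability-weighted changes over all nine cells,

\[ E[\Delta \text{Tax}] = \sum_{i,j} p_{ij}\, \Delta \text{Tax}_{ij} = -\$15{,}575, \]

which is the expected $15,575 decrease in taxes reported in Exhibit 9-5.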
The usefulness of the what-if analysis is that decision makers can see the possible impact
of changes in tax rates across multiple scenarios. This model relies heavily on assumptions
that drive each scenario, such as the initial earnings before tax, the expected change in
earnings, and the possible tax rates. Data Analytics helps refine initial assumptions (i.e.,
details guiding the scenarios) to strengthen decision makers’ confidence in the what-if mod-
els. The more analyzed data that are available to inform the assumptions of the model, the
more accurate the estimates and expected values can be. Here, data analysis of before-tax
income and other external factors can help determine more accurate probability estimates.
Likewise, an analysis of the legislative proceedings may help determine the likelihood of a
change.
PROGRESS CHECK
7. What are some data a tax manager would need in order to perform a what-if
analysis of the potential effects of a stock buyback?
8. How does having more metadata help a tax accountant minimize taxes?
Summary
Recent advances in Data Analytics extend to tax functions, allowing them to work more
effectively, efficiently, and with greater control over the data.
■ Tax analytics allow tax departments, accounting firms, and regulators to address tax
opportunities and challenges. (LO 9-1)
■ Tax accountants use descriptive analytics to summarize performance, diagnostic analyt-
ics to compare with a benchmark and control costs, predictive analytics to predict tax
outcomes, and prescriptive analytics for tax planning purposes. (LO 9-1)
■ New regulations are requiring greater detail, and tax regulators are getting more adept in the
use of analytics, better able to address tax compliance and opportunities. In addition to the
regulator side, tax filers now have more data to support their tax calculations. (LO 9-2)
■ While the tax department has traditionally used data from the financial reporting system,
there are increasing opportunities to control and expand upon available tax data to help
address the most important tax questions. (LO 9-2)
■ Tax visualizations (dashboards, cockpits) can be helpful in monitoring how the tax function
is doing in meeting its KPIs (key performance indicators). (LO 9-3)
■ Prescriptive analytics can be especially powerful in doing tax planning and formulating
what-if scenarios. (LO 9-4)
Key Words
Tax Cuts and Jobs Act of 2017 (460) Tax legislation offering a major change to the existing tax code.
data mart (459) A subset of the data warehouse focused on a specific function or department to assist
and support its needed data requirements.
data warehouse (459) A repository of data accumulated from internal and external data sources,
including financial data, to help management decision making.
tax data mart (459) A subset of a company-owned data warehouse focused on the specific needs of
the tax department.
tax planning (464) Predictive analysis of potential tax liability and the formulation of a plan to reduce
the amount of taxes paid.
what-if scenario analysis (464) Evaluation of the impact of different tax scenarios/alternatives on
various outcome measures including the amount of taxable income or tax paid.
5. The lower the effective tax rate, the more effective the tax department is at finding ways to structure
transactions to minimize taxes and find applicable tax deductions and tax credits (like the
R&D tax credit or other tax loopholes). Monitoring the level of the ETR over time helps us
know if the tax department is persistent and consistent in reducing the taxes paid, or if this
rate is highly variable. Generally, most tax professionals would consider the more stable
the ETR over time, the better. Tracking ETR over time as part of the tax sustainability KPIs
allows management and the tax department to figure out if the ETR is persistent or if the
rate bounces around each year in an unsustainable way.
6. The greater the level of late filings or error penalties, the more vulnerable the
company is to penalties, tax audits, and missed tax saving opportunities.
7. Data may include the possible price of the stock, the potential capital gains incurred by
the stockholders, and the number of shares.
8. The more metadata, the better the tax accountants can accurately calculate the amounts
of taxable and nontaxable items. For example, they can more clearly identify expenses
that qualify for the research and development credit or track meal and entertainment
expenses that may trigger tax presence in other locations.
1. (LO 9-1) In which stage of the IMPACT model (introduced in Chapter 1) would the use of
tax cockpits fit?
a. Track outcomes
b. Master the data
c. Address and refine results
d. Perform test plan
2. (LO 9-2) Tax departments interested in maintaining their own data are likely to have
their own:
a. tax reporting system.
b. tax data mart.
c. tax dashboard.
d. tax analytics.
3. (LO 9-3) According to the textbook, an example of a tax efficiency and effectiveness KPI
would be:
a. number of audits closed.
b. ETR (effective tax rate) over time.
c. number of resubmitted tax returns due to errors.
d. amount of time spent on compliance versus strategic activities.
4. (LO 9-3) According to the textbook, an example of a tax sustainability KPI would be:
a. frequency of concerns pertaining to the organization’s tax position.
b. level of job satisfaction of the tax personnel.
c. levels of technology/tax training.
d. number of audits closed and significance of assessment over time.
5. (LO 9-3) According to the textbook, an example of a tax cost KPI would be:
a. employee turnover of the tax personnel.
b. levels of technology/tax training.
c. ETR (effective tax rate).
d. levels of late filing or error penalties.
6. (LO 9-4) The task of tax accountants and tax departments to minimize the amount of
taxes paid in the future is called:
a. tax planning.
b. tax compliance.
c. tax minimization.
d. tax sustainability.
7. (LO 9-3) According to the textbook, an example of a tax risk KPI would be:
a. employee turnover of the tax personnel.
b. levels of technology/tax training.
c. ETR (effective tax rate).
d. levels of late filing or error penalties.
8. (LO 9-3) What allows tax departments to view multiple years, periods, jurisdictions
(state or federal or international, etc.), and differing scenarios of data, typically through
use of a dashboard?
a. Tax data visualizations
b. Tax data warehouses
c. Tax compliance data
d. Tax planning
9. (LO 9-4) Predictive analysis of potential tax liability and the formulation of a plan to
reduce the amount of taxes paid is defined as:
a. tax data analytics.
b. tax data warehouses.
c. tax compliance data.
d. tax planning.
10. (LO 9-3) The evaluation of the impact of different tax scenarios/alternatives on various
outcome measures including the amount of taxable income or tax paid is called:
a. tax visualizations.
b. what-if scenario analysis.
c. tax compliance.
d. data warehousing.
7. (LO 9-1) Descriptive analytics help calculate tax liability more accurately. Give some
examples of tax-related descriptive analytics.
8. (LO 9-1) Predictive analytics help identify future tax liabilities. What data would a tax
accountant need in order to perform a predictive analysis?
9. (LO 9-4) Explain how probability helps refine a what-if analysis.
10. (LO 9-3) How do visualizations of tax compliance assist a company in its efforts to
reduce tax risk and minimize the costs of tax preparation and compliance? In your opin-
ion, what would be needed to consistently make visualizations a key part of the tax
department evaluation of tax risk and tax cost minimization?
Problems
1. (LO 9-1) Match the description of the tax analysis question to the data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
2. (LO 9-1) Match the description of the tax analytics technique to the data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
3. (LO 9-3) Match the following tax KPIs to one of these areas:
• Tax cost
• Tax risk
Tax KPI Tax Area
1. Effective tax rate (ETR).
2. Levels of late filing or error penalties and fines.
3. Cash taxes paid.
4. Expiration of tax credits.
5. Frequency of concerns pertaining to the organization’s tax position.
6. Number of resubmitted tax returns due to error.
4. (LO 9-3) Match the following tax KPIs to one of these areas:
• Tax sustainability
• Tax efficiency/effectiveness
5. (LO 9-3) Analysis: In your opinion, which of the four general categories of tax KPIs men-
tioned in the text would be most important to the CEO? Support your opinion.
6. (LO 9-4) Analysis: Assume that a company has the option of staying in a tax jurisdic-
tion with an effective tax rate of 20 percent or moving to a different location where the
effective tax rates are 11 percent. What other drivers besides the tax rate may affect the
decision to stay or move?
7. (LO 9-4) Analysis: If a company knows that the IRS will change a tax calculation in the
future, such as the capitalization of research and experimental expense in 2021, what
actions might management take today to reduce their tax liability when the new policy
goes into effect?
8. (LO 9-4) Analysis: How does tax planning differ from tax compliance? Why might the
company leadership be more excited about the value-creating efforts of tax planning
versus that of tax compliance?
9. (LO 9-2) Analysis: How does Data Analytics facilitate what-if scenario analysis? How
does the presence of a tax data mart help provide the needed data to support such
analysis?
10. (LO 9-2) Match the tax analytics definitions to their terms: data mart, data warehouse,
tax planning, tax data mart, what-if scenario analysis.
LABS
Microsoft | Excel
LAB 9-1M Example of Visual Distributions of Sales Tax Rates in Microsoft Excel
Tableau | Desktop
Lab 9-1T Example of Visual Distributions of Sales Tax Rates in Tableau Desktop
Microsoft | Excel
Chart Element > Axes > More Axis Options > Adjust Bin width to
.00791, and press Enter on your keyboard.
2. A filled map: Click out of the histogram and into the data so your
active cell is in the table, then click Insert tab on the ribbon >
Recommended Charts > Filled Map, click OK.
a. If prompted to send data to Bing, click I accept.
3. Rearrange your visualizations so you can see both.
4. Rename the sheet Lab 9-1 Output.
2. Take a screenshot that includes both visualizations (label it 9-1 MA).
3. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-1 State Sales Tax Visualization.xlsx.
Tableau | Desktop
AQ3. Considering the histogram, are you surprised by its shape, with observations
at both extremes but none in the middle? Why would some states have zero
sales tax while most states have higher sales taxes?
Microsoft | Excel
LAB 9-2M Example of a PivotTable Indicating Estimated Sales Tax Owed for
Dillard’s in 2015 in Microsoft Excel
Tableau | Desktop
LAB 9-2T Example of a Chart Indicating Estimated Sales Tax Owed for Dillard’s in 2015
in Tableau Desktop
Microsoft | Excel
1. Open Excel:
a. If you are continuing this lab from the work you completed in Lab
9-1, open your completed Lab 9-1 workbook (Lab 9-1 State Sales Tax
Visualization.xlsx).
b. If you do not have the completed Lab 9-1 workbook, open a new work-
book in Excel and connect to your data:
1. From the Data ribbon, click Get Data > From File > From Workbook.
2. Navigate to your Lab 9-2 State Sales Tax Rates.xlsx file > Table 1
and click Load.
476
ISTUDY
2. Import Dillard’s sales data:
a. From the Data tab in the ribbon, click Get Data > From Database > From
SQL Server Database.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Expand Advanced Options and input the following query:
-- Total 2015 sales by state and zip code. State <> 'U' removes the
-- placeholder state code; store 698 is the online store (see Lab 9-5),
-- so excluding it keeps only physical-store sales.
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE YEAR(Tran_Date) = 2015 AND State <> 'U' AND
TRANSACT.STORE <> 698
GROUP BY STATE, ZIP_CODE
b. Click Edit or Transform Data.
c. In the Power Query Editor, merge the Dillard’s transaction data with the
State Sales Tax table:
1. From the Home tab in the ribbon, click Merge Queries:
a. Select the STATE column in Query 1.
b. Select Table1 in the drop-down and select the state column.
c. Place a check mark next to Ignore Privacy Levels. . . in the pop-up
box that appears and click Save.
d. Keep the Join Kind as Left Outer and click OK.
d. Still in the Power Query Editor, create a new column to calculate the
amount of Sales Tax Owed for each state and zip code in 2015:
1. Click the arrows on the Table 1 column to expand the table.
a. Select Expand, unselect state, and click OK.
2. From the Add Column tab in the ribbon, select Custom Column:
a. New column name: Estimated State Sales Tax Owed
b. Custom column formula: =[amount]*[Table1.taxrate]
c. Click OK.
3. From the Home tab in the ribbon, click Close & Load.
4. Rename the sheet Lab 9-2 Output.
e. Insert a PivotTable (Insert tab > PivotTable):
1. Rows: State
2. Values: Estimated State Sales Tax Owed (ensure that this appropriately
defaults to SUM)
3. Take a screenshot (label it 9-2MA).
4. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-2 Dillard’s Estimated State Sales Tax Owed.xlsx.
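Note that if the state tax rates lived in the same SQL Server database, the entire estimate could be produced in one query instead of a Power Query merge. A minimal sketch, assuming a hypothetical StateTaxRates table with STATE and TAXRATE columns (in this lab, the rates live in the Excel workbook instead):

-- Hypothetical one-step version of the Lab 9-2 estimate.
SELECT s.STATE,
       SUM(t.TRAN_AMT) AS AMOUNT,
       SUM(t.TRAN_AMT * r.TAXRATE) AS ESTIMATED_STATE_SALES_TAX_OWED
FROM TRANSACT t
INNER JOIN STORE s ON s.STORE = t.STORE
INNER JOIN StateTaxRates r ON r.STATE = s.STATE   -- hypothetical rate table
WHERE YEAR(t.TRAN_DATE) = 2015
  AND s.STATE <> 'U'          -- exclude the placeholder state code
  AND t.STORE <> 698          -- exclude the online store
GROUP BY s.STATE;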
Tableau | Desktop
1. Open Tableau:
a. If you are continuing this lab from the work you completed in Lab 9-1,
open your completed Lab 9-1 workbook (Lab 9-1 State Sales Tax Visual-
ization.twb).
b. If you do not have the completed Lab 9-1 workbook, open a new Tableau
workbook and connect to the Lab 9-2 dataset:
1. Click Connect > To a File > Microsoft Excel.
2. Navigate to your Lab 9-2 State Sales Tax Rates.xlsx file and click Open.
2. Import Dillard’s sales data:
a. From the Data Source tab (You will already be in the Data Source tab
if you created a new workbook. If you opened your workbook from Lab
9-1, just navigate back to the Data Source tab in the bottom left corner.),
click Add (next to Connections) > To a Server > Microsoft SQL Server.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Click Sign In.
4. Double-click New Custom SQL from the Table section and input the
following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE YEAR(Tran_Date) = 2015 AND State <> 'U' AND
TRANSACT.STORE <> 698
GROUP BY STATE, ZIP_CODE
b. Click OK.
c. The data from Sheet 1 and the Custom SQL Query should automatically
join based on the matching fields of State. Close the Edit Relationship
window.
d. Navigate to a new sheet.
e. Create a new calculated field to calculate the sales tax owed:
1. Click Analysis > Create Calculated Field. . .
a. Name: Estimated State Sales Tax Owed
b. Formula: [AMOUNT]*[Taxrate]
c. Click OK.
2. Create a text table to show the sales tax Dillard’s owes for each state in
2015:
a. Rows: STATE (Custom SQL Query)
b. Text (in the Marks shelf): Estimated State Sales Tax Owed (ensure
that this appropriately defaults to SUM)
3. Take a screenshot (label it 9-2TA).
4. Rename the sheet Lab 9-2 Output.
5. When you are finished answering the lab questions, you may close Tableau.
Save your file as Lab 9-2 Dillard’s Estimated State Sales Tax Owed.twb.
Lab 9-3 Example Output
By the end of this lab, you will create a table to compare the estimated state sales tax owed
to the actual sales tax owed, along with sparklines to visualize the differences. While we
blurred the pertinent values in these screenshots, your work should look similar to this:
Microsoft | Excel
Tableau | Desktop
LAB 9-3T Example of a Text Table and Accompanying Line Charts in Tableau Desktop
Lab 9-3 Part 1 Calculate Sales Tax Paid in 2015 by
Dillard’s
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and responses and save it as Lab 9-3 [Your name] [Your email
address].docx.
In this part of the lab, you will calculate the actual sales tax paid in 2015 by Dillard’s in
each state in which it operates physical store locations.
Microsoft | Excel
i. This step is important because in the next step, where you will create a
calculated field, Power Query will not interpret null values as numbers,
so the result of any sum with a null value will also be null. Replacing
null with 0 will result in a proper sum of the tax amount paid.
4. Create a calculated field: From the Add Column tab in the ribbon,
click Custom Column. Input the following and click OK.
a. New Column Name: TAX_PAID
b. Custom Column formula: =[SALES TAX] + [SALES TAX
ADJUSTMENT]
3. To make it easier to navigate to this query as the lab continues, we will rename
it from Query1 to something more meaningful. Expand the Queries menu on
the right and right-click Query1. Select Rename and rename the query 9-3.
4. From the Home tab on the ribbon, click Close & Load to load the trans-
formed data into Excel.
5. Create a PivotTable (Insert tab on the Ribbon > PivotTable and click OK).
a. Rows: STATE
b. Values: TAX_PAID
6. Take a screenshot (label it 9-3MA).
7. Rename the sheet as PivotTable.
8. Save your file as Lab 9-3 Dillard’s Sales Tax Paid.xlsx.
9. Answer the questions and continue to Part 2.
Tableau | Desktop
1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
e. SELECT STATE, ZIP_CODE, DEPT_DESC, SUM(TRAN_AMT) AS AMOUNT
FROM ((TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE)
INNER JOIN SKU
ON TRANSACT.SKU = SKU.SKU)
INNER JOIN DEPARTMENT
ON SKU.DEPT = DEPARTMENT.DEPT
WHERE DEPT_DESC LIKE '%TAX%' AND TRANSACT.STORE <> 698
GROUP BY STATE, ZIP_CODE, DEPT_DESC
3. Click OK.
4. Navigate to Sheet 1.
a. Drag DEPT_DESC to the Filters shelf and select only SALES TAX and
SALES TAX ADJUSTMENT, then click OK.
b. Add the following elements to the Columns and Row shelves:
i. Columns: DEPT_DESC
ii. Rows: STATE
iii. Text (in the Marks shelf): AMOUNT (ensure that this appropriately
defaults to SUM)
iv. Add in the Grand Total Sales Tax Paid for each state (the sum of
SALES TAX and SALES TAX ADJUSTMENT) by clicking the
Analysis tab > Totals > Show Row Grand Totals.
5. Rename Sheet 1 to Lab 9-3 Part 1.
6. Save your Tableau workbook as Lab 9-3 Dillard’s Sales Tax Paid.twb.
7. Take a screenshot (label it 9-3TA).
8. Answer the questions, then continue to Part 2.
Microsoft | Excel
1. From the same Excel workbook you created in Part 1 (Lab 9-3 Dillard’s Sales
Tax Paid.xlsx), click the Data tab on the ribbon.
2. Click Get Data > From File > From Workbook:
a. If you are continuing this lab from the work you completed in Lab 9-2,
open your completed Lab 9-2 workbook (Lab 9-2 Dillard’s Estimated
State Sales Tax Owed.xlsx).
b. If you did not complete Lab 9-2, open Lab 9-3 Estimated State Sales Tax
Rates.xlsx.
3. In the Navigator window, select 9-2, then select Edit or Transform. Within the
Power Query Editor, you will make a variety of changes to the data: adjust
the data type for ZIP_CODE from numerical to text, switch the view to the
original query (9-3), merge the Lab 9-2 query with the 9-3 query, and expand the
newly merged query to show only the necessary fields.
a. Adjust data type for ZIP_CODE: Click the 123 icon next to the ZIP_
CODE header and select text. If prompted to Change Column Type,
select Replace current.
b. Switch the view to Query 9-3: Expand the menu on the right labeled
Queries, and select 9-3.
c. Merge the two Queries (conceptually a left outer join; see the SQL sketch
after these steps): From the Home tab in the ribbon, select Merge Queries.
1. 9-3: select the ZIP_CODE field.
2. Dropdown: select 9-2, then select the ZIP_CODE field.
3. If you are using the worksheet that you created in Lab 9-2:
a. In the Privacy levels window, select the box next to Ignore Privacy
Levels checks. . . and click Save.
4. Leave the Join Kind as a Left Outer join, and click OK.
d. Expand the newly merged query: Click the Expand button on the new
9-2 column (looks like two arrows), and select only AMOUNT and ESTI-
MATED STATE SALES TAX OWED, and click OK.
4. From the Home tab on the ribbon, click Close & Load.
5. Return to the tab of your workbook that you named PivotTable. Right-click
any of the data in the PivotTable and select Refresh. You should see the new
fields that you added in the Power Query Editor in your PivotTable field list
now.
6. Add 9-2.Estimated State Sales Tax Owed to the Values.
7. To visualize whether the estimated amount was more or less than the actual
amount paid, add in a Sparkline for each state.
a. Place your cursor in the cell to the right of the first row (this is probably
cell D4). From the Insert tab in the ribbon, select Sparklines > Line.
1. Data Range: B4:C4
2. Location Range: $D$4
3. Click OK.
4. Drag the Sparkline all the way down the PivotTable dataset to view
sparklines for each state.
8. Take a screenshot (label it 9-3MB) of the PivotTable and the sparklines.
9. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-3 Dillard’s Sales Tax Paid.xlsx.
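The merge you performed in step 3c is conceptually a left outer join: every row of the 9-3 query is kept, and matching 9-2 rows are attached by zip code (rows without a match get nulls). A minimal SQL sketch of the same idea, writing the two queries as hypothetical tables:

-- Conceptual equivalent of the Power Query merge (table names are illustrative).
SELECT a.STATE,
       a.ZIP_CODE,
       a.TAX_PAID,                          -- actual tax paid, from the 9-3 query
       b.ESTIMATED_STATE_SALES_TAX_OWED     -- estimate from the 9-2 query; NULL if unmatched
FROM Query_9_3 a
LEFT OUTER JOIN Query_9_2 b
    ON b.ZIP_CODE = a.ZIP_CODE;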
Tableau | Desktop
1. From the same Tableau workbook from Part 1 (Lab 9-3 Dillard’s Sales Tax Paid.twb),
click the Data Source tab > Add (next to Connections) > Microsoft Excel and
browse to Lab 9-3 Estimated State Sales Tax Rates.xlsx, and click OK.
2. Double-click the sheet labeled 9-2 to relate it to the Part 1 data, and edit the
relationship so the related fields are ZIP_CODE.
3. Navigate to the Lab 9-3 Part 1 sheet to allow the data to update, then right-
click the sheet name and select Duplicate.
4. Rename the duplicated sheet Lab 9-3 Part 2 and adjust the data to show the
comparison of the estimated state sales tax owed and the actual sales tax paid:
a. Right-click the column headers for SALES TAX and SALES TAX
ADJUSTMENT and select Hide.
b. Double-click ESTIMATED STATE SALES TAX OWED to add it to the
Measure Values.
c. Drag the Measure Names pill from the Rows shelf to the Columns shelf.
5. Create a visualization of the comparison by state:
a. Duplicate your Lab 9-3 Part 2 sheet and name it Lab 9-3 Part 2 - viz.
b. Drag the Measure Values pill from the Marks shelf to the Rows shelf.
c. Adjust the Marks by clicking the Dropdown (labeled (Automatic)) and
select Line.
d. To make it easier to compare each state’s values, adjust the axis. Right-
click any of the axes in the visualizations and select Edit Axis. . .
e. Change the Range to Independent axis ranges for each row or column and
close the window.
6. Create a dashboard to view the raw numbers and the sparklines on the same
sheet.
7. Take a screenshot (label it 9-3TB) of the All tables sheet.
8. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 9-3 Dillard’s Sales Tax Paid.twb.
OQ2. In how many states did Dillard’s owe less in taxes than estimated based on the
state sales tax rate?
Lab 9-4 Comprehensive Case: Estimate Sales Tax Owed by
Zip Code—Dillard’s and Avalara
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will use sales tax rate data from June 2020 (provided
by Avalara) to estimate the amount of sales tax Dillard’s might have owed in 2015 using
not only the state sales tax rates, but also local tax rates (city, county, and special where
they apply). While the tax rates you will use are more recent than 2015, the estimate should
still be closer to what Dillard’s actually paid than the estimate you made in Lab 9-2 because
your calculations will include sales tax rates beyond just the state rates.
provided for free and updated monthly by Avalara, although the freely provided rates do not
come without risk—there are cities and counties that contain multiple zip codes, and some
zip codes cross multiple city and county lines. In this lab, you will investigate how well you
can rely on freely provided resources for tax rates versus paying for tax rates based on physi-
cal address (a much more granular and precise data point).
Data: Dillard’s Sales Data and Lab 9-4 Avalara Tax Rates.zip - 236KB Zip / 1.84MB
CSV - Dillard’s sales data are available only on the University of Arkansas Remote Desktop
(waltonlab.uark.edu). See your instructor for login credentials.
Microsoft | Excel
LAB 9-4M Example of a PivotTable with Conditional Formatting and Sparklines in
Microsoft Excel
Tableau | Desktop
Microsoft | Excel
Tableau | Desktop
1. Open Tableau Desktop and click Connect > To a File > Text file. Browse to the
location where you saved the Lab 9-4 Avalara Tax Rates folder and connect to
the first file in the folder.
2. Right-click the rectangle in the Connection pane for the TAXRATES_
ZIP5. . . file and select Convert to Union. . .
a. Select Wildcard (automatic) and click OK. This action will combine all
of the files in the Tax Data folder into one.
3. Navigate to Sheet 1.
a. Rows: State
b. Marks (Text): State Rate (change the aggregate measure to AVERAGE)
c. After glancing through the State Tax Rates, add Estimated Combined
Rate to the Values by double-clicking on the field name, then adjust the
aggregate measure to AVERAGE.
4. Adjust the visualization using the Show Me tab or by sorting to assess which
states have the highest state rates and combined rates, and also to compare
state and combined rates across states.
5. Rename Sheet 1 to Lab 9-4 Part 1.
6. Save your Tableau workbook as Lab 9-4 Avalara Combined Data.twb.
7. Take a screenshot (label it 9-4TA).
8. Answer the questions, then continue to Part 2.
Lab 9-4 Part 2 Merge the Avalara Tax Data with the
Dillard’s Data
In this part of the lab, you will merge the query from Part 1 with the data from Lab 9-3, and
then compare the estimated amount of taxes paid from Part 1 of this lab with the actual
amount of taxes paid that you calculated in Lab 9-3.
If you did not complete the Microsoft track of Lab 9-3, a dataset has been provided for
you to use. In Lab 9-3, you used Dillard’s transaction data, in particular the sum of SALES
TAX and SALES TAX ADJUSTMENT amounts accounted for in the database for each
state in which Dillard’s operated stores in 2015.
Microsoft | Excel
1. From the same Excel workbook you created in Part 1 (Lab 9-4 Avalara Com-
bined Data.xlsx), click the Data tab on the ribbon.
2. Click Get Data > From File > From Workbook:
a. If you are continuing this lab from the work you completed in Lab 9-3, open
your completed Lab 9-3 workbook (Lab 9-3 Dillard’s Sales Tax Paid.xlsx).
b. If you did not complete Lab 9-3, open Lab 9-4 Dillard’s Sales Tax Paid
Abbreviated.xlsx.
3. In the Navigator window, select 9-3, then select Edit or Transform. Within
the Power Query Editor, you will make a variety of changes to the data:
switch the view to the original query (Lab 9-4 Avalara Tax Rates), merge
the Lab 9-3 query with the current query (using a RIGHT join instead of
a LEFT join), expand the newly merged query to show only the necessary
fields, and create a calculated field to estimate the tax owed based on the
Avalara Estimated Combined Tax Rate and the actual amount of sales
Dillard’s had in 2015 in each zip code in which it has a physical location.
a. Switch the view to the query “Lab 9-4 Avalara Tax Rates”: Expand
the menu on the right labeled Queries, and select Lab 9-4 Avalara Tax Rates.
b. Merge the two Queries: From the Home tab in the ribbon, select Merge
Queries.
1. Lab 9-4 Avalara Tax Rates: select the ZipCode field.
2. Dropdown: select 9-3, then select the ZIP_CODE field.
3. If you are using the worksheet that you created in Lab 9-3:
a. In the Privacy levels window, select the box next to Ignore Privacy
Levels checks. . . and click Save.
4. Adjust the Join Kind to: Right Outer (all from second, matching from
first), and click OK.
c. Expand the newly merged query: Click the Expand button on the
new 9-3 column (looks like two arrows), and select only TAX_PAID,
AMOUNT, and ESTIMATED STATE SALES TAX OWED.
d. Create a calculated field: From the Add Column tab in the ribbon, click
Custom Column. Input the following and click OK.
1. New Column Name: Estimated_Tax_Owed_by_Zip
2. Custom Column formula: =[EstimatedCombinedRate] * [Amount]
3. Click OK.
4. From the Home tab on the ribbon, click Close & Load.
5. Return to the tab of your workbook that you named PivotTable. Right-click any
of the data in the PivotTable and select Refresh. You should see the new fields
that you added in the Power Query Editor in your PivotTable field list now.
6. Add the following three fields to the values (and ensure that the default aggre-
gate is SUM):
• Estimated State Sales Tax Owed
• Tax Paid
• Estimated Tax Owed by Zip
7. Create a Sparkline to view how the estimates differ from the estimate made
from the Lab 9-2 data (an estimate of only state sales tax owed), 9-3 data (the
amount Dillard’s actually paid), and the new estimate (an estimate of sales
tax owed taking into account city, county, and state tax rates).
a. Place your cursor in the cell to the right of the first row (this is probably
cell G4). From the Insert tab in the ribbon, select Sparklines > Line.
1. Data Range: D4:F4
2. Location Range: $G$4
3. Click OK.
4. Drag the Sparkline all the way down the PivotTable dataset to view
sparklines for each state.
8. Take a screenshot (label it 9-4MB) of the PivotTable and the sparklines.
9. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-4 Avalara Dillards Combined.xlsx.
Tableau | Desktop
1. From the same Tableau workbook from Part 1 (Lab 9-4 Avalara Combined
Data.twb), click the Data Source tab > Add (next to Connections) > Microsoft
Excel and browse to Lab 9-4 Dillard’s Sales Tax Paid Abbreviated.xlsx, and
click OK.
2. Double-click the sheet labeled 9-3 to relate it to the Part 1 data, and edit the
relationship so the related fields are ZIP_CODE.
a. If the relationship does not form because of a Type Mismatch, this
means that the data type for Zip Code is string (text) in one of the data-
sets and number in another. Simply select one of the Zip Code attributes
and adjust its data type so it matches the other Zip Code data type.
3. Navigate to a new sheet and rename it Lab 9-4 Part 2.
a. Rows: State (either State field will work)
b. Marks (Text): Estimated State Sales Tax Owed
c. Double-click TAX_PAID and Estimated Tax Owed by Zip on the field
name, to add them to the Measure Values.
4. Create a visualization of the comparison by state:
a. Duplicate Lab 9-4 Part 2 and name it Lab 9-4 Part 2 - viz.
b. Drag the Measure Values pill from the Marks shelf to the Rows shelf.
c. Adjust the Marks by clicking the Dropdown (labeled (Automatic)) and
select Line.
d. To make it easier to compare each state’s values, adjust the axis. Right-
click any of the axes in the visualizations and select Edit Axis. . .
e. Change the Range to Independent axis ranges for each row or column and
close the window.
5. Create a dashboard to view the raw numbers and the sparklines on the same
sheet.
6. Take a screenshot (label it 9-4TB) of the All tables sheet.
7. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 9-4 Avalara Dillards Combined.twb.
Lab 9-4 Part 2 Objective Questions (LO 9-3, 9-4)
OQ1. Did Dillard’s pay more or less than the estimated combined amount in Missis-
sippi (MS)?
OQ2. Did Dillard’s pay more than the estimated combined amount in any states?
Lab 9-5 Comprehensive Case: Online Sales Taxes
Analysis—Dillard’s and Avalara
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will gather transaction data for only the online sales that
Dillard’s processed in 2015, as well as the location of the customer to determine what the
estimated sales tax charges would be. In this lab, you will use the Risk Level attribute from
the Avalara dataset to help determine whether you would recommend that Dillard’s rely
on the free resources, or whether it should pay for software that determines sales tax owed
based on more precise measures.
Avalara ZipCode Tax Data: For Lab 9-5, the files for every state, D.C., and Puerto Rico
have been included because of the large extent of localities that Dillard’s ships online
sales to.
Data: Dillard’s Sales Data and Lab 9-5 Avalara Tax Rates.zip - 289KB Zip / 2.33MB
CSV - Dillard’s sales data are available only on the University of Arkansas Remote Desktop
(waltonlab.uark.edu). See your instructor for login credentials.
Microsoft | Excel
Tableau | Desktop
LAB 9-5T Example of a Text Table with Color Formatting in Tableau Desktop
Lab 9-5 Combine the Avalara Tax Rate Data and Merge
with Dillard’s Data
As was mentioned in the chapter, the South Dakota v. Wayfair Supreme Court decision in
2018 resulted in new complexities for online sales by allowing states to require businesses
that make online sales in their state to collect and remit sales tax. The Dillard’s data in this
database only go through 2016 and thus include only transactions that occurred prior to
the Wayfair decision, but we can use these past transactions to simulate the decisions
Dillard’s (and other companies) needed to make in a post-Wayfair world. If you completed
Lab 9-4, you learned a bit about the decision a company might make regarding using free
resources for determining sales tax rates versus more precise paid resources.
A few notes on Risk Level:
• Avalara created the Risk Level attribute to help companies understand how much risk
they are taking on by using zip codes to calculate rates instead of specific addresses
(remember that some zip codes cross city, county, and even state lines!).
• When a tax rate has a risk value of 1, this means the entire zip code is associated with a
single city and county, indicating a high likelihood that the single combined tax rate
applies across the whole area. Any zip code with a higher risk value, however, is associated
with multiple counties or cities; the higher the number, the more boundary lines the zip
code crosses. The attribute gauges how much risk a company takes on by relying on
zip-code rates: undercharging exposes the business to significant risk at audit time, while
overcharging customers on items by as much as 5 or 6 percent aggravates them.
• The codes AA, AE, and AP are not associated with a particular state; rather, they are
military address abbreviations, so they are either international or cross state lines (in
particular, AA crosses state lines because it covers military addresses in the Americas
excluding Canada).
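If the Avalara files were loaded into a database table, triaging the riskier zip codes would be a one-line filter. A minimal sketch, assuming a hypothetical AvalaraRates table using the column names from this lab:

-- Hypothetical triage query: zip codes whose single combined rate may not
-- apply across the whole area (risk level 1 = one city and one county).
SELECT ZipCode, State, EstimatedCombinedRate, RiskLevel
FROM AvalaraRates
WHERE RiskLevel > 1
ORDER BY RiskLevel DESC;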
Before you begin the lab, you should create a new blank Word document where you
will record your screenshots and responses and save it as Lab 9-5 [Your name] [Your email
address].docx.
Microsoft | Excel
d. Add the Dillard’s data to the Power Query model: From the Home tab in
the Power Query ribbon, select New Source > Database > SQL Server.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Expand Advanced Options and input the following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN CUSTOMER
ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRANSACT.STORE = 698 AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE
Unlike in previous Chapter 9 comprehensive labs where you cre-
ated a query to filter out the online sales, this query focuses on only
online sales and the customer’s location instead of the physical
store’s location.
e. 1. Merge the two Queries: From the Home tab in the ribbon, select
Merge Queries.
a. Query1: select the ZIP_CODE field.
b. Dropdown: select Lab 9-5 Avalara Tax Rates, then select the
ZipCode field.
i. In the Privacy levels window, select the box next to Ignore
Privacy Levels checks. . . and click Save.
c. Leave the join kind as Left Outer (the default) and click OK.
2. Expand the newly merged query: Click the Expand button on the
new Lab 9-5 Avalara Tax Rates column (looks like two arrows), and
select only EstimatedCombinedRate and RiskLevel.
3. Create a calculated field: From the Add Column tab in the ribbon,
click Custom Column. Input the following and click OK.
a. New Column Name: Estimated_Online_Tax_Owed_by_Zip
b. Custom Column formula: =[EstimatedCombinedRate] * [Amount]
c. Click OK.
f. From the Home tab on the ribbon, click Close & Load.
g. Create a PivotTable (Insert tab on the Ribbon > PivotTable and click OK).
1. Rows: State
2. Values: Estimated_Online_Tax_Owed_By_Zip (SUM) and RiskLevel
(AVERAGE)
3. Add Conditional Formatting and/or sort your PivotTable based
on Risk Level to assess which states pose a higher level of risk for
Dillard’s if it calculates only online sales tax based on zip code (and
not specific shipping address).
4. Take a screenshot of your PivotTable with sorting and/or Conditional
Formatting (label it 9-5MA).
5. Save your file as Lab 9-5 Avalara and Dillard’s Online Data.xlsx.
Tableau | Desktop
1. Open Tableau Desktop. From Tableau’s Data Source page, you will take
a few steps to import and prepare your data: importing the Avalara data,
changing the data type for the Zip Code (this will help relate the Avalara
data to the Dillard’s data in the next step), and importing the Dillard’s data
and relating the two datasets.
a. Import Avalara Data: Click To a File > To a Text file > Browse to the
location you saved Lab 9-5 Avalara Tax Rates folder and connect to the
first file in the folder.
1. Right-click the rectangle in the Connection pane for the Lab 9-5
Avalara ZipCode Tax Data. . . file and select Convert to Union. . .
2. Select Wildcard (automatic) and click OK. This action will combine
all of the files in the Tax Data folder into one.
b. Adjust the Zip Code data type: Click the Globe icon next to the Zip
Code header and select string.
c. Import Dillard’s data: Click Add (next to Connections) > To a Server >
Microsoft SQL Server.
1. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN CUSTOMER
ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRANSACT.STORE = 698 AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE
i. Unlike in previous Chapter 9 comprehensive labs, where you created a query to filter out the online sales, this query focuses only on online sales and uses the customer's location instead of the physical store's location.
e. In the Edit Relationship window, adjust the related fields to Zip
Code in each query, then close the window.
2. Navigate to Sheet 1:
a. Create a Calculated Field: From the Analysis tab, select Create
Calculated Field. . .
1. Name: Estimated Online Tax Owed by Zip
2. Calculation: [Estimated Combined Rate] * [Amount]
3. Click OK.
b. Rows: State
c. Measure Values (double-click these to ensure they appear in Measure
Values and not elsewhere): Estimated Online Tax Owed by Zip (SUM)
and Risk Level (AVERAGE).
d. Add either Color Marks formatting or sort your table based on Risk Level to
assess which states pose a higher level of risk for Dillard’s if it calculates only
online sales tax based on zip code (and not specific shipping address).
3. Take a screenshot (label it 9-5TA).
4. Save your Tableau workbook as Lab 9-5 Avalara and Dillard’s Online Data.twb.
Chapter 10
Project Chapter (Basic)
A Look Back
Chapter 9 discussed the application of Data Analytics to tax questions and looked at tax data sources and how they
may differ depending on the tax user (a tax department, an accounting firm, or a regulatory body) and tax needs. We
also investigated how visualizations are useful components of tax analytics. Finally, we considered how data analysis
might be used to assist in tax planning.
A Look Forward
Chapter 11 will revisit the Dillard’s sales and returns data to provide an advanced overview of different analytical
tools and techniques to provide additional understanding of the data.
Tools like Tableau and Power BI are popular because they enable quick analysis of simple descriptive and diagnostic analytics. By creating visual answers to data problems, accountants can tell stories that help inform management decisions, aid auditors, and provide insight into financial data.
Both Tableau and Power BI enable more simplified analysis by incorporating natural language processing into their cloud-based offerings. Instead of dragging dimensions and measures to build the analyses, you can simply ask a question in a natural sentence, and the tool will map your question to your existing data model.
OBJECTIVES
After reading this chapter, you should be able to:
Microsoft or Tableau
Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a new dashboard (Tableau) or page (Power BI) called Financial that includes
the following:
1. Create a new workbook, connect to 10-1 O2C Data.xlsx, and import all seven tables. Double-check the data model to ensure relationships are correctly defined as shown in Exhibit 10-1.
2. Add a table to your worksheet or page called Sales and Receivables that
shows the invoice month in each row and the invoice amount, receipt
amount, adjustment amount, AR balance, and write-off percentage in the
columns. Tableau Hint: Use Measure Names in the columns and Measure
Values in the marks to create your table. Then once your table is complete,
use Analytics > Summarize > Totals to calculate column totals.
a. You will need to create a new measure or calculated field showing the
account AR Balance, or the total invoice amount minus the total receipt
amount minus the total adjustment amount. Tableau Hint: To minimize
erroneous values from appearing in Tableau due to blank or missing
values, use the IFNULL() function to replace blank values with 0, for
example, IFNULL([Receipt Amount],0).
b. You will need to calculate the write-off percentage as the total AR adjustment divided by the total invoice amount. Hint: Format the write-off percentage as a percent or to four decimals. (A sketch of both calculated fields appears after this task list.)
c. Filter this visual to show only values from January 2020 to December 2020.
3. Add a new bar chart called Bad Debts that shows the invoice amount and
adjustment amount along with a tooltip for write-off percentage. Tableau
Hint: Choose Dual Axis and Synchronize Axis to combine the two values.
a. Filter this visual to show only values from January 2020 to December 2020.
4. Clean up the formatting and titles of your visuals and combine them into a
single dashboard or page labeled Financial.
5. Take a screenshot of your dashboard showing the account balances (label it 10-1A).
6. Save your workbook as 10-1 O2C Analysis, answer the lab questions, then
continue to Part 2.
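The two calculated fields referenced in steps 2a and 2b might look like the following in Tableau. This is a minimal sketch, assuming the field names in 10-1 O2C Data.xlsx are Invoice Amount, Receipt Amount, and Adjustment Amount:

    // AR Balance: total invoices less receipts and adjustments,
    // with IFNULL() guarding against blank values
    SUM(IFNULL([Invoice Amount], 0))
    - SUM(IFNULL([Receipt Amount], 0))
    - SUM(IFNULL([Adjustment Amount], 0))

    // Write-Off Percentage: AR adjustments as a share of invoiced sales
    SUM(IFNULL([Adjustment Amount], 0)) / SUM([Invoice Amount])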
Microsoft or Tableau
Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a dashboard (Tableau) or page (Power BI) called Management with the
following:
1. Add a filter to show only sales orders from November.
2. Add a table to your page called Total Sales by Day that shows the total sales
order amount by sales order date. Power BI Hint: Use the date hierarchy to
drill down to specific days of the month. Tableau Hint: Set the sales order
date to DAY() and place the total sales order amount as a text mark.
3. Add a bar chart to your page called Sales by Customer that shows the total
sales order amount by customer account name in descending order.
4. Add a new matrix table to your page called AR by Customer that shows the
customer and invoices in rows, and earliest invoice due date, age, and balance
as values.
a. Create a parameter showing the Report Date as 12/31/2020. Power
BI Hint: Create a new column and use the DATE() function.
b. Create a new measure showing the Age as the difference between the
Invoice Due Date and the Report Date. Power BI Hint: Use the
DATEDIFF() function to calculate the age and the MIN() function on
the date fields to load specific dates.
c. Use the AR Balance you created in Part 1.
d. Filter the table to show only outstanding balances that are greater than 0.
5. Add a new card to your page called Days Sales Outstanding to show the current KPI value. Hint: Create a new measure showing the DSO as the accounts receivable balance divided by the total sales amount multiplied by 30 days. (DAX sketches of the Report Date, Age, and DSO measures from steps 4 and 5 appear after this list.)
6. In Tableau, combine all of these visuals into one dashboard.
7. Take a screenshot of your dashboard (label it 10-1B).
8. Save your workbook, answer the lab questions, then continue to Part 3.
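For the Power BI track, minimal DAX sketches of the measures described in steps 4 and 5 might look like the following; the Invoices table name and its column names are assumptions about the data model, not taken from the text:

    // Step 4a: fixed report date used for aging
    Report Date = DATE(2020, 12, 31)

    // Step 4b: days between the earliest invoice due date and the report date
    Age = DATEDIFF(MIN(Invoices[Invoice Due Date]), [Report Date], DAY)

    // Step 5: days sales outstanding KPI
    DSO = DIVIDE([AR Balance], SUM(Invoices[Invoice Amount])) * 30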
OQ3. Which customer did the company sell the most to in November?
OQ4. How much did the company sell to that customer in November?
OQ5. What is the value of the oldest outstanding invoice in November?
OQ6. What is the age of the oldest outstanding invoice?
OQ7. What is the current days sales outstanding KPI value for 2020?
Microsoft or Tableau
Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a new dashboard (in Tableau) or page (in Power BI) called Audit that
includes the following:
1. Add a table to your page called Exceptions to identify any shipments that
occurred before the order was placed. It should show the Sales Order ID and
the number of days to ship in ascending order.
a. Create a new measure called Order To Ship Days that calculates the
difference between the sales order date and the shipment date. Power BI
Hint: Use the DATEDIFF() function to calculate the difference and the
MIN() function on the date fields to load specific dates.
b. Filter this visual on order to ship days to show only negative values. (A DAX sketch of the measure appears after this list.)
2. Add a new matrix table called Missing Invoice to determine whether any orders have shipped but have not yet been invoiced. It should list the sales orders, earliest (minimum) shipment date, minimum shipment ID, and minimum invoice ID.
a. Filter this visual on invoice ID to show only missing (blank) values.
3. You should find at least one exception here. If you don’t see any exceptions,
try selecting different months in the sales order date month filter.
4. Take a screenshot of your dashboard showing exceptions and missing
invoices (label it 10-1C).
5. Save your workbook, answer the lab questions, then continue to Part 4.
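A minimal DAX sketch of the Order To Ship Days measure from step 1a, assuming the tables are named Sales Orders and Shipments (the names are assumptions, not from the text):

    // Negative values flag shipments that occurred before the order was placed
    Order To Ship Days =
        DATEDIFF(MIN('Sales Orders'[Sales Order Date]), MIN(Shipments[Shipment Date]), DAY)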
EXHIBIT 10-2 Procure-to-Pay Data
Data: 10-2 P2P Data.zip - 551KB Zip / 594KB Excel
Microsoft or Tableau
Build a new dashboard (Tableau) or page (Power BI) called Financial that includes
the following:
1. Create a new workbook, connect to 10-2 P2P Data.xlsx, and import all eight
tables. Double-check the data model to ensure relationships are correctly defined.
2. Add a table to your page called Purchases that shows the invoice quarter in each row and the invoice amount, payment amount, adjustment amount, balance, and write-off percentage in the columns. Tableau Hint: Use Measure Names in the columns and Measure Values in the marks to create your table. Then once your table is complete, use Analytics > Summarize > Totals to calculate column totals.
a. You will need to create a new measure showing the account AP Balance, or the total invoice amount minus the total payment amount minus the total adjustment amount.
Microsoft or Tableau
Build a dashboard (Tableau) or page (Power BI) called Management with the following:
1. Add a filter to show only purchase orders from November.
2. Add a table to your page called Total Purchases by Day that shows the total purchase order amount by purchase order date. Power BI Hint: Use the date hierarchy to drill down to specific days of the month. Tableau Hint: Set the purchase order date to DAY() and place the total purchase order amount as a text mark.
3. Add a bar chart to your page called Purchases by Supplier that shows the total
purchase order amount by supplier account name in descending order by amount.
4. Add a new matrix to your page called AP by Supplier that shows the supplier
and invoices in rows, and earliest invoice due date, age, and balance as values.
a. Create a parameter showing the Report Date as 12/31/2020. Power
BI Hint: Create a new column and use the DATE() function.
b. Create a new measure showing the Age as the difference between the
Invoice Due Date and the Report Date. Power BI Hint: Use the
DATEDIFF() function to calculate the age and the MIN() function on
the date fields to load specific dates.
c. Create a new measure showing the account Balance, or the total invoice
amount minus the total payment amount minus the total adjustment amount.
Tableau Hint: To minimize errors in Tableau, use the IFNULL() function to replace blank values with 0, for example, IFNULL([Payment Amount],0).
d. Filter the table to show only outstanding balances that are greater than 0.
5. Add a new card to your page called Days Payables Outstanding to show the
current KPI value. Hint: Create a new measure showing the DPO as the
accounts payable balance divided by the total purchases amount multiplied
by 30 days.
6. In Tableau, combine all of these visuals into one dashboard.
7. Take a screenshot of your dashboard (label it 10-2B).
8. Save your workbook, answer the lab questions, then continue to Part 3.
Microsoft or Tableau
Build a new dashboard (Tableau) or page (Power BI) called Audit with the following:
1. Add a new bar chart called Outliers that shows the Z-score of purchases
by supplier account name. You will need to create the following measures/
calculated fields:
a. Average Purchases shows the average purchase order amount. Tableau
Hint: Use WINDOW_AVG(SUM([Purchase_Order_Amount_Local])) to
aggregate by supplier.
b. Std Dev Purchases shows the standard deviation of purchase order
amount. Tableau Hint: Use the WINDOW_STDEVP() formula. Power BI
Hint: Use Std Dev for the population.
c. Z-Score Purchases is calculated by subtracting Average Purchases from the total purchase order amount and dividing by Std Dev Purchases. (Sketches of all three calculated fields appear after this list.)
2. Now create a new worksheet called Missing Orders to determine if any
invoices have been received that don’t match existing orders.
a. In this case, you want to filter the Purchase Order ID from the Invoice
Received table to show only missing (null) values.
3. Take a screenshot of your dashboard showing exceptions (label it 10-2C).
4. Save your workbook, answer the lab questions, then continue to Part 4.
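Minimal Tableau sketches of the three calculated fields from step 1, built directly from the hints above:

    // Average Purchases: table calculation averaged across the supplier pane
    WINDOW_AVG(SUM([Purchase_Order_Amount_Local]))

    // Std Dev Purchases: population standard deviation across the same pane
    WINDOW_STDEVP(SUM([Purchase_Order_Amount_Local]))

    // Z-Score Purchases: how far each supplier's total sits from the average
    (SUM([Purchase_Order_Amount_Local]) - [Average Purchases]) / [Std Dev Purchases]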
Chapter 11
Project Chapter (Advanced):
Analyzing Dillard’s Data to
Predict Sales Returns
A Look Back
Chapter 10 had a project chapter that emphasized basic Data Analytics skills related to the order-to-cash and
purchase-to-pay processes.
Americans returned $260 billion in merchandise to retailers in 2015. While the average rate of return at retailers is just 8 percent, it increases on average to 10 percent during the holiday season. It increases dramatically for online sales, to 30 percent or higher, with clothing returns from online sales hitting 40 percent. With a much higher return rate, as online retailers such as Amazon continue to increase their market share of total retail sales, the problem is only going to get worse.
What's more, not only are products being returned in greater numbers, but the value of the unwanted and damaged returns is greatly diminished.
Because accountants are required to estimate sales returns (and the diminished value of returned items), and offset sales in the same period that the original sales are made, accountants need to establish a reasonable and hopefully reliable method to estimate such returns. This chapter establishes various descriptive, diagnostic, and predictive analytics that may be used to help evaluate the estimate of sales returns.
*CNBC LLC. Sources: https://www.cnbc.com/2016/12/16/a-260-billion-ticking-time-bomb-the-costly-business-of-retail-returns.html (accessed April 2019); https://www.forbes.com/sites/stevendennis/2018/02/14/the-ticking-time-bomb-of-e-commerce-returns/#46d599754c7f (accessed April 2019).
OBJECTIVES
After completing this chapter, you should be able to:
1 Accounting Standards Codification (ASC) 606, Revenue from Contracts with Customers, as amended, and created by Accounting Standards Update (ASU) 2014-09, Revenue from Contracts with Customers.
1. After logging in to the Remote Desktop, open Microsoft Excel and click the
Data tab on the ribbon.
2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT MONTH(TRAN_DATE) AS MONTH, TRAN_DATE,
STATE, TRANSACT.STORE, TRAN_TYPE,
SUM(SALE_PRICE) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
GROUP BY MONTH(TRAN_DATE), TRAN_DATE, STATE,
TRANSACT.STORE, TRAN_TYPE
d. Click OK (if prompted: Connect using current credentials, and click OK).
e. Click Edit or Transform Data.
3. There are several changes you need to make to the data to prepare them for analysis: pivoting the TRAN_TYPE column so we have two separate columns for purchases and returns, creating a calculated field for the percentage of returned sales, and creating a conditional column indicating whether transactions were performed online or in person. (An M-code sketch of steps 3b and 3c appears at the end of this Excel track.)
a. Pivot the tran_type column:
1. Select the TRAN_TYPE column.
2. From the Transform tab in the ribbon, select Pivot Column.
3. Values Column: Amount
4. Click OK.
b. Create a calculated field for percentage of returned sales:
1. From the Add Column tab in the ribbon, select Conditional Column.
2. New column name: % of Returned Sales
3. Column name: P
4. Operator: equals
5. Value: 0
6. Output: 0
7. Otherwise: = [R]/[P]
8. Click OK.
c. Create a conditional column for online versus in-person transactions:
1. From the Add Column tab in the ribbon, select Conditional Column.
2. New Column Name: Online-dummy
3. Column Name: STORE
4. Operator: equals
5. Value: 698
6. Output: Online
7. Otherwise: In-Person
8. Click OK.
d. From the Home tab in the ribbon, select Close & Load.
e. Once your data load into Excel, name the spreadsheet Ch 11 Query
Data.
f. Once your data load into Excel, add the data to the data model through
Power Pivot and create a date table:
1. Enable the Power Pivot add-in (File > Options > Add-ins > Manage:
COM add-ins > PowerPivot, select GO, then enter a check mark
next to MS PowerPivot for Excel, click OK).
2. From the Power Pivot tab in the ribbon, select Add to Data Model.
3. From the Power Pivot window, select the Design tab > Date Table > New.
4. Once the Date table populates, select PivotTable from the Home tab
and click OK to create a PivotTable in a new worksheet.
5. Name the spreadsheet with the new PivotTable Ch 11 QS1 PivotTable.
4. In Excel, create the following PivotTables or PivotCharts (if you create PivotTables, include relevant conditional formatting):
a. Average % of Returned Sales by month to indicate the months with the
highest and lowest averages.
• When using fields from the Calendar table and the query table, you will need to build relationships. Once you add fields from each table, Excel will prompt you to create relationships. Let Excel do this automatically for you by clicking Auto-Detect. . . If Excel does not detect the relationship, you can build it manually. The matching fields are Date Table.Date and Query1.TRAN_DATE.
b. Average % of Returned Sales for online transactions versus in-person
transactions.
c. Average % of Returned Sales across states.
5. Add three slicers and adjust the report connections for each so that they
interact and slice each of your PivotTables or PivotCharts:
a. Month
b. Online-Dummy
c. State
6. Arrange your Slicers, PivotTables, and/or PivotCharts so that they are all easily viewable on your Excel sheet.
7. Take a screenshot of your PivotTables, PivotCharts, and Slicers (label it 11-1M).
8. Save your workbook as Chapter11.xlsx, answer the QS questions, then continue
to Question Set 1, Part 2.
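For reference, the conditional columns in steps 3b and 3c correspond to M code that Power Query writes for you. A minimal sketch (the applied-step names are assumptions):

    // Step 3b: % of Returned Sales, guarding against division by zero purchases
    #"Added Pct" = Table.AddColumn(#"Pivoted Column", "% of Returned Sales",
        each if [P] = 0 then 0 else [R] / [P]),
    // Step 3c: flag store 698 as the online channel
    #"Added Online" = Table.AddColumn(#"Added Pct", "Online-dummy",
        each if [STORE] = 698 then "Online" else "In-Person")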
Tableau | Prep
answers to your five questions. Load them into a report or dashboard and take a screen-
shot of your dashboard analyses (label it 11-2MB or 11-2TB).
Microsoft | Excel
1. Access the Power Query Editor to prepare your data for running a hypothesis test:
a. Open your Excel file saved as Chapter11.xlsx (from QS1) and select a
cell of the table in the Ch 11 Query Data sheet to activate the Query tab
in the Excel ribbon.
b. From the Query tab, select Edit to open the Power Query Editor. If prompted to do so, click Edit Permissions and Run the Native Database Query in
the window that pops up. Then repeat a similar process by clicking Edit
Credentials, clicking Connect, allowing Encryption Support, and click-
ing OK. The query data should show up in the Power Query Editor now.
2. You will perform three actions: duplicate the existing % of Returned Sales
column, add a new conditional column to create a holiday dummy variable,
and pivot your new conditional column on % of Returned Sales.
a. Duplicate the existing % of Returned Sales column:
1. Select the % of Returned Sales column.
2. From the Add Column tab in the ribbon, click Duplicate column.
b. Add a new conditional column for the holiday dummy variable:
1. From the Add Column tab in the ribbon, select Conditional Column.
a. New column name: Holiday-Dummy
b. Column Name: Month
c. Operator: Equals
d. Value: 1
e. Output: Holiday
f. Otherwise: Non-Holiday
g. Click OK.
2. Pivot the Holiday-Dummy column:
a. Select the Holiday-Dummy column.
b. From the Transform tab in the ribbon, click Pivot Column.
c. Values Column: % of Returned Sales - Copy and click OK.
3. Return to the Home tab in the ribbon and click Close & Load.
4. Once your data have loaded, you will run a t-test to see if the percentage of
sales returned in January is significantly higher than the rest of the year.
a. From the Data tab in the ribbon, click Data Analysis.
• If the Data Analysis Toolpak hasn’t been added in, see Appendix
C for directions on how to add it. Click Data Analysis to open the
Analysis Tools window.
b. Select t-Test: Two Sample Assuming Unequal Variances, click OK, then
enter the following:
1. Variable 1 range: all Holiday values (including the label)
2. Variable 2 range: all Non-Holiday values (including the label)
3. Place a check mark next to Labels.
4. Output options: New Worksheet Ply
5. Click OK.
5. The output for the hypothesis test will appear on a new sheet in your Excel
workbook. Name this sheet Ch 11 QS2 t-test.
6. Take a screenshot of your hypothesis test results (label it 11-2MA).
7. Save your workbook as Chapter11.xlsx, answer the lab questions, then
continue to Part 2.
Tableau | Desktop
1. While you cannot run a t-test in Tableau, you can drill down further into the data to create a Holiday set and compare the % of returned sales during January versus the rest of the year. To do so, you will add Month to the visualization, select January, create a set named Holiday, then create visualizations using the new Holiday set.
2. Create a new sheet in your Chapter11.twb Tableau file and name it Holiday Diagnostic Analysis.
a. Drag TRAN_DATE to the Rows shelf.
b. Right-click the YEAR(TRAN_DATE) pill in the Rows shelf and select Month.
c. In the visualization, select January and click the Set button (looks like two overlapping circles).
d. Select Create Set. . . and Name it Holiday.
e. Replace the MONTH(TRAN_DATE) pill in the Rows shelf with Holiday
and drag % of Returned Sales to the Columns shelf.
f. Adjust the Measure for SUM(% of Returned Sales) to Average.
3. Take a screenshot of your Holiday Diagnostic Analysis visualization
(label it 11-2TA).
4. Save your workbook as Chapter11.twb, answer the lab questions, then continue to Part 2.
For the Microsoft track, you will use Power BI for this question set. For the Tableau track, you can continue with the same Tableau workbook you used in the previous question set.
1. Open Power BI Desktop and connect to your Chapter 11.xlsx Excel workbook.
2. Select the Ch 11 Query Data sheet and click Load.
3. Expand Ch 11 Query Data in the Fields window and click through the following attributes one at a time. When you click an attribute, a new tab named Column tools appears in the ribbon. Click into that tab and ensure that the following data types are set correctly.
a. Holiday: Decimal Number
b. Non-Holiday: Decimal Number
c. Online-Dummy: Text
4. You will create three visualizations next: one to compare average holiday to non-holiday % of returned sales overall, another to break it down by online versus in-person, and a third to break it down by state.
a. Visualization one: Clustered Column Chart
1. Value: Holiday and Non-Holiday
2. Adjust the Value fields aggregations from SUM to AVERAGE.
b. Visualization two: Clustered Column Chart (this is easiest if you copy and
paste your first visualization)
1. Value: Holiday and Non-Holiday (ensure the aggregate value is AVERAGE for both)
2. Axis: Online-Dummy
c. Visualization three: Clustered Column Chart (this is easiest if you copy
and paste the first visualization)
1. Value: Holiday and Non-Holiday (ensure the aggregate value is AVERAGE for both)
2. Axis: State
d. Rearrange and resize the visualizations so that they are easier to read and
take a screenshot of your dashboard (label it 11-2MB).
Tableau | Desktop
From your Chapter11.twb Tableau file, you will duplicate the Holiday Diagnostic Analysis visualization to create two additional visualizations, and then combine all three in a dashboard. The two additional visualizations will show a breakdown of holiday/non-holiday returns for online versus in-person transactions and a breakdown of holiday/non-holiday returns across states.
1. Duplicate the Holiday Diagnostic Analysis sheet and rename the duplicate
sheet Holiday Diagnostic Analysis - Online.
2. Drag Online-Dummy to the right of the IN/OUT(Holiday) pill in Rows.
3. Duplicate the Holiday Diagnostic Analysis sheet again and rename the dupli-
cate sheet Holiday Diagnostic Analysis - States.
4. Drag State to the right of the IN/OUT(Holiday) pill in Rows.
5. Create a new dashboard and arrange all three holiday diagnostic analysis sheets so that they are easy to read, then take a screenshot of your dashboard (label it 11-2TB).
AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.
Microsoft | Excel
Tableau | Desktop
Create a Line Chart to compare 2014 and 2015 Returns.
a. Create a new sheet in your Chapter 11.twb Tableau workbook and name it
Line Chart.
• If you are reopening Tableau, open a new Tableau workbook first, then
navigate to open the Chapter11.twb file.
b. Columns: TRAN_DATE
c. Rows: % of Returned Sales
d. Expand the YEAR(TRAN_DATE) pill to show Quarters.
e. Right-click QUARTER(TRAN_DATE) and select Month.
f. Drag TRAN_DATE to Color on the Marks card.
g. Because 2016 does not include the full year, right-click the YEAR(TRAN_DATE) pill in the Marks card and select Filter. . ., then uncheck the box next to 2016 and click OK.
h. Right-click STATE and select Show Filter.
i. Hide the Show Me window to view the state filter card.
j. Take a screenshot (label it 11-3TA).
k. Answer QS3 Part 1 questions, then continue to QS3 Part 2.
Microsoft | Excel
1. Because the line graphs seemed to suggest that previous transactions would
help predict future transactions, we can run a regression to build a model that
will help stores predict the percentage of sales that will be returned each month.
a. Create a PivotTable in your Chapter 11.xlsx worksheet.
1. Columns: Year (from Date Table > More Fields. . .)
2. Rows: Month then Day of Week (from Date Table > More Fields. . .)
3. Values: % of Returned Sales (change the aggregate value to Average from Sum)
b. Adjust PivotTable settings:
1. From the PivotTable Design tab in the ribbon, select Report Layout > Show in Tabular Form.
2. From the PivotTable Design tab in the ribbon, select Grand Totals >
Off for Rows and Columns.
2. From the Data tab in the ribbon, click Data Analysis.
• If the Data Analysis Toolpak hasn’t been added in, see Appendix C for
directions on how to add it. Click Data Analysis to open the Analysis
Tools window.
3. Select Regression, click OK, then enter the following:
a. Input Y Range: all 2015 values (including the label)
b. Input X Range: all 2014 values (including the label)
c. Place a check mark next to Labels.
d. Click OK.
e. Take a screenshot (label it 11-3MB).
4. Answer QS3 Part 2 questions, then continue to QS3 Part 3.
Tableau | Desktop
This portion of the question set cannot be completed in Tableau.
Appendix A
Basic Statistics Tutorial
The sample arithmetic mean is the sum of all the data points divided by the number of observations. The median is the midpoint of the data and is especially useful when the data are skewed in one direction or the other. The mode is the observation that occurs most frequently.
The population standard deviation is computed as

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}

where N is the number of observations, x_i is each data point, and \bar{x} is the mean.
The greater the sample standard deviation or variance, the greater the variability.
PROBABILITY DISTRIBUTIONS
There are three primary probability distributions used in statistics and Data Analytics: the normal distribution, the uniform distribution, and the Poisson distribution.
Normal Distribution
A normal distribution is arguably the most important probability distribution because it fits so many naturally occurring phenomena in and out of accounting, from the distribution of return on assets to the IQ of the human population.
The normal distribution is a bell-shaped probability distribution that is symmetric about its mean, with data points closer to the mean occurring more frequently than data points farther from it. As shown in Exhibit A-1, the range within one standard deviation of the mean includes 68 percent of the data points; within two standard deviations, 95 percent of the data points; within three standard deviations, 99.7 percent of the data points.
A Z-score is computed to tell us how many standard deviations (σ) a data point (or observation), xi, is from its population mean, µ, using the formula z = (xi − µ)/σ. A Z-score of 1 suggests that the observation is one standard deviation above its mean. A Z-score of −2 suggests that the observation is two standard deviations below its mean.
Many of the statistical tests employed in data analysis are based on the normal distribu-
tion and how many standard deviations a sample observation is from its mean.
EXHIBIT A-1 Normal Distribution and the Frequency of Observations around Its Mean (Using 1, 2, or 3 Standard Deviations): 68% of the data fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
HYPOTHESIS TESTING
As we learn in Data Analytics, data by themselves are not really that interesting. It is using data
to answer, or at least address, questions posed by management that makes them interesting.
Management might pose a question in terms of a hypothesis, like their belief that sales
at their stores are higher on Saturdays than on Sundays. Perhaps they want to know this
answer to decide if they will need more staff to support sales (e.g., cashiers, shelf stockers,
parking lot attendants) on Saturday as compared to Sunday. In other words, management holds an assumption that sales are higher on Saturdays than on Sundays.
Usually hypotheses come in pairs: the null hypothesis and the alternate hypothesis. The first is the base case, called the null hypothesis, and assumes the hypothesized relationship does not exist. In this case, the null hypothesis would be stated as follows:
Null hypothesis: H0: Sales on Saturday are less than or equal to sales on Sunday.
The alternate hypothesis would be the case that management believes to be true:
Alternate hypothesis: HA: Sales on Saturday are greater than sales on Sunday.
For the null hypothesis to hold, we would assume that Saturday sales are the same as (or
less than) Sunday sales. Evidence for the alternate hypothesis occurs when the null hypothesis
does not hold and is rejected at some level of statistical significance. In other words, before we
can reject or fail to reject the null hypothesis, we need to do a statistical test of the data with
sales on Saturdays and Sundays and then interpret the results of that statistical test.
STATISTICAL TESTING
There are two types of results from a statistical test of hypotheses: the p-value and/or the
critical values.
The p-Value
We describe a finding as statistically significant by interpreting the p-value.
A statistical test of a hypothesis returns a p-value. The p-value is the result of a test that
either rejects or fails to reject the null hypothesis. The p-value is compared to a threshold
value, called the significance level (or alpha). A common value used for alpha is 5 percent
or 0.05 (as is 1 percent or 0.01).
The p-value is compared to the alpha threshold. A result is statistically significant when the p-value is less than alpha. This signifies a change was detected: the null hypothesis can be rejected.
If p-value > alpha: Fail to reject the null hypothesis (i.e., not significant result).
If p-value <= alpha: Reject the null hypothesis (i.e., significant result).
For example, if we were performing a test of whether Saturday sales were greater than Sunday sales and the test returned a p-value of 0.09, we would state something like, "The test found that Saturday sales are not different from Sunday sales, failing to reject the null hypothesis at a 5 percent level of significance."
This statistical result should then be reported to management.
EXHIBIT A-2 Statistical Testing Using Alpha, p-Values, and Confidence Intervals: in the middle region (1 − α = 0.95), where there is no difference between Saturday and Sunday sales, we fail to reject H0; in the tails (α = .05), where there is a negative or positive difference between Saturday and Sunday sales, we reject H0.
The t-test output found that the mean holiday sales returns over 1,167 days are 0.13 (or 13 percent) of sales, and the mean non-holiday sales returns are 0.119 (or 11.9 percent) of sales. The question is whether those two numbers are statistically different from each other. The t Stat of 7.86 and the p-value [shown as "P(T<=t) one tail"] of 3.59E-15 (i.e., well below 0.01 percent) suggest the two sample means are significantly different from each other.
The t-test output reports different critical p-values for a one-tailed t-test and a two-tailed t-test. A one-tailed t-test is used if we hypothesize that holiday returns are significantly greater (or significantly smaller) than non-holiday returns. A two-tailed t-test is used if we don't hypothesize that holiday or non-holiday returns are greater or smaller than the other, only that we expect the two sample means to be different from each other.
There are many things to note about the regression results. The first is that the overall regression model did better than chance at predicting the college completion rate, as shown by the F-statistic. We note that the p-value representing the "Significance F" result is very small, almost zero, suggesting there is virtually zero probability that a model with no independent variables would explain the completion rate as well as a model that has independent variables. This is exactly the situation we want, suggesting we should be able to identify a factor that explains completion rates.
There is another statistic used to measure how the overall regression model did at predicting
the dependent variable of completion rates. The adjusted R-squared is a value between 0 and 1.
An adjusted R-squared value of 0 represents no ability of the model to explain the dependent
variable, and an adjusted R-squared value of 1 represents perfect ability of the model to explain
the dependent variable. In this case, the adjusted R-squared value is 0.642, which represents a
reasonably high ability to explain the changes in the college completion rate.
The statistics also report that the SAT score (SAT_AVG) helps predict the completion
rate. This is shown by the “t Stat” that is greater than 2 (or less than –2) for SAT_AVG
(with t Stat of 47.74) and a p-value less than an alpha of 0.05 (as shown with the p-value of
1.564E-285). As expected, given the positive coefficient on SAT_AVG, the greater the SAT
score, the greater the college completion rate.
Appendix B
Excel (Formatting, Sorting, Filtering,
and PivotTables)
Revenues                                           50000
Expenses
  Cost of Goods Sold                               20000
  Research and Development Expenses                10000
  Selling, General, and Administrative Expenses    10000
  Interest Expense                                  3000
Required:
1. Add a comma as a 1000 separator for each number.
2. Insert the words Total Expenses below the list of expenses.
3. Calculate subtotal for Total Expenses using the SUM() command.
4. Insert a single bottom border under Interest Expense and under the Total Expenses
subtotal.
5. Insert the words Net Income, and calculate Net Income (Revenues – Total Expenses).
6. Format the top and bottom numbers of the column with a $ currency sign.
7. Insert a Bottom Double Border to underline the final Net Income total.
Solution:
1. Open Appendix B Data.xlsx and access the sheet named “Income Statement
Formatting.”
2. Add a comma as a 1000 separator for each number.
Highlight the column with all of the numbers. Right-click and select Format Cells. . .
to open this dialog box:
Click on Number and set Decimal places to zero. Click on Use 1000 Separator (,)
and click OK.
3. Insert the words Total Expenses below the list of expenses.
Type Total Expenses at the bottom of the list of expenses.
4. Calculate subtotal for Total Expenses using the SUM() command.
Use the SUM() command to sum all of the expenses, as follows:
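A sketch of that formula, assuming Revenues is in cell B2 and the four expense amounts occupy B5:B8, so that the subtotal lands in B9 (consistent with the =B2-B9 Net Income formula used in a later step):

    =SUM(B5:B8)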
5. Insert a single bottom border under Interest Expense and under the Total Expenses
subtotal.
Use the icon indicated to add the bottom border.
6. Insert the words Net Income and calculate Net Income (Revenues – Total Expenses).
Type Net Income at the bottom of the spreadsheet. Calculate Net Income by inserting the correct formula in the cell (here, =B2-B9):
7. Format the top and bottom numbers of the column with a $ currency sign.
Right-click on each number and select Format Cells. . ., select Currency with zero decimal places, and click OK.
8. Insert a Bottom Double Border to underline the final Net Income total.
Place your cursor on the cell containing Net Income (7,000). Then select Bottom
Double Border from the Font > Borders menu.
This is the final product:
4. Let’s sort by sales price from largest to smallest. Input Sales into the Sort by, select
Largest to Smallest in the dialog box, and select OK.
The highest sales price appears to be $140 for 50 pounds of apricots at a cost of $88.01.
Looking down at the end of this list, we see that the lowest sales price appears to be
five pounds of bananas for $2.52.
8. Alternatively, we could filter based on date to get all transactions on 3/2/2021. We first
need to clear the filter in cell F1 by clicking on the Filter symbol and selecting Select All.
9. Click the chevron in cell D1 (Date of Sale), click Select All to unselect all, and then
select 2021, then March, then 2.
(Level 4) PivotTables
PivotTables allow you to quickly summarize large amounts of data. In Excel, click
Insert > PivotTable, choose your data source, then click the check mark next to or
drag your fields to the appropriate boxes in the PivotTable Fields pane to identify
filters, columns, rows, or values. You can easily move attributes from one pane to
another to quickly “pivot” your data. Here is a brief description of each section:
Rows: Show the main item of interest. You usually want master data here, such as
customers, products, or accounts.
Columns: Slice the data into categories or buckets. Most commonly, columns are
used for time (e.g., years, quarters, months, dates).
Values: This area represents the meat of your data. Any measure that you would like to
count, sum, average, or otherwise aggregate should be placed here. The aggregated
values will combine all records that match a given row and column.
Filters: Placing a field in the Filters area will allow you to filter the data based on that
field, but it will not show that field in the data. For example, if you wanted to filter
based on a date, but didn’t care to view a particular date, you could use this area of
the field list. With more recent versions of Excel, there are improved methods for filtering,
but this legacy feature is still functional.
10. Let’s compute the accumulated gross margin for bananas, apricots, and apples.
11. First, turn off the filter by going to Data > Sort & Filter and clicking Filter to deselect it.
12. Next, let’s compute the gross margin for each line item in the invoice. In cell J1, input the
words Gross Margin. Underline it with a bottom border. In cell J2, input =H2-I2 and Enter.
17. The empty PivotTable will open in a new worksheet, ready for the PivotTable analysis. Drag [Description] from the field list into Rows and [Gross Margin] into the ΣValues area of the PivotTable. The ΣValues will default to "Sum of Gross Margin".
The resulting PivotTable will look like this:
18. The analysis shows that the gross margin for apples is $140.39; for apricots, $78.02;
and for bananas, $77.08.
Data Dictionary:
Sales_Transactions Table
Sales_Order_ID: Unique identifier for each individual Sales Order
Sales_Order_Date: Date each sales order was placed
Sales_Order_Quantity_Sold: Quantity of each product sold on the transaction
Product_Description: Description of the product sold
Product_Sale_Price: Price of each product sold on the transaction
Store_Location: State in which the store is located
State Sales Tax Table
State: The state abbreviation
State_Tax_Rate: The tax rate for each state
There are two columns that match in these two tables: Store_Location (from the Sales_Transactions table) and State (from the State Sales Tax table). These two tables are placed next to each other to make the VLOOKUP function easier to manage. Note that the State Sales Tax table is organized such that the value to look up (State) is to the left of the value to be returned (the sales tax rate).
Step 1:
We will add a new column to the Sales_Transactions table to bring in the State_Sales_Tax associated with every Store_Location listed in the transactions table.
1. Add a new column to the right of the Store_Location named State_Sales_Tax (cell G1).
In cell G2, we will create a VLOOKUP function. VLOOKUP functions have four
arguments:
• Cell_reference: The cell in the current table that has a match in the related table. In this
case, it is a reference to the row’s corresponding Store_Location. Excel will match that
state with the corresponding state in the State Sales Tax table.
• Table_array: An entire table reference to the table that contains the descriptive data
that you wish to be returned. In this case, it is the entire State Sales Tax table.
• Column_number: The number of the column (not the letter!) that contains the descrip-
tive data that you wish to be returned. In this case, State Sales Tax Rate is in the second
column of the State Sales Tax table, so we would type 2.
• TRUE or FALSE: There are two types of VLOOKUP functions, TRUE and FALSE. TRUE is for looking up what Excel calls "approximate" matches. In our case, we'll use FALSE. A FALSE VLOOKUP will only return a value when there is an exact match between the two tables (whenever your data are relational structured data, an exact match should be easily discoverable). If TRUE or FALSE is not designated, TRUE is the default; TRUE can also be represented as 1 and FALSE as 0.
Step 2:
2. Type in the following function (using cell references will be easier than typing manually):
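A sketch of the formula, entered in cell G2 of the Sales_Transactions table; the $I$2:$J$52 range for the State Sales Tax table is an assumption about where that table sits on the sheet:

    =VLOOKUP([@Store_Location], $I$2:$J$52, 2, FALSE)

Because Store_Location is referenced with the structured [@Store_Location] syntax, the formula reads each row's own state; column 2 returns the tax rate, and FALSE forces an exact match.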
3. Once you press Enter, the formula should copy all the way down, once again exhibiting the benefits of working with Excel tables instead of ranges.
Appendix C
Accessing the Excel Data Analysis
Toolpak
Excel offers a toolpak that helps perform much of the data analysis, called the Excel Data
Analysis Toolpak.
To run a correlation, form a histogram, run a regression, or perform other similar analysis using the Excel Data Analysis Toolpak, we need to make sure the Analysis Toolpak is loaded by looking at the Data > Analysis ribbon and seeing whether the Data Analysis add-in has been installed.
If it has not yet been added, go to File > Options > Add-Ins, select the Analysis Toolpak, and select GO:
In the Add-ins window that appears, place a check mark next to Analysis ToolPak and
then click OK. This will add the Data Analysis ToolPak to the Data tab so you can perform
additional data analysis.
To perform the additional data analysis, please select Data > Analysis > Data Analysis.
A dialog box will open.
Appendix D
SQL Part 1
SQL can be used to create tables, delete records, or edit databases, but in Data Analytics,
we primarily use SQL to extract data from the database—that is, not to edit or manipulate
the data, but to create different views of the data to help us answer business questions. SQL
extraction queries are also referred to as SELECT queries, because they each begin with the
word SELECT.
Throughout this appendix, all the examples and the practice problems refer to Appendix D Data.accdb. This is a very small database to help you immediately visualize what a query's results would look like.
Introduction to SELECT
SELECT indicates which attributes you wish to view. For example, the Customers table contains a complete customer list with several descriptive attributes for each of the company's customers. If you would like to see a full customer list, but you just want to see FirstName, LastName, and State, you can select just those three attributes in the first line of your query:
SELECT FirstName, LastName, State
Introduction to FROM
FROM lets the database management system know which table(s) contain(s) the attributes
that you are selecting. For instance, in the query begun previously, the three attributes in the
SELECT clause come from the Customers table. So that query can be completed with the
following FROM clause:
FROM Customers
Try putting that query all together to see the results:
SELECT FirstName, LastName, State
FROM Customers
EXHIBIT D-1
If you wish to view the same three columns, but you want to see the LastName column
as the first column so that the results more closely resemble a phone book, you can change
the order of the attributes listed in your SELECT statement:
SELECT LastName, FirstName, State
FROM Customers
Now the query returns the same number of records, but with a different order of attributes (columns), seen in Exhibit D-2:
EXHIBIT D-2
After you get the hang of creating simple SELECT FROM queries, you can begin to bring
in some of the SQL clauses that can make our queries even more interesting. The next two
SQL clauses we will cover are WHERE and ORDER BY. They follow FROM, to make a
query in this order:
SELECT
FROM
WHERE
ORDER BY
Introduction to WHERE
WHERE behaves like a filter in Excel. An example of using WHERE to modify the query
is the following:
SELECT LastName, FirstName, State
FROM Customers
WHERE State = “Arkansas”
That query would return only the customers who were from Arkansas; the result is
shown in Exhibit D-3:
EXHIBIT D-3
EXHIBIT D-5
To extract all of the records from the Customers table that follow the last name "Jones" alphabetically:
SELECT *
FROM Customers
WHERE LastName > “Jones”
That query returns the following records shown in Exhibit D-6:
EXHIBIT D-6
If you wanted to include any customers with the last name of Jones in the list, you would change the operator from > to >= :
SELECT *
FROM Customers
WHERE LastName >= “Jones”
The revised output is shown in Exhibit D-7:
EXHIBIT D-7
4. Write a query that will return all of the records from the Customers table of the custom-
ers from Texas.
5. Write a query that will return all of the records from the Customers table of the customers from Baton Rouge.
Introduction to ORDER BY
In Exhibit D-7, when you added Jeremy Jones to the output, you might have been surprised
that the order of the records didn’t change. The default order of SQL queries is ascending
based on the first column selected. When you SELECT *, the default will be in the order of
the primary key, which is the order of the records in the original table.
If you would like to sort the records in a query output based on any other column, you
can do so with the ORDER BY clause.
The syntax of an ORDER BY clause is the following:
To sort the records in ascending order (1 to infinity or A to Z): ORDER BY
[attribute_name] ASC
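To sort the records in descending order (Z to A), the syntax instead ends with DESC: ORDER BY [attribute_name] DESC.
The query that produces Exhibit D-8 is not shown on this page; based on the revision described below, it presumably sorted the customer list by last name only:

SELECT LastName, FirstName, State
FROM Customers
ORDER BY LastName ASC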
EXHIBIT D-8
Notice how the two exhibits have the same information, the same order of attributes,
and the same number of records, but the ordering of the records has changed.
To revise the same query, but this time to order the results by both Last Name and First
Name (ascending):
SELECT LastName, FirstName, State
FROM Customers
ORDER BY LastName ASC, FirstName ASC
That query returns the following records shown in Exhibit D-9:
EXHIBIT D-9
The following query uses an aggregate function to create a query that would show the
total count of orders in the Sales_Orders table:
SELECT COUNT(Sales_Order_ID)
FROM Sales_Orders
The output of that query would produce only one column and one row, shown in
Exhibit D-10:
EXHIBIT D-10
The problem with this output, of course, is the lack of description. The column is titled
Expr1000, which is not very descriptive. This title is produced because there isn’t a column
named COUNT(Sales_Order_ID), so the database management system doesn’t know what
to title the column in the output.
To make this column more meaningful, we can use aliases. An alias simply renames a
column. It is written as AS. To rename the column COUNT(Sales_Order_ID) to Count_
Total_Orders, the query would look like the following:
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
The output is more meaningful with the alias added in, shown in Exhibit D-11:
EXHIBIT D-11
To create a query that would show the grand total quantity of products ever sold (as stored
in the Sales_Orders table) with a meaningful column name, we could run the following:
SELECT SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
Which returns the following output, shown in Exhibit D-12:
EXHIBIT D-12
Introduction to GROUP BY
In the introduction to aggregates, we worked through an example that provided the grand
total count of orders in the Sales_Orders table:
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
That query results in a grand total of 10, but what if we would like to see how those data
split up among customers who have ordered from us? This is where GROUP BY comes
in. GROUP BY works as the “engine” that powers subtotaling the data. After the keyword
GROUP BY, you indicate the attribute by which you would like to slice the data. In this
case, we want to slice the grand total by CustomerID.
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
The problem with this query is that it does slice the data by customer, but it doesn’t actually
show us the CustomerID associated with each subtotal. The output is shown in Exhibit D-13:
EXHIBIT D-13
If we want to actually view the CustomerID that is associated with each subtotal, we need
to not only put the attribute in the GROUP BY field, but also add it to the SELECT field.
Remember from earlier in this tutorial that the order in which you place the attributes
in the SELECT clause indicates the order that those columns will display in the output.
For this output, it would make the most sense to see CustomerID before Count_Total_Orders, because CustomerID is acting as a label for the totals. We can modify the query to include
CustomerID in the following way:
SELECT CustomerID, COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
This provides the following output, shown in Exhibit D-14:
EXHIBIT D-14
Similarly, we can extend the second example provided in the Aggregates section that cre-
ated a grand total of the quantity sold from the Sales_Orders table. If we would prefer to not
see the grand total quantity sold, but instead slice that total by InventoryID in order to see
the subtotal of the quantity of each inventory item sold, we can create the following query:
SELECT InventoryID, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
GROUP BY InventoryID
Which produces the following output, shown in Exhibit D-15:
EXHIBIT D-15
Notice that InventoryID needs to be added in two places: You must place it in the
GROUP BY clause to provide the “engine” that subtotals a grand total (or slices it), and
then you must also place InventoryID in the SELECT clause so that you can see the labels
associated with each subtotal.
GROUP BY Practice
1. Create a query that would show the total quantity of items sold each day. Rename the
aggregate Total_Quantity_Sold.
2. Create a query that would show the total number of Customers we have stored in the
Customers table, and group them by the State the customers are from. Rename the
aggregate column in the output Num_Customers.
Introduction to HAVING
Occasionally when running a query to gather subtotals (using a GROUP BY clause), you
do not want to see all of the results, but instead would rather filter the results for certain
subtotals. Unfortunately, SQL cannot filter aggregate measures in the WHERE clause, but
fortunately, we have a different clause that can—HAVING.
Anytime you wish to filter your query results based on aggregate values (e.g.,
SUM(Quantity_Sold)), you can do so in the HAVING clause.
For example, in the previous section about GROUP BY, we created a query to see the
total count of orders associated with each customer. The output showed that the vast major-
ity of our customers had participated in only one order. But what if we wanted to only see
the customer(s) who had participated in more than one order?
We can create the following query to add in this filter:
SELECT CustomerID, COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
HAVING COUNT(Sales_Order_ID) > 1
As it turns out, there is only one customer who participated in more than one order, as
we can see in the query output, shown in Exhibit D-16:
EXHIBIT D-16
• The aggregate can be any of our aggregate values, SUM(), AVG(), or COUNT().
• The attribute is the field that you are aggregating, SUM(Quantity) or
COUNT(CustomerID).
• The = can be replaced with any comparison operator: =, <, >, <=, >=, <>.
• The number is the value that you are filtering your results on.
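Putting those pieces together, the general template the bullets describe is presumably the following (a reconstruction; the original template is not shown on this page):

HAVING aggregate(attribute) = number

For example, HAVING SUM(Quantity_Sold) < 5 filters the subtotals for items that have sold fewer than 5 units.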
Let’s work through another example. The second example in the GROUP BY section
showed the quantity sold of each inventory item. If we want to view only those items that
have sold less than 5 items, we can create the following query:
SELECT InventoryID, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
GROUP BY InventoryId
HAVING SUM(Quantity_Sold) < 5
This query produces the following output, shown in Exhibit D-17:
EXHIBIT D-17 (Microsoft Access 2016)
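Note that WHERE and HAVING can appear together in the same query: WHERE filters individual rows before they are grouped, while HAVING filters the subtotals after grouping. A sketch (the WHERE condition here is purely illustrative):
SELECT InventoryID, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
WHERE Quantity_Sold > 0
GROUP BY InventoryID
HAVING SUM(Quantity_Sold) < 5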
HAVING Practice
1. Create a query that would show the total quantity of items sold each day. Rename the
aggregate Total_Quantity_Sold. Show only the days on which more than 6 items were sold.
2. Create a query that would show the total number of Customers we have stored in the
Customers table, and group them by the State the customers are from. Rename the
aggregate column in the output Num_Customers. Show only the states that more than
one customer is from.
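Again, one possible solution sketch for each exercise, building directly on the GROUP BY practice queries above (Order_Date is the assumed name of the order date attribute):
SELECT Order_Date, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
GROUP BY Order_Date
HAVING SUM(Quantity_Sold) > 6
SELECT State, COUNT(CustomerID) AS Num_Customers
FROM Customers
GROUP BY State
HAVING COUNT(CustomerID) > 1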
EXHIBIT D-18 (Microsoft Access 2016)
The call-out circle and boxes in the exhibit can help us find how these two tables are
related. First, we can see the circle that indicates the relationship connecting the Customers
and Sales_Orders tables; this shows us that the two tables are indeed related. The next step
is to identify how they are related. The two red boxes in Exhibit D-18 indicate the related
fields: CustomerID is the primary key in the Customers table, and CustomerID is the for-
eign key in the Sales_Orders table. Because these two tables are related, we can retrieve
data from both of them fairly easily with a JOIN clause.
In order to retrieve data from more than one table, we need to use SQL JOINs. There are
three types of JOINs, but for much of our analysis, an INNER JOIN will suffice. JOINs are
technically part of the FROM clause. They follow this template:
FROM table1
INNER JOIN table2
ON table1.matching_key = table2.matching_key
The order of the tables does not matter; you could place the Customers table in either
the FROM clause or the INNER JOIN clause, and the order of the two keys in the ON
clause does not matter either. What does matter is that you indicate both tables you want to
retrieve data from, and that you pair the two tables’ matching keys in the ON clause.
To select all of the data from the Customers table and the Sales_Orders table, you can
run the following query:
SELECT *
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
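Because the order of the tables does not matter for an INNER JOIN, the following sketch returns the same matched records with the two tables swapped:
SELECT *
FROM Sales_Orders
INNER JOIN Customers
ON Sales_Orders.CustomerID = Customers.CustomerID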
If you want to select only the Sales_Order_ID and the Order_Date from the Sales_
Orders table, plus the State attribute from the Customers table, you could run the
following query:
SELECT Sales_Order_ID, Order_Date, State
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
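When a column name exists in both tables, as CustomerID does here, it must be qualified with its table name, just as it is in the ON clause. Qualifying every column, even when not strictly required, can make multi-table queries easier to read; the following is an equivalent, fully qualified version of the query above:
SELECT Sales_Orders.Sales_Order_ID, Sales_Orders.Order_Date, Customers.State
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID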
Appendix E
SQL Part 2
In Appendix D, you learned about many key terms in SQL, including how to join tables.
The purpose of joining tables is to enable you to retrieve data that are stored in more than
one table all at once. The join type that you learned about in Appendix D is an INNER
JOIN. There are two other popular join types, though: LEFT and RIGHT.
We will work with the same Access database that you used in Appendix D. It contains
the same data, but you will access it through the file Appendix E Data.accdb.
We’ll start by bringing these data into Tableau because Tableau has a great way of visu-
alizing joined tables and, specifically, the differences between INNER, LEFT, and RIGHT
JOINs.
1. Open Tableau.
2. Select Microsoft Access to connect to the file and navigate to where you have stored the
file, then click Open.
3. In the Data Source view, drag the Customers table to the Drag tables here section.
4. Double-click on the Customers rectangle to enter the Physical layer.
5. Double-click the Sales_Orders table to create a join between Sales_Orders and Customers.
Click the Venn diagram to see the following details about how the tables are related:
Tableau has defaulted to joining these two tables with an INNER JOIN, and it has accu-
rately identified the two keys that are related between the two tables, CustomerID in the
Customers table, and CustomerID in the Sales_Orders table.
This is very similar to how you would write a query to gather the same information
directly in the Access database, where one of the tables is indicated in the FROM clause,
the second table is indicated in the INNER JOIN clause, and the keys that are common
between the two tables are indicated with an equal sign between them in the ON clause:
SELECT *
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
As the Venn diagram suggests, an INNER JOIN will show all of the data for which there
is a match between the two tables. However, it is just as important to notice what it leaves
out: it will not return any of the data for which there is not a match between the two
tables.
In this instance, there is actually one customer held in the Customers table that is not
included in the Sales_Orders table (Customer 3, Edna Orgeron). Why would this happen?
Perhaps this fictional company records data on potential customers, so even though some-
one may not have actually purchased anything yet, the company can still contact them.
Whatever the reason might be, the fact that CustomerID 3 is in the Customers table but
has no reference in the Sales_Orders table means that an INNER JOIN will not include
CustomerID 3 in the results.
If the above SQL query were run, the following result would be returned:
Notice that the red box surrounding the records for Customers 2 and 4 does not include
anything for Customer 3.
The red box indicates an important change that occurs as soon as we make the change
to a LEFT JOIN: Customer 3 is included! Not only that, but while we see Customer 3’s
name and contact information, we see null values for the attributes from the Sales_Orders
table. That is because there isn’t any corresponding information for Customer 3 in the
Sales_Orders table.
To replicate this query in Access, the only change that needs to be made is to swap the
word INNER for LEFT:
SELECT *
FROM Customers
LEFT JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
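This pattern is also handy for isolating the unmatched rows themselves. To list only the customers who have never placed an order, keep the LEFT JOIN and filter for null keys (a sketch):
SELECT Customers.*
FROM Customers
LEFT JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
WHERE Sales_Orders.CustomerID IS NULL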
It is easier to visualize how joins are created in Tableau, but they work the same way in
SQL. The table that you place in the FROM clause is the “left” table, and the table that you
place in the JOIN clause is the “right” table.
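A RIGHT JOIN simply mirrors this logic: it returns every record from the table in the JOIN clause, with null values for any unmatched rows on the left side. For example, the following sketch returns all customers even if they have no orders, making it equivalent to the LEFT JOIN above with the roles of the two tables reversed:
SELECT *
FROM Sales_Orders
RIGHT JOIN Customers
ON Sales_Orders.CustomerID = Customers.CustomerID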
Appendix F
Power Query in Excel and Power BI
Excel’s Get and Transform tools are a part of the Power BI suite that is integrated into
Excel 2016 and also the standalone Power BI application. These tools allow you to connect
directly to a dataset stored in a variety of locations, including Excel files; .csv files; the web;
and a multitude of relational databases, including Microsoft Access, SQL Server, Teradata,
Oracle, PostgreSQL, and MySQL.
Throughout this text, in the majority of the Comprehensive Labs that analyze the
Dillard’s dataset, we will either load the data from SQL Server into Power BI and transform
them using Power Query, or load them into Excel using the Get and Transform tool.
When we extract the data, we may want to extract entire tables, or we may want to
extract only a portion via a SQL query.
In this appendix, we will connect to the Dillard’s data. The Dillard’s data are stored on
the University of Arkansas’ remote desktop, so make sure to log in to the desktop in order
to work through these steps. Ask your instructor for login information if you do not have it
already.
2. The following box will pop up, into which you should provide the name of the Server
and the Database name that your instructor provides you. For the comprehensive exer-
cises, we use the Database name WCOB_DILLARDS.
Once you have input the Server and Database name, you have two options:
3. Extract entire tables by clicking OK. Continue to Step 5.
4. Extract only a portion of data from one or more tables based on the criteria of a SQL
query. To do so, click Advanced Options. Skip to Step 7.
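For the second option, the SQL statement box under Advanced Options accepts any valid query against the database. As a sketch, a query like the following would extract only one month of rows from the TRANSACT table (the TRAN_DATE column name is an assumption for illustration; check the Dillard's data dictionary for the actual attribute names):
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '2015-01-01' AND '2015-01-31'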
In either instance, after clicking OK, you will see two prompts:
• When prompted to input your credentials, select Use my Current Credentials and click
Connect.
• When prompted with an Encryption Support window, click OK.
6. Select the table(s) that you would like to load into Excel. If you would like to select
more than one table, place a check mark in the box next to Select multiple items and
select STORE and TRANSACT.
• Clicking Load will load the data directly into an Excel table or Power BI datasheet.
• Clicking Edit or Transform Data will open the Power Query window for you to trans-
form the data before they are loaded into Excel (add or delete columns, remove or
transform null values, aggregate data, etc.).
• Click the Close & Load or Close & Apply button when you are finished transforming
the data to load them into Excel.
• The Remove Rows button provides options to remove rows with nulls in selected col-
umns, with duplicates in selected columns, or based on other criteria.
The following options are found on the Transform tab on the ribbon:
• Replace Values functions the same way in Power Query as it does in Excel, except the
transformation is stored and thus repeatable when created in Power Query.
• Pivot Column creates two new columns out of an existing category column (for exam-
ple, you can pivot the Transaction_Type column by the transaction amount).
• The Date button will allow you to transform an existing date column into a date part
(year, month, day, etc.) or change the date format. It is also useful to create duplicates
of existing date columns, then transform the copies into the date parts.
Loading the data to the data model will allow us to work with a large dataset in a
PivotTable, even though the dataset itself is too large for the worksheet.
Appendix G
Power BI Desktop
Power BI Desktop is a Microsoft tool that combines ETL tools with reporting tools. When
we work with Power Query or PowerPivot in Excel, we’re actually working with Power
BI tools. If you ultimately want to run statistical tests such as hypothesis testing or
regression analysis, it’s best to work within Excel directly and use the Power Query add-in.
However, if you need to transform your data using Power Query prior to creating reports or
dashboards, or even if you just want to explore your data, Power BI Desktop can be a great
alternative to other reporting and visualization tools such as Tableau.
When it comes to creating visualizations in Power BI, you can produce results extremely
similar to what you can create in Tableau, but the path to getting there is different. Power
BI defaults to a report mode (similar to Tableau’s Dashboard mode), so that as you create
visuals, they appear as tiles that you can resize and rearrange around the canvas.
When you open Power BI Desktop, you will be greeted with a startup screen similar to
the following:
The tutorials and other training resources on the right of the startup screen are helpful
for getting started with the tool.
The Get Data button on the left of the startup screen will bring you into Power BI’s
Power Query tool. It is set up exactly like the Power Query tool is set up in Excel, so you can
use it to connect to a variety of sources (Excel, SQL Server, Access, etc.).
To familiarize yourself with Power BI, we will use the Appendix_G_Data.xlsx. It is a modi-
fied version of the Sláinte file that you might work with in Lab 2-2, Lab 4-1, Lab 4-2, Lab 6-3,
or Lab 7-2. The data are a subset of the sales data related to a fictional brewery named Sláinte.
1. Click Get Data on the startup screen.
2. Select Excel from the list of possible data sources, then click Connect.
3. Browse to the file location for Appendix_G_Data.xlsx and Open the file.
4. Because there are three worksheets in the file, the Navigator provides you the option
to select one, two, or all of them. Place check marks in each.
5. You are also given an option to either Load or Edit the data. If you click Edit, you will
enter the Power Query window with the same ribbon and options to transform the data
as you are familiar with from the Excel version of the tool (add columns, split columns,
pivot data, etc.). These data do not need to be transformed, so we will click Load.
6. Once the data are loaded, you will see a blank canvas on which you can build a report.
There are three key elements bordering the canvas:
a. To the left of the blank canvas, you are presented with three options:
Report Mode: The first option, represented with an icon that looks like a bar chart,
is for Report mode. This is the default view and is where you can build your visual-
izations and explore your data.
Data Mode: The second option, represented with an icon that looks like a table or a
spreadsheet, is for Data mode. If you click into this icon, you can view the raw data
that you have imported into Power BI. You can also create new measures or new col-
umns from this mode.
Model Mode: The third option, which looks like a database diagram, is for Model
mode. If you click into this icon, you enter PowerPivot. From this mode, you can edit
the table and attribute names or edit relationships between tables.
b. To the right of the blank canvas is your Fields list and your options for Visualizations.
Visualizations: You can drag any of these options over into the canvas to begin
designing a visualization. Once you have tiles on your report, you can change the
type of visualization being used to depict a set of fields by clicking the tile, then
selecting any of the visualization options to change the way the data are presented.
Fields: This section is similar to your PivotTable field list. You can expand the tables
to see the attributes within each; placing a check mark next to a field will add it to the
active tile.
Values, Filters, etc.: This section will vary based on the tile and the fields you are actively
working with. Anytime you add a field to a visualization, that field gets automatically added
to the filters, which cuts out the need to manually add filters or slicers to your PivotTable.
c. Immediately above the canvas is the familiar ribbon that you can expect from
Microsoft applications. The four tabs—Home, View, Modeling, and Help—stay
consistent across the three different modes (report, data, and model), but the
options that you can select will vary based on the mode in which you are working.
7. To begin working with the data, Expand the Customer table to place a check mark in
the State field.
8. Power BI will default to creating a tile with a map visualization. This is similar to how
Tableau defaults to working with geographic data. To make the map more interesting,
expand the Sales_Orders table to place a check mark in the Quantity Sold field.
This will make the tile more interesting by changing the size of the symbol associated
with each state—the larger the symbol, the higher the quantity sold in that state.
9. You can also change the way the data are presented by selecting a different visualiza-
tion type. Select the first option to view the data in a horizontal bar chart.
10. One of the most exciting offerings from Power BI is its natural language processing for
data exploration. From the Insert tab in the ribbon, click Buttons. In the drop-down,
select Q&A.
11. The following icon will appear as a separate tile. If the placement defaults to being on
top of the bar chart, you can click and drag it to somewhere else on the canvas:
12. To activate the Q&A, Ctrl+click the icon. The following window will pop up, and you
can select from the list of questions that Power BI has come up with, or you can type
directly into the “Ask a question about your data” box.
13. You can also add a question directly to the canvas by selecting Q&A from the
Visualizations pane.
There are many other exciting things that Power BI can do, but with this introduction
you should have the confidence to jump in and explore more of what Power BI has to offer.
Appendix H
Tableau Prep Builder
Before jumping into the labs, you may wish to introduce yourself to Tableau Prep through
this appendix if you have never used the tool. Tableau Prep Builder is a tool used to
extract, transform, and load data from various sources interactively. When you create a
flow in Tableau Prep that cleans, joins, pivots, and exports data for use in Tableau Desktop,
the flow can be reused with multiple datasets that share the same structure.
To access Tableau Prep, you can use the University of Arkansas’ remote desktop to
access the Walton Lab (see your instructor for instructions on how to access it), or you can
download a free academic usage license of Tableau by following this URL:
https://www.tableau.com/academic/students. Tableau Prep will work on a PC or a Mac.
The images in this textbook will reflect Tableau Prep for PC, but it is very similar to
Tableau Prep for Mac.
Tableau Prep can connect to a variety of data types, including Excel, Access, and SQL
Server. We will connect to the Superstore sample flow.
1. Open Tableau Prep Builder.
2. Immediately upon opening Tableau, you will see a list of flows to open or you can connect
to data. Open the Superstore sample flow.
3. Here you will see a completed flow used to transform and combine various data files.
Click through each of the primary steps (items 4 through 7 below) to see transforma-
tion options:
4. Click the Orders_East icon to show the connect to data step. This makes the connection
to the raw data. Here you can rename fields and uncheck attributes to exclude them
from your transformation.
5. Click the Fix Data Type step to clean and transform the data. This allows you to make
transformations to fix errors and combine values. It also shows a preview of the data
and frequency distribution of values for each field.
6. Click the All Orders step to show the union step. This is where you will combine mul-
tiple data files into one. In this case, the union will add each of the preceding tables one
after the other into a larger table.
7. In the flow, click the Create ‘Superstore Sales.hyper’ step where you export the data.
This step is where you save the cleaned data as a Tableau hyper file or Excel file for use
in another program, such as Tableau Desktop. It also shows you a preview of your trans-
formed data.
8. Finally, to add any additional steps to the flow, click the + icon to the right of any box.
This gives you the option to add a new task in the flow. Click through each of the steps
to see options available.
• Clean Step to fix errors,
• Aggregate to calculate summary statistics,
• Pivot to summarize data on multiple attributes like a PivotTable in Excel,
• Join step to combine two tables on a matching primary key and foreign key pair,
• Union to combine similar tables into one larger table,
• Script to do some advanced transformation using a scripting language, or
• Output to export your data at any point in the process.
Appendix I
Tableau Desktop
Before jumping into the labs, you may wish to introduce yourself to Tableau Desktop
through this appendix if you have never used the tool. Tableau Desktop is
an interactive data visualization and modeling tool that allows you to connect to data and
create powerful dashboards.
To access Tableau Desktop, you can use the University of Arkansas’ remote desktop
(see your instructor for instructions on how to access it), or you can download a free
academic usage license of Tableau by following this URL:
https://www.tableau.com/academic/students. Tableau will work on a PC or a Mac.
The images in this textbook will reflect Tableau for PC, but it is very similar to Tableau for Mac.
Tableau Desktop can connect to a variety of data types, including Excel, Access, and
SQL Server. We will connect to the dataset Appendix_I_Data.xlsx. If you worked through
Appendix B about PivotTables, this is the same dataset that you worked with previously.
1. Open Tableau.
2. Immediately upon opening Tableau Desktop, you will see a list of file types that you can
connect to. We’ll connect to an Excel file, so click Microsoft Excel.
4. To begin working with the data, click Sheet 1 in the bottom left.
5. To begin creating some basic visualizations with the data, double-click on the measure
Gross Margin.
Immediately you will see how Tableau interacts with data differently from Excel: it has
defaulted to displaying a bar chart. This isn’t a very meaningful chart as it is, but
you can add meaning by adding a dimension.
You can continue adding attributes (sometimes referred to as pills) to the columns or
rows shelves and changing the method of visualization using the Show Me tab to fur-
ther familiarize yourself with the tool.
Appendix J
Data Dictionaries
This appendix describes the datasets and tables used in this textbook. Complete data
dictionaries with tables, attributes, and descriptions can be found in the textbook resources
in Connect.
Sláinte
Sláinte data contain author-generated data that reflect business transactions for a fictional
brewery. Different datasets look at purchases, sales, and other general transactions. See the
Sláinte data dictionary on Connect for full details.
LendingClub
This dataset contains demographic data tied to approved and rejected loans on the
LendingClub peer-to-peer lending website. These attributes are anonymized to hide individual
borrowers, but contain information such as employment history, outstanding accounts, and
debt-to-income information.
College Scorecard
This dataset is provided by the U.S. Department of Education and contains demographic
information about the composition of the student body and completion rates of different
colleges in the United States.
S&P100
The SP100 Facts dataset contains values attached to XBRL financial statement data pulled
from the SEC EDGAR data repository. The single table contains the company name, ticker,
standardized XBRL tag, value, and year. This dataset mimics the data available in the live
Google Sheet XBRLAnalyst add-in.
The SP100 Sentiment dataset contains word counts from financial statements pulled
from the SEC EDGAR data repository. This single table includes company information
and filing dates, as well as total word and character counts. Additionally, it includes word
counts for categories of terms that match those in the Harvard and Loughran-McDonald
sentiment dictionaries, such as positive, negative, modal, litigious, and uncertain.
Avalara
This dataset provides tax information for U.S. states including different region, state, county,
and city tax rates. It is used for tax planning purposes.
Glossary

A
accounting information system (54) A system that records, processes, reports, and communicates the results of business transactions to provide financial and nonfinancial information for decision-making purposes.
alternative hypothesis (131) The opposite of the null hypothesis, or a potential result that the analyst may expect.
audit data standards (ADS) (249) The audit data standards define common tables and fields that are needed by auditors to perform common audit tasks. The AICPA developed these standards.

B
Balanced Scorecard (342) A particular type of digital dashboard that is made up of strategic objectives, as well as KPIs, target measures, and initiatives, to help the organization reach its target measures in line with strategic goals.
Benford’s law (128, 292) The principle that in any large, randomly produced set of natural numbers, there is an expected distribution of the first, or leading, digit with 1 being the most common, 2 the next most, and down successively to the number 9.
Big Data (4) Datasets that are too large and complex for businesses’ existing systems to handle utilizing their traditional capabilities to capture, store, manage, and analyze these datasets.

C
causal modeling (133) A data approach similar to regression, but used to test for cause-and-effect relationships between multiple variables.
classification (11, 133) A data approach that attempts to assign each unit in a population into a few categories potentially to help with predictions.
clustering (11, 128) A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way.
co-occurrence grouping (11, 129) A data approach that attempts to discover associations between individuals based on transactions involving them.
common data model (249) A tool used to map existing database tables and fields from various systems to a standardized set of tables and fields for use with analytics.
common size financial statement (407) A type of financial statement that contains only basic accounts that are common across companies.
composite primary key (58) A special case of a primary key that exists in linking tables. The composite primary key is made up of the two primary keys in the table that it is linking.
computer-assisted audit techniques (CAATs) (286) Automated scripts that can be used to validate data, test controls, and enable substantive testing of transaction details or account balances and generate supporting evidence for the audit.
continuous auditing (253) A process that provides real-time assurance over business processes and systems.
continuous data (188) One way to categorize quantitative data, as opposed to discrete data. Continuous data can take on any value within a range. An example of continuous data is height.
continuous monitoring (253) A process that constantly evaluates internal controls and transactions and is the chief responsibility of management.
continuous reporting (253) A process that provides real-time access to the system status and accounting information.
customer relationship management (CRM) system (54) An information system for managing all interactions between the company and its current and potential customers.

D
Data Analytics (4) The process of evaluating data with the purpose of drawing conclusions to address business questions. Indeed, effective Data Analytics provides a way to search through large structured and unstructured data to identify unknown patterns or relationships.
data dictionary (19, 59) Centralized repository of descriptions for all of the data attributes of the dataset.
data mart (459) A subset of the data warehouse focused on a specific function or department to assist and support its needed data requirements.
data reduction (12, 120) A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items (e.g., highest cost, highest risk, largest impact, etc.).
data request form (62) A method for obtaining data if you do not have access to obtain the data directly yourself.
data warehouse (248, 459) A repository of data accumulated from internal and external data sources, including financial data, to help management decision making.
decision boundaries (138) Technique used to mark the split between one class and another.
decision support system (141) An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations.
decision tree (138) Tool used to divide data into smaller groups.
declarative visualizations (188) Made when the aim of your project is to “declare” or present your findings to an audience. Charts that are declarative are typically made after the data analysis has been completed and are meant to exhibit what was found in the analysis steps.
descriptive analytics (116, 286) Procedures that summarize existing data to determine what has happened in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distributions, and proportions.
descriptive attributes (58) Attributes that exist in relational databases that are neither primary nor foreign keys. These attributes provide business information, but are not required to build a database. An example would be “Company Name” or “Employee Address.”
diagnostic analytics (116, 286) Procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to drill down in the data and see how they compare to a budget, a competitor, or trend.
digital dashboard (125, 341) An interactive report showing the most important metrics to help users understand how a company or an organization is performing. Often created using Excel or Tableau.
discrete data (188) One way to categorize quantitative data, as opposed to continuous data. Discrete data are represented by whole numbers. An example of discrete data is points in a basketball game.
dummy variable (135) A numerical value (0 or 1) to represent categorical data in statistical analysis; values assigned a 1 indicate the presence of something and 0 represents the absence.
DuPont ratio (409) Developed by the DuPont Corporation to decompose performance (particularly return on equity [ROE]) into its component parts.

E
effect size (141) Used in addition to statistical significance in statistical testing; effect size demonstrates the magnitude of the difference between groups.
Enterprise Resource Planning (ERP) (54) Also known as Enterprise Systems, a category of business management software that integrates applications from throughout the business (such as manufacturing, accounting, finance, human resources, etc.) into one system.
ETL (60) The extract, transform, and load process that is integral to mastering the data.
exploratory visualizations (189) Made when the lines between steps “P” (perform test plan), “A” (address and refine results), and “C” (communicate results) are not as clearly divided as they are in a declarative visualization project. Often when you are exploring the data with visualizations, you are performing the test plan directly in visualization software such as Tableau instead of creating the chart after the analysis has been done.

F
financial statement analysis (406) Used by investors, analysts, auditors, and other interested stakeholders to review and evaluate a company’s financial statements and financial performance.
flat file (57, 248) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.
foreign key (58) An attribute that exists in relational databases in order to carry out the relationship between two tables. This does not serve as the “unique identifier” for each record in a table. These must be identified when mastering the data from a relational database in order to extract the data correctly from more than one table.
fuzzy matching (287) Process that finds matches that may be less than 100 percent matching by finding correspondences between portions of the text or other entries.

H
heat map (414) A visualization that shows the relative size of values by applying a color scale to the data.
heterogeneous systems approach (248) Heterogeneous systems represent multiple installations or instances of a system. It would be considered the opposite of a homogeneous system.
homogeneous systems approach (248) Homogeneous systems represent one single installation or instance of a system. It would be considered the opposite of a heterogeneous system.
horizontal analysis (411) An analysis that shows the change of a value from one period to the next.
human resource management (HRM) system (54) An information system for managing all interactions between the company and its current and potential employees.

Q
qualitative data (186) Categorical data. All you can do with these data is count and group, and in some cases, you can rank the data. Qualitative data can be further defined in two ways: nominal data and ordinal data. There are not as many options for charting qualitative data because they are not as sophisticated as quantitative data.
quantitative data (187) More complex than qualitative data. Quantitative data can be further defined in two ways: interval and ratio. In all quantitative data, the intervals between data points are meaningful, allowing the data to be not just counted, grouped, and ranked, but also to have more complex operations performed on them such as mean, median, and standard deviation.

R
ratio analysis (407) A tool that attempts to evaluate relationships among different financial statement items to help understand a company’s financial and operating performance.
ratio data (187) The most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of quantitative data. They can be counted and grouped just like qualitative data, and the differences between each data point are meaningful like with interval data. Additionally, ratio data have a meaningful 0. In other words, once a dataset approaches 0, 0 means “the absence of.” An example of ratio data is currency.
regression (11, 133) A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
relational database (56) A means of storing data in order to ensure that the data are complete, not redundant, and to help enforce business rules. Relational databases also aid in communication and integration of business processes across an organization.
response (or dependent) variable (10) A variable that responds to, or is dependent on, another.

S
similarity matching (11, 133) A data approach that attempts to identify similar individuals based on data known about them.
sparkline (413) A small visual trendline or bar chart that efficiently summarizes numbers or statistics in a single spreadsheet cell.
standard normal distribution (188) A special case of the normal distribution used for standardizing data. The standard normal distribution has 0 for its mean (and thus, for its mode and median, as well), and 1 for its standard deviation.
standardization (188) The method used for comparing two datasets that follow the normal distribution. By using a formula, every normal distribution can be transformed into the standard normal distribution. If you standardize both datasets, you can place both distributions on the same chart and more swiftly come to your insights.
standardized metrics (419) Metrics used by data vendors to allow easier comparison of company-reported XBRL data.
structured data (4, 123) Data that are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.
summary statistics (119) Describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation covariance, and frequency that describe a specific measurable value.
sunburst diagram (414) A visualization that shows inherent hierarchy.
supervised approach/method (133) Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.
supply chain management (SCM) system (54) An information system that helps manage all the company’s interactions with suppliers.
support vector machine (139) A discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe).
systems translator software (248) Systems translator software maps the various tables and fields from varied ERP systems into a consistent format.

T
t-test (290) A statistical test used to determine if there is a significant difference between the means of two groups, or two datasets.
Tax Cuts and Jobs Act of 2017 (460) Tax legislation offering a major change to the existing tax code.
tax data mart (459) A subset of a company-owned data warehouse focused on the specific needs of the tax department.
tax planning (464) Predictive analysis of potential tax liability and the formulation of a plan to reduce the amount of taxes paid.
test data (138) A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data.

U
underfitting (140) A modeling error when the derived model poorly fits a limited set of data points.
unstructured data (4) Data that do not adhere to a predefined data model in a tabular format.
unsupervised approach/method (129) Approach used for data exploration looking for potential patterns of interest.

V
vertical analysis (407) An analysis that shows the proportional value of accounts to a primary account, such as Revenue.

X
XBRL (eXtensible Business Reporting Language) (122, 417) A global standard for exchanging financial reporting information that uses XML.
XBRL taxonomy (417) Defines and describes each key data element (like cash or accounts payable). The taxonomy also defines the relationships between each element (like inventory is a component of current assets and current assets is a component of total assets).
XBRL-GL (420) Stands for XBRL–General Ledger; relates to the ability of an enterprise system to apply XBRL tags to financial elements within the firm’s financial reporting system.
Index
A
Abacus, 14
Accountants
  role in data management, 460
  skills and tools needed, 13–17, 337
Accounting
  affect of data analytics on, 5–8
  analytic models for, 116–119
  auditing and, 6. See also Auditing
  data reduction and, 120–122
  decision support systems, 117, 118, 141–142, 144, 296
  profiling and, 126–127
  regression and, 134–137
  skills needed, 13–17, 337
Accounting data, using and storing, 56
Accounting information system, 54, 56, 70
Accounts receivable
  cash and, 500–506
  Question 1 Part 2: How Efficiently Is the Company Collecting Cash?, 503–504
Acid test ratio, 408
ACL software, 288
Activity ratios, 408–409
Address and refine results
  audit data analytics and, 288
  IMPACT cycle, 13, 23–24, 338–339, 347
  Lab 2-2: Prepare Data for Analysis—Sláinte, 79–83
  Lab 2-7: Comprehensive Case: Preview Data from Tables—Dillard’s, 103–108
  Lab 2-8: Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard’s, 108–112
  Lab 7-2: Create a Balanced Scorecard Dashboard—Sláinte, 367–376
  Lab 8-1: Create a Horizontal and Vertical Analysis Using XBRL Data—S&P100, 430–437
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard’s, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard’s, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard’s and Avalara, 492–497
  LendingClub example, 23–24
  management accounting, 338–339, 347
Advanced Environmental Recycling Technologies (AERT), 126–127
Advanced machine learning, 117
Age analysis, descriptive analytics and, 287, 289
Aggregates/aliases, expand SELECT SQL clauses, 552–553
Ahmed, A. S., 136n4
AICPA, 62, 249–251, 264
Airbnb, 412
Alarms, continuous monitoring, 254–255
Alibaba, 422
Alphabet, 422
Alternative hypothesis, 131, 144
Alternative stacked bar chart, 196
Altman, Edward, 295n2
Amazon, 11, 12, 57, 118, 143
Amazon RDS, 56
American Institute of Certified Public Accountants (AICPA), 62, 249–251, 264
Analytics mindset, 14
Anscombe, Francis, 183
Anscombe’s Quartet, 183–184
Apple Inc., 39–40, 410, 411
Applied statistics, predictive analytics, 287, 296
Artificial intelligence, 117, 118, 142–143, 286, 287, 296
Asset turnover ratio, 409
Assurance services, 246–247
Attributes, descriptive, 58, 70
Audience, effective communication and, 203–204
Audit data analytics, 282–333
  address and refine results, 288
  Benford’s law. See Benford’s law
  communicate insights, 288
  descriptive analytics, 286–287, 288–290. See also Descriptive analytics
  diagnostic analytics, 286–287, 290–294. See also Diagnostic analytics
  examples, 287
  identify the questions, 284
  Lab 6-1: Evaluate Trends and Outliers—Oklahoma, 304–311
  Lab 6-2: Diagnostic Analytics Using Benford’s Law—Oklahoma, 311–317
  Lab 6-3: Finding Duplicate Payments—Sláinte, 317–321
  Lab 6-4: Comprehensive Case: Sampling—Dillard’s, 321–325
  Lab 6-5: Comprehensive Case: Outlier Detection—Dillard’s, 325–332
  master the data, 284–286
  nature/extent/timing of, 284
  perform test plan, 286–288
  predictive/prescriptive, 286–287, 294–296
  track outcomes, 288
  when to use, 284–288
Audit data standards (ADS), 62, 249–251, 252, 256, 264
Audit plan, 251–253
Auditing
  alarms and exceptions, 254–255
  automation and, 246–247
  clustering approach to, 130–131
  continuous monitoring techniques, 128, 253–254, 257
  data analytics and, 6
  data reduction and, 120–121
  documentation of, 255–256
  external, 120–121
  internal, 120–121, 127–128, 247–248
College Scorecard—Cont.
  Lab 3-3: Perform a Linear Regression Analysis, 160–166
Color, use in charts, 201–202, 414
Column chart, 198
Columns, tables and, 57–58
Committee of Sponsoring Organizations (COSO), 252
Common data model
  AICPA and, 249–251
  defined, 249, 257
  Lab 5-1: Create a Common Data Model—Oklahoma, 263–267
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
Common size financial statements, 407–408, 423, 437–441 (lab)
Communicating results, 180–243
  audience and tone, 203–204
  audit data analytics and, 288
  charting data. See Charting data
  content and organization, 202–203
  data visualization. See Data visualization
  determining method, 186
  introduction to, 13, 24
  Lab 4-1: Visualize Declarative Data—Sláinte, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 4-3: Create Dashboards—LendingClub, 223–229
  Lab 4-4: Comprehensive Case: Visualize Declarative Data—Dillard’s, 229–236
  Lab 4-5: Comprehensive Case: Visualize Exploratory Data—Dillard’s, 236–242
  management accounting, 339
  revision and, 204
  statistics versus visualizations, 183–184
  use of words, 202–204
Complexity of model, versus classification, 140
Compliance, tax, 461–463
Composite primary key, 58, 70
Comprehensive Case
  Lab 1-4: Questions about Dillard’s Store Data, 44–47
  Lab 1-5: Connect to Dillard’s Store Data, 47–51
  Lab 2-6: Build Relationships among Database Tables—Dillard’s, 98–102
  Lab 2-7: Preview Data from Tables—Dillard’s, 103–108
  Lab 2-8: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard’s, 108–112
  Lab 3-4: Descriptive Analytics: Generate Summary Statistics—Dillard’s, 166–169
  Lab 3-5: Diagnostic Analytics: Compare Distributions—Dillard’s, 169–174
  Lab 3-6: Create a Data Abstract and Perform Regression Analysis—Dillard’s, 174–179
  Lab 4-4: Visualize Declarative Data—Dillard’s, 229–236
  Lab 4-5: Visualize Exploratory Data—Dillard’s, 236–242
  Lab 5-5: Setting Scope—Dillard’s, 277–281
  Lab 6-4: Sampling—Dillard’s, 321–325
  Lab 6-5: Outlier Detection—Dillard’s, 325–332
  Lab 7-3: Analyze Time Series Data—Dillard’s, 377–389
  Lab 7-4: Comparing Results to Prior Period—Dillard’s, 389–398
  Lab 7-5: Advanced Performance Models—Dillard’s, 398–403
  Lab 9-2: Calculate Estimated State Sales Tax Owed—Dillard’s, 475–479
  Lab 9-3: Calculate Total Sales Tax Paid—Dillard’s, 479–486
  Lab 9-4: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara, 486–492
  Lab 9-5: Online Sales Taxes Analysis—Dillard’s and Avalara, 492–497
Computer-assisted audit techniques (CAATs), 286–288, 297
Conceptual charts, 195
Conceptual data, 187
Confidence interval, 531
Confidence level, 289–290
Connect, PwC tool, 255
Content, of written communication, 202–203
Continuous auditing, 128, 253–254, 257
Continuous data, 188, 205
Continuous monitoring/reporting, 253–254, 257
Control Objectives for Information and Related Technologies (COBIT), 252
Corptax, 457
Cost accounting
  Lab 7-1: Evaluate Job Costs, 355–367
  regression approach, 136
Cost behavior, 340–341
Cost optimization, 337
Coughlin, Tom, 127
Credit risk score, 20, 22
Current ratio, 408
Customer KPIs, 344. See also Key performance indicators (KPIs)
Customer relationship management (CRM), 54, 70
Cybersecurity, 68

D
Daily Mail, 195
Damodaran, Aswath, 412–413
Dashboards
  digital, 125, 144, 341–342, 346, 348
  Lab 4-1: Visualize Declarative Data—Sláinte, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 4-3: Create Dashboards—LendingClub, 223–229
  Lab 4-4: Comprehensive Case: Visualize Declarative Data—Dillard’s, 229–236
  Lab 4-5: Comprehensive Case: Visualize Exploratory Data—Dillard’s, 236–242
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
  Lab 7-2: Create a Balanced Scorecard Dashboard, 367–376
  sales tax data, 462
  to track KPIs, 458
Data
  big, 3, 4
  breach of, 53, 68–69
  categorical, 186, 205
Decision boundaries, 138, 144
Decision support systems, 117, 118, 141–142, 144, 296
Decision trees, 138–139, 144
Declarative visualizations, 188–191, 205, 212–218 (lab), 229–236 (lab)
Deductions, what-if analysis and, 465–466
Delivery process, 504–505
Deloitte, 291
Denormalize data, 79
Dependent variables, 10–11, 26
Descriptive analytics
  age analysis, 287, 289
  audit data analytics and, 286–287, 288–290
  data reduction. See Data reduction
  defined, 144, 297
  examples of, 116, 117–118
  financial statements, 406, 407
  Lab 3-1: Descriptive Analytics: Filter and Reduce Data—Sláinte, 153–157
  Lab 3-4: Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard’s, 166–169
  management accounting and, 337–338
  Question Set 1: Descriptive and Exploratory Analysis, 514–519
  sampling, 287, 289–290
  sorting, 287, 289
  summary statistics. See Summary statistics
  tax analytics and, 456, 457
Descriptive attributes, 58, 70
Diagnostic analytics
  audit data analytics and, 286–287, 290–294
  benefits, 122
  Benford’s Law, 128, 287, 292–293
  box plots and quartiles, 124–125, 195, 290
  cluster analysis, 117, 118, 128–131, 287, 294
  defined, 116, 118, 144, 297
  drill-down, 287, 293
  exact and fuzzy matching, 287, 293–294
  financial statements, 406, 410
  hypothesis testing, 122, 131–133
  Lab 3-1: Descriptive Analytics: Filter and Reduce Data—Sláinte, 153–157
  Lab 3-2: Diagnostic Analytics: Identify Data Clusters—LendingClub, 157–160
  Lab 3-5: Comprehensive Case: Diagnostic Analytics: Compare Distributions—Dillard’s, 169–174
  management accounting, 337–338
  methods used, 122
  profiling, 117, 118, 123–128
  Question 2.1: Is the Percentage of Sales Returned Significantly Higher in January after the Holiday Season?, 519–521
  Question 2.2: How Do the Percentages of Returned Sales for Holiday/Non-Holiday Differ for Online Transactions and across Different States?, 521–523
  Question 2.3: What Else Can You Determine about the Percentage of Returned Sales through Diagnostic Analysis?, 523–524
  sequence check, 287, 294
  stratification/clustering, 287, 294
  t-tests, 287, 290–291
  tax analytics and, 456, 457
  Z-score, 123–124, 128, 287, 290, 291
Dictionaries, data. See Data dictionaries
Digital dashboards
  balanced scorecard, 346
  defined, 144, 348
  key performance indicators and, 341–342
  profiling and, 125
Dillard’s Stores Inc.
  data dictionary, 586
  estimating sales returns, 514–527
  Lab 1-4: Comprehensive Case: Questions about Dillard’s Store Data, 44–47
  Lab 1-5: Comprehensive Case: Connect to Dillard’s Store Data, 47–51
  Lab 2-6: Comprehensive Case: Build Relationships among Database Tables, 98–102
  Lab 2-7: Comprehensive Case: Preview Data from Tables, 103–108
  Lab 2-8: Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query, 108–112
  Lab 3-4: Comprehensive Case: Descriptive Analytics: Generate Summary Statistics, 166–169
  Lab 3-5: Comprehensive Case: Diagnostic Analytics: Compare Distributions, 169–174
  Lab 4-4: Visualize Declarative Data, 229–236
  Lab 4-5: Visualize Exploratory Data, 236–242
  Lab 5-5: Comprehensive Case: Setting Scope, 277–281
  Lab 6-4: Comprehensive Case: Sampling, 321–325
  Lab 6-5: Comprehensive Case: Outlier Detection, 325–332
  Lab 7-3: Comprehensive Case: Analyze Time Series Data, 377–389
  Lab 7-4: Comprehensive Case: Comparing Results to a Prior Period, 389–398
  Lab 7-5: Comprehensive Case: Advanced Performance Models, 398–403
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis, 492–497
  Power Query tutorial, 564–570
Discrete data, 188, 205
Discrimination Function, 461
Distribution
  predicting, Benford’s law, 128, 292–293
  probability, 529–530
Distributions, 116
Documentation, of auditing, 255–256
Drill-down, diagnostic analytics, 287, 293
Dropbox, 272
Dummy variables, 135, 144
DuPont ratio, 409
598 Index
ISTUDY
Index 599
ISTUDY
600 Index
ISTUDY
Index 601
ISTUDY
602 Index
ISTUDY
Index 603
ISTUDY
604 Index
OUTER JOIN, SQL clause, 293–294
Outlier detection (lab), 304–311, 325–332
Overfitting, 140, 145
Overlap method, 415

P
p-Value, 132, 135, 141, 531
Parameters, versus statistics, 528
Park, J., 123n1, 284n1
Payments, procure-to-pay (P2P) process, 506–511
PCAOB, 252
Perform test plan
  audit data analytics and, 286–288
  IMPACT cycle and, 10–13, 20–22, 116–119
  LendingClub, 20–22
  management accounting and, 337–338
Perform the analysis
  Lab 1-1: Data Analytics Questions in Financial Accounting, 39–40
  Lab 2-2: Prepare Data for Analysis—Sláinte, 79–83
  Lab 2-7: Comprehensive Case: Preview Data from Tables—Dillard’s, 103–108
  Lab 2-8: Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard’s, 108–112
  Lab 3-3: Perform a Linear Regression Analysis—College Scorecard, 160–166
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 6-3: Finding Duplicate Payments—Sláinte, 317–321
  Lab 7-2: Create a Balanced Scorecard Dashboard—Sláinte, 367–376
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard’s, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard’s, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard’s and Avalara, 492–497
Performance metrics, 339, 348. See also Key performance indicators (KPIs)
Peters, G. F., 123n1, 284n1
Pie charts, 192–194, 197
PivotTables
  address and refine results, 23–24
  Lab 2-2: Prepare Data for Analysis—Sláinte, 79–83
  Lab 6-5: Comprehensive Case: Outlier Detection—Dillard’s, 325–332
  performing the test plan using, 20–22
  Q 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Q 3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  Q 3.3: What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?, 527
  tools for, 540–541
Poisson distribution, 530
Post-pruning, decision tree, 138
PostgreSQL, 56
Power Automate, 15–16
Power BI
  as an analytics tool, 15–16, 481
  ask a question, 577–578
  choosing mode, 573–574
  defined, 16
  opening, startup screen, 572–573
  SQL and, 63
  tutorial, 572–579
  visualizations/fields/values, 190, 574–577
Power Pivot tool, 385
Power Query
  editing data, 567–569
  Excel’s Get and Transform, 565–566
  loading data, 569–570
  return to window after closing, 570
  tutorial, 564–570
  uses, 15–16
  worksheet failure, 569
Pre-pruning, of decision trees, 138
Predictive analytics, 133–141
  artificial intelligence, 296
  audit data analytics and, 286, 294–296
  auditing, 136–137
  classification, 118, 133, 137–140, 287, 295
  defined, 116, 118, 145, 297
  examples of, 116–117
  financial statements, 406, 410–412
  link prediction, 117, 118, 133
  management accounting, 136, 337–338
  overfitting data, 140, 145
  overview of, 133–134
  probability, 287, 295
  Q 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Q 3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  Q 3.3: What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?, 527
  regression, 118, 134–137, 287, 295
  sentiment analysis, 287, 295–296
  tax analytics and, 457–458
Predictor variables, 11, 26
Prescriptive analytics
  applied statistics, 287, 296
  artificial intelligence, 117, 118, 142–143, 286, 287, 296
  audit data analytics and, 287
  decision support systems, 117, 118, 141–142, 144, 296
  defined, 117, 145, 297
  financial statements, 406, 412–413
  management accounting, 338
Index 605
ISTUDY
606 Index
Relational database
  defined, 56, 70
  versus flat file, 57–58
  relationships and, 56–59
  tables and, 57–58
Relational database management systems (RDBMS), 56
Relationships, relational databases and, 56–59
Relevant costs, 339
Remote audit work, 255–256
Resnick, B., 53n1
Response variables, 10–11, 26
Return on assets ratio, 409
Return on equity (ROE), 409
Returns, estimating sales. See Estimating sales returns
Revenue analytics skill, 337
Revenue optimization, 337, 355–367
Revision, of written communication, 204
Revlon, 254
Richardson, J. L., 15
Richardson, V. J., 123n1, 284n1
RIGHT JOIN, SQL clause, 293, 562
Risk scores, 20, 22
Robert Half Associates, 64
Robotics process automation (RPA), 3, 16, 246
R.R. Donnelley, 419

S
Sage, 14
Sales cycle process, 500–506
Sales returns, estimating. See Estimating sales returns
Sales tax liability
  evaluating, 462
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard’s, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard’s, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard’s and Avalara, 492–497
Sample
  describing, 528–529
  versus population, 528
Sampling
  descriptive analytics and, 287, 289–290
  Lab 6-4: Comprehensive Case: Sampling—Dillard’s, 321–325
SAP, 14, 248, 253, 420
Scale, charting data and, 201
Scatter plots, 195, 341
Schema, database, 56, 58, 59
Scorecard, balanced. See Balanced scorecard
Scripting language, 251
Scrubbing data, 10, 14, 53
Securities and Exchange Commission (SEC), 6, 122, 417
Security, of data, 53
SELECT, SQL clauses, 546
SELECT FROM, SQL clauses, 547–548
SELECT FROM WHERE, SQL clauses, 550–551
Sensitivity analysis, 412
Sentiment analysis
  Lab 8-4: Analyze Financial Sentiment—S&P100, 444–453
  predictive analytics and, 287, 295–296
  text mining and, 415–417
Sequence check, 287, 294
Shared folders (lab), 272–274
Similarity matching
  defined, 11, 26, 145
  diagnostic analytics, 117, 118. See also Diagnostic analytics
  supervised approach, 133, 145
Simsion, G. C., 57n4
Singer, N., 53n2
Singleton, T., 61n5
Slack, 255
Sláinte
  data dictionary, 586
  Lab 2-1: Request Data from IT, 77–78
  Lab 2-2: Prepare Data for Analysis, 79–83
  Lab 3-1: Descriptive Analytics: Filter and Reduce Data, 153–157
  Lab 4-1: Visualize Declarative Data, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards, 218–222
  Lab 5-3: Set Up a Cloud Folder and Review Changes, 272–274
  Lab 5-4: Identify Audit Data Requirements, 275–277
  Lab 6-3: Finding Duplicate Payments, 317–321
  Lab 7-1: Evaluate Job Costs, 355–367
  Lab 7-2: Create a Balanced Scorecard Dashboard, 367–376
Snapchat, 27
Snow, John, 181
Social media, 7–8
Software needs
  exact and fuzzy matching, 287, 293–294
  sampling, 287, 289–290
  sorting, 287, 289
  storing data, 56–57
Software translators, 248–249
Solvency ratios, 409
Sorkin, Andrew Ross, 254
Sorting, descriptive analytics and, 287, 289
S&P100
  data dictionary, 587
  Lab 8-1: Create a Horizontal and Vertical Analysis Using XBRL Data, 430–437
  Lab 8-2: Create Dynamic Common Size Financial Statements, 437–441
  Lab 8-3: Analyze Financial Statement Ratios, 441–444
  Lab 8-4: Analyze Financial Sentiment, 444–453
Sparklines, 413–414, 423
Spread, describing, 529
Spreadsheet software, 15–16
SQL clauses
  FROM, 546–547
  aggregates/aliases, expand SELECT, 552–553
  example queries, ORDER BY, 551–552
  Full Outer Join, 293–294
  Get and Transform tool, 565–566
  GROUP BY, 554–555
  HAVING, 555–556
  INNER JOIN, 293, 557–558, 560–561
  JOIN, 557–558
  LEFT JOIN, 293, 561–562
  ORDER BY, 551
  parentheses, joining tables, 558
  RIGHT JOIN, 293, 562
  SELECT, 546
  SELECT FROM, 547–548
  SELECT FROM WHERE, 550–551
  tutorial, 546–559, 560–562
  using data from more than one table, 556–558
  WHERE, 548–549
  See also Microsoft SQL Server
SQL (Structured Query Language). See Microsoft SQL Server
SQLite, 56
Square Payments, 120
Stacked bar chart, 192–193, 196, 198
Standard normal distribution, 188, 206
Standardization, 62, 188, 206
Standardized metrics, 419–420, 421, 423
Standards, audit data, 62, 249–251, 252, 256, 264
Statistical significance, 131–132
Statistical testing, 531
Statistics
  applied, 287, 296
  describing the sample, 528–529
  describing the spread, 529
  hypothesis testing, 530–531
  output from sample t-test of a difference of means of two groups, 532
  parameters versus, 528
  population versus sample, 528
  probability distributions, 529–530
  regression, interpreting the statistical output from, 533
  statistical testing, 531
  tutorial, 528–533
  versus visualizations, 183–184
StockSnips, 405
Storage of data, 56–57
Storey, V. C., 4n4
Strategy Management Group Company, 335, 343
Stratification, diagnostic analytics, 287, 294
Structured data
  data reduction and, 120
  defined, 4, 26, 145
  profiling and, 123
Structured Query Language (SQL). See Microsoft SQL Server
Sullivan, G. M., 141n6
Summary statistics
  defined, 117, 145
  descriptive analytics, 116, 119–120, 287, 289
  Lab 2-4: Generate Summary Statistics—LendingClub, 91–95
  Lab 3-4: Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard's, 166–169
  versus visualizations, 183–184
Sunburst diagram, 414, 423
Supervised approach, predictive analytics, 133, 145
Supply chain management system (SCM), 54, 70
Support vector machine, 139–140, 145
SurveyMonkey, 528
Sweet spot, 140
Symbol maps, 194
Systems translator software, 248–249, 257

T

t-Test
  defined, 297
  diagnostic analytics, 287, 290–291
  for equal means, 131
  interpreting output from sample, 532
  two-sample, 131–132
Table attributes, 57–58
Tableau Prep, 16–17, 38, 578–581
Tableau Public, 16–17
Tableau software
  as an analytics tool, 481
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
  versus Microsoft, 189–191
  overview, 16–17
  Question Set 1: Descriptive and Exploratory Analysis, 514–519
  tutorial, 582–585
Tableau Workbook
  Question Set 1: Order-to-Cash (O2C), 500–506
  Question Set 2: Procure-to-Pay, 506–511
Takeda, C., 136n4
Tapadinhas, J., 15
Target, 133
Tax analytics, 454–497
  compliance and liability, 461–463
  data management and, 458–461
  diagnostic/descriptive, 456, 457
  IRS and, 456, 461
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard's, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard's, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard's and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard's and Avalara, 492–497
  predictive/prescriptive, 457–458
  questions addressed using, 456–457
  for tax planning, 464–466
  using the IMPACT model, 456–458
  visualizations and, 461–463
Tax compliance, 461–463
Tax cost, 462–463
Tax credits, what-if analysis and, 465–466
Tax Cuts and Jobs Act of 2017, 460, 467
Tax data management, 458–461
Tax data mart, 459, 467
Tax efficiency/effectiveness, 463
Tax liability, 461–463
Tax planning, 464–466, 467
Tax risk, 463
Tax sustainability, 463
Taxes, data analytics and, 8
Taxonomy, 417–418, 423
TeamMate, 255, 288, 296
Teradata, 56
Tesla, 455
Test data, 138, 145
Test plan
  audit data analytics, 286–288
  IMPACT cycle and, 10–13, 20–22, 116–119
  LendingClub, 20–22
  management accounting and, 337–338
Text, versus data visualization, 184–185
Text data types, SQL WHERE clauses, 549
Text mining, 415–417
Thomas, S., 136n4
Time series analysis, 137, 145, 377–389 (lab)
Times interest earned ratio, 409
Tolerable misstatements, 290
Tone, of written communication, 203–204
Total cost, 339
Tracking outcomes, 13, 24, 288, 339
Trade-off, 140
Training data, classification of, 138, 145
Transforming data, 64–67
Translator software, 248–249
TransUnion, 27
Tree maps, 194
Trend analysis, 411
Trendlines, visualizing, 413–414
True negative alarm, 254–255
True positive alarm, 254–255
TurboTax, 141, 296
Turnover, of inventory, 246–247, 409
Turnover ratios, 408–409
Twitter, 7, 461
Two-sample t-test, 131–132
Typical value, describing samples by, 528–529

U

Uber, 415
UML diagrams, 56, 255
Underfitting data, 140, 145
Unfavorable variance, 340
UNICODE, 67
Unified Modeling Language (UML), 56, 255
Uniform distribution, 530
Unique identifier, 57
U.S. GAAP Financial Reporting Taxonomy, 417
U.S. Securities and Exchange Commission (SEC), 6, 122, 417
Unix commands, 14
Unstructured data, 4, 26
Unsupervised approach, 129, 145

V

Validation, of data, 53, 64–65, 82–83
Value, describing sample by middle or typical, 528–529
Values, organizational, 345
Variability of data, describing, 529
Variables
  dependent/independent, 10–11, 26
  dummy, 135, 144
  explanatory, 11, 26
  predictor, 11
  response, 10–11, 26
Variance analysis, 116, 127, 339–340, 355–367
Vertical financial statement analysis, 407–408, 411, 423, 430–437
Vision, organizational, 345
Visualization of data
  Anscombe's Quartet, 183–184
  categorical data, 186, 205
  charting. See Charting data
  declarative, 188–191, 205
  exploratory, 188–191, 205
  heat maps and, 181–182, 194, 414
  Lab 4-1: Visualize Declarative Data—Sláinte, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 4-3: Create Dashboards—LendingClub, 223–229
  Lab 4-4: Comprehensive Case: Visualize Declarative Data—Dillard's, 229–236
  Lab 4-5: Comprehensive Case: Visualize Exploratory Data—Dillard's, 236–242
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  normal distribution, 188
  preference over text, 184–185
  purpose of, 185–192
  qualitative versus quantitative data, 186–188
  Question 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Question Set 1: Descriptive and Exploratory Analysis, 514–519
  relative size of accounts, 414
  showing trends, 413
  sparklines, 413–414, 423
  versus statistics, 183–184
  sunburst diagrams, 414, 423
  tax analytics and, 461–463
  tools for, 189–191
  use of written communication, 202–204
  See also Communicating results
VLookup function, Excel, 63, 542–543

W

Walmart, 127, 129, 345
Walton College of Business, 341–342
Warehouse, for data, 248, 257, 459, 467
Wayfair decision, 462
Web browser
  Lab 1-0: How to Complete Labs, 36–39
  Lab 1-1: Data Analytics Questions in Financial Accounting, 39–40
  Lab 1-3: Data Analytics Questions in Auditing, 42–44
  Lab 1-4: Questions about Dillard's Store Data (Comprehensive Case), 44–47
  Lab 5-3: Set Up a Cloud Folder and Review Changes—Sláinte, 272–274
  Lab 5-4: Identify Audit Data Requirements—Sláinte, 275–277